Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 16, 2025

  • Create MediaTypeMap polyfill in LegacySupport directory with System.Net.Mime namespace
  • Add static LoadFromAsync methods to DataContent class:
    • Stream overload that accepts a Stream (infers name and mime type from FileStream's path if applicable)
    • String path overload that infers name and mime type from the file path
    • Pre-initialize MemoryStream capacity when source stream is seekable for performance
  • Add instance SaveToAsync method to DataContent class:
    • Handles directory paths by inferring filename from Name property or using random name
    • Always infers extension from media type if path has none
    • Returns the actual path where data was saved
    • Fix path traversal security issue by using Path.GetFileName(Name)
  • Remove [Experimental] attribute and add methods to JSON API baseline as stable
  • Use GetBuffer() instead of ToArray() for MemoryStream efficiency
  • Add WriteAllBytesAsync(ReadOnlyMemory<byte>) polyfill to remove #ifdef from SaveToAsync
  • Change return types from Task<T> to ValueTask<T> for all new public async methods
  • Add C# 14 extension member polyfills for File.ReadAllBytesAsync and File.WriteAllBytesAsync in LegacySupport directory
  • Remove name parameter from LoadFromAsync(Stream) - Name can be assigned after the call
  • Add comprehensive unit tests for the new functionality
  • Fix net462 build: Add InjectFilePolyfillsOnLegacy to test project
  • Add test for SaveToAsync with empty string path
  • Simplify extension inference by removing unnecessary null check
  • Update MarkItDownMcpReader.cs to use DataContent.LoadFromAsync with optional media type parameter
  • Add Microsoft.Extensions.AI.Abstractions project reference to MarkItDown project for LoadFromAsync access
  • Run code review and security checks
  • Switch components to use MediaTypeMap for extension/MIME type mappings:
    • ImageGeneratorExtensions.cs - Replace _extensionToMimeType dictionary with MediaTypeMap.GetMediaType
    • OpenAIClientExtensions.cs - Replace ImageUriToMediaType with MediaTypeMap.GetMediaType
    • OpenAIImageGenerator.cs - Replace _mimeTypeToExtension dictionary with MediaTypeMap.GetExtension
    • IngestionDocumentReader.cs - Remove custom lookup, only use MediaTypeMap.GetMediaType
Original prompt

This section details on the original issue you should resolve

<issue_title>We should have helper methods / extensions for DataContent</issue_title>
<issue_description>At a minimum we should have Load and Save.

Load should read from a path and preserve the file name loaded and try to decode a mime-type from that.

Save should save to a path, with an optional filename, using Name if a directory is provided in the path.

Perhaps we could consider a filename without extension to be provided and we infer the extension from the mime-type and return the saved path.

We should also have Stream based helpers.</issue_description>

<agent_instructions>Add static LoadFromAsync and instance SaveToAsync methods to DataContent. There should be a string path overload for loading that infers the name and mime type from the path (it should add a polyfill to common for https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Mail/src/System/Net/Mime/MediaTypeMap.cs and use that to infer the mime type). There should also be a stream based one that should accept a Stream. If it's actually a FileStream it should default to inferring name and mime type from the filestream's path. It should fall back to octet stream if unknown. SaveAsync should accept a required path to save to; it should accept an optional bool parameter defaulting to true for inferExtension that will add an extension to the provided path if it doesn't have one, based on the mime type and MediaTypemap. It should return the path to which the data was saved. </agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Microsoft Reviewers: Open in CodeFlow

Copilot AI changed the title [WIP] Add helper methods for DataContent with Load and Save Add LoadFromAsync and SaveToAsync helper methods to DataContent Dec 16, 2025
Copilot AI requested a review from stephentoub December 16, 2025 16:20
Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Copilot AI commented Dec 16, 2025

@copilot

All review comments addressed in 840f61c. Changes include: MediaTypeMap moved to LegacySupport with full mappings, methods are now stable (not experimental), DefaultMediaType moved to DataContent as private const, using GetBuffer() for MemoryStream, proper null handling for Name, SaveToAsync handles directory paths and always infers extensions, and NET9_0_OR_GREATER conditional for ReadOnlyMemory overload.

Copilot AI requested a review from stephentoub December 16, 2025 17:15
@stephentoub stephentoub marked this pull request as ready for review December 16, 2025 17:16
@stephentoub stephentoub requested review from a team as code owners December 16, 2025 17:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds helper methods to DataContent for loading from and saving to file system paths and streams, with automatic media type (MIME type) inference based on file extensions.

Key Changes

  • Adds static LoadFromAsync methods (path and stream overloads) that infer media types and names from file extensions
  • Adds instance SaveToAsync method that handles directory paths and extension inference
  • Implements MediaTypeMap polyfill for .NET 9 and earlier to enable file extension-to-MIME type mapping
  • Marks the new APIs as stable (non-experimental) in the JSON baseline

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataContent.cs Implements the new LoadFromAsync and SaveToAsync methods with media type inference logic
src/LegacySupport/MediaTypeMap/MediaTypeMap.cs Adds polyfill implementation of media type mapping for pre-.NET 10 frameworks
src/LegacySupport/MediaTypeMap/README.md Documents the MediaTypeMap polyfill purpose and origin
eng/MSBuild/LegacySupport.props Configures MSBuild to inject MediaTypeMap polyfill for legacy target frameworks
src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.csproj Enables MediaTypeMap injection for the abstractions project
src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.json Adds the new methods to the stable API baseline
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/DataContentTests.cs Adds comprehensive unit tests covering all new functionality and edge cases

@stephentoub stephentoub added the area-ai Microsoft.Extensions.AI libraries label Jan 5, 2026
Copilot AI and others added 15 commits January 5, 2026 15:41
@stephentoub stephentoub force-pushed the copilot/add-helper-methods-data-content branch from ced6113 to bd2f87a Compare January 6, 2026 01:59
@stephentoub stephentoub enabled auto-merge (squash) January 6, 2026 02:00
@stephentoub
Copy link
Member

@copilot, there are build failures, e.g.

Build FAILED.

D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(47,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net462]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(61,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net462]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(47,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net8.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(61,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net8.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(47,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net10.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(61,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net10.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(47,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=netstandard2.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(61,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=netstandard2.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(47,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net9.0]
D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\MarkItDownMcpReader.cs(61,53): error CS0117: 'DataContent' does not contain a definition for 'LoadFromAsync' [D:\a\_work\1\s\src\Libraries\Microsoft.Extensions.DataIngestion.MarkItDown\Microsoft.Extensions.DataIngestion.MarkItDown.csproj::TargetFramework=net9.0]
    0 Warning(s)
    10 Error(s)

auto-merge was automatically disabled January 6, 2026 16:35

Head branch was pushed to by a user without write access

Copy link
Contributor Author

Copilot AI commented Jan 6, 2026

@copilot, there are build failures, e.g.

Build FAILED.

...

Fixed in faa7318. Added project reference to Microsoft.Extensions.AI.Abstractions in the Microsoft.Extensions.DataIngestion.MarkItDown.csproj to enable access to DataContent.LoadFromAsync.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-ai Microsoft.Extensions.AI libraries

Projects

None yet

Development

Successfully merging this pull request may close these issues.

We should have helper methods / extensions for DataContent

3 participants