Docs: document filename convention and pipeline constraints for custom ingestion #2098
Open
RolandKrummenacher wants to merge 6 commits into `dev`
… ingestion (#2096) Closes #2096. Expands the "Ingest from other data sources" section with the details integrators need to avoid silent data loss: the ``__`` filename convention used by the ingestion pipeline, full-month replacement requirement, retry cost of empty parquet shards, corrected table versions (v1_2), and the non-empty ``manifest.json`` requirement driven by ``ignoreEmptyBlobs`` on the event trigger. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er month Reword the IMPORTANT callout so it no longer contradicts the preceding guidance about using ``dd``/``dd/hh`` subfolders for nonoverlapping deltas. The pre-ingest cleanup operates on the folder path, so each delta folder is independently replaced — the callout now reflects that. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The event trigger fires on manifest.json creation, after which the pipeline waits 60 seconds and enumerates the folder. Parquet files that arrive after that enumeration are skipped by the current run, so the manifest must be uploaded last. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
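The ordering constraint described in the commit messages above can be sketched as follows. This is a minimal illustration, not the toolkit's own code: `plan_upload` and `manifest_body` are hypothetical helper names, and the actual upload call (for example through an Azure Storage client) is deliberately left out. It only shows the two rules stated in this PR: upload `manifest.json` last, and never upload it as a zero-byte blob.

```python
import json

# File name taken from the PR text; everything else here is illustrative.
MANIFEST_NAME = "manifest.json"


def plan_upload(file_names):
    """Return an upload order with every data file first and the manifest last.

    The manifest is always appended, even if the caller did not list it,
    because the event trigger fires on manifest.json creation.
    """
    data_files = [name for name in file_names if name != MANIFEST_NAME]
    return data_files + [MANIFEST_NAME]


def manifest_body(payload=None):
    """Serialize the manifest so that it is never zero bytes.

    With no payload this yields b"{}", the minimal non-empty content
    mentioned in the PR description.
    """
    return json.dumps({} if payload is None else payload).encode("utf-8")


if __name__ == "__main__":
    order = plan_upload(["manifest.json", "run1__part0.parquet", "run1__part1.parquet"])
    print(order)
    print(manifest_body())
```

Uploading in the returned order means all parquet files already exist when the trigger fires, so the folder enumeration that follows the 60-second wait should see every file.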
Cross-references for reviewer context:
Closing #2096 (this PR's parent issue); leaving #2057 and #2046 for maintainers to decide whether an additional code-side fix is wanted.
Summary
Closes #2096. Expands the Ingest from other data sources section of the Hub deploy tutorial with details required for the Data Explorer ingestion pipeline to work correctly. The existing text led integrators into silent data loss when uploading custom FOCUS datasets (e.g. from GCP or AWS).
Changes to `docs-mslearn/toolkit/hubs/deploy.md`:

- […] `<ingestionId>__<originalFileName>.parquet` filename convention and explain how the pipeline derives `ingestionId` by splitting on `__` (see `Analytics/app.bicep:1806-1810`).
- […] `Analytics/app.bicep:1352`.
- […] `BadRequest_NoRecordsOrWrongFormat` and the pipeline retries 3× at 120s intervals, per `retry: 3, retryIntervalInSeconds: 120` on the Ingest Data activity).
- […] `{}` because the storage event trigger sets `ignoreEmptyBlobs: true` (`fx/hub-eventTrigger.bicep:79`); a zero-byte file never triggers ingestion.
- […] `v1_0` to the current `v1_2` schema (confirmed against `HubSetup_Latest.kql`).

No code changes.
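To make the filename convention concrete, here is a small sketch of composing and splitting a `<ingestionId>__<originalFileName>.parquet` name. The helper names are hypothetical. It also assumes that the ingestion ID is the text before the first `__`; the actual pipeline expression lives in `Analytics/app.bicep` and may differ in detail.

```python
SEPARATOR = "__"  # separator named in the PR description


def build_blob_name(ingestion_id, original_file_name):
    """Compose '<ingestionId>__<originalFileName>.parquet'.

    Rejects an ingestion ID that itself contains the separator, since that
    would make the split ambiguous (an assumption of this sketch).
    """
    if not ingestion_id or SEPARATOR in ingestion_id:
        raise ValueError("ingestion_id must be non-empty and must not contain '__'")
    return f"{ingestion_id}{SEPARATOR}{original_file_name}.parquet"


def split_ingestion_id(blob_name):
    """Return (ingestion_id, remainder), splitting on the first '__'.

    Assumption: the ID is everything before the first separator.
    """
    ingestion_id, sep, remainder = blob_name.partition(SEPARATOR)
    if not sep or not ingestion_id or not remainder:
        raise ValueError(f"{blob_name!r} does not follow the '__' convention")
    return ingestion_id, remainder


if __name__ == "__main__":
    name = build_blob_name("20240101T000000", "focus-export")
    print(name)
    print(split_ingestion_id(name))
```

Validating names on the producer side before upload is one way to catch files that the pipeline could not attribute to an ingestion run.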
Test plan
- […] `docs-mslearn`) and confirm the IMPORTANT/TIP callouts render correctly under the numbered list item.

🤖 Generated with Claude Code