fix intermittent 404 (NoSuchKey) in Dir.download_sync for remote directories#1129
Open
samhita-alla wants to merge 2 commits into
Open
fix intermittent 404 (NoSuchKey) in Dir.download_sync for remote directories#1129samhita-alla wants to merge 2 commits into
Dir.download_sync for remote directories#1129samhita-alla wants to merge 2 commits into
Conversation
…ctories Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
kumare3
reviewed
May 29, 2026
| return local_dest | ||
|
|
||
| src_prefix = fs._strip_protocol(self.path).rstrip("/") | ||
| for remote_file in files: |
Contributor
There was a problem hiding this comment.
are you not just re-implementing fsspec in the sdk?
Contributor
Author
There was a problem hiding this comment.
recursive get in fsspec does:
rpaths = await self._expand_path(rpath, recursive=True) # may include rpath itself
rpaths = [p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))]
await asyncio.gather(*[self._get_file(p, ...) for p in rpaths])
fsspec has to handle both “rpath is a single file” and “rpath is a directory” so it inspects the root via _isdir. for obstore that does a HEAD on the bare prefix (404, because the prefix isn’t an object) and then falls back to a listing and when that detection flakes, the prefix isn’t filtered out and fsspec proceeds to _cat_file leading to 404 NoSuchKey. this PR composes it differently than recursive get does internally minus the file-vs-directory inspection of the root path.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Dir.download_sync()intermittently fails with a 404 when downloading a remote directory, even though the directory's objects exist:It "works on retry" though. Observed in sync (
def) tasks that download a directory produced upstream.Root cause
download_syncdelegated to fsspec's recursivefs.get(self.path, dest, recursive=True). Before copying, fsspec must decide whether the remote prefix is a directory:HEADthe bare prefix key - a 404, since only the child objects are real keys, not the prefix itself,This HEAD-then-list-then-cache detection is timing/state dependent. When it resolves to "directory", the children are listed and downloaded. When the listing/cache evidence isn't present at the moment
_isdiris consulted, fsspectreats the prefix as a file and issues a
GETon the directory key resulting in404 NoSuchKey. The outcome is nondeterministic across invocations, hence it fails first and succeeds on retry. The data is always present; only theclassification flakes.
Note the async
Dir.download()is unaffected: it routes throughstorage.get()→ObstoreParallelReader, which lists the prefix and downloads each concrete object. The sync path was the only one relying on fsspec's recursive-get directory detection.Fix
download_syncnow mirrors what the async reader does, using sync fsspec: enumerate concrete object keys withfs.find(self.path)and download each withfs.get_file(). Only real keys are ever fetched, so there is no file-vs-directoryinference to flake. Falls back to the previous recursive
getwhenfindreturns nothing, preserving prior behavior/errors for empty/edge cases.