feat: make conda-pypi mappings configurable #6333

Open
tdejager wants to merge 36 commits into prefix-dev:main from tdejager:feat-additive-conda-pypi-map

Conversation

@tdejager

@tdejager tdejager commented Jun 10, 2026

Contributor

Description

This PR makes the conda↔PyPI name mapping configurable in both directions:

  • workspace conda-pypi-map now supports per-channel overlay/replacement mappings, inline entries, URL/file locations, hard disables, and an explicit same-name heuristic
  • pixi-build-python now supports pypi-conda-map overrides before the PyPI→conda mapping service

The forward and reverse mappings intentionally have different shapes:

  • conda-pypi-map is per channel and maps conda package names → one or more PyPI names because it decides which installed conda packages satisfy PyPI requirements/PURLs.
  • pypi-conda-map is a flat map of PyPI package name → one conda package name or false because the build backend converts each Python requirement into at most one conda recipe dependency.

Example:

[workspace.conda-pypi-map]
# Additive overlay: project entries win, misses use Pixi's default mapping data.
conda-forge = { mapping = { pytorch = "torch", not-on-pypi = false } }

# Replace Pixi's default mapping data for this channel. Same-name guessing is separate.
my-company = { location = "https://internal.example.com/map.json", mapping-mode = "replace" }

# Hard-disable PyPI name derivation for this channel.
internal = false

[package.build.config]            # pixi-build-python
ignore-pypi-mapping = false
pypi-conda-map = { torch = "pytorch", my-internal-pkg = false }

Common configurations:

# Fix a few names, otherwise let Pixi do its best.
[workspace.conda-pypi-map]
conda-forge = { mapping = { pytorch = "torch", not-on-pypi = false } }

# Avoid default mapping lookups while keeping the conda-forge same-name heuristic.
# This is the explicit replacement for the deprecated `conda-pypi-map = {}`.
[workspace]
conda-pypi-map = { conda-forge = { mapping-mode = "replace" } }

# Treat a mapping file as the source of truth: no default mapping data and no same-name guesses.
[workspace.conda-pypi-map]
conda-forge = { location = "mapping.json", mapping-mode = "replace", same-name-heuristic = false }

# Enable the same-name heuristic for a non-conda-forge/private channel.
[workspace.conda-pypi-map]
my-internal = { mapping-mode = "replace", same-name-heuristic = true }

Concretely this PR:

  1. Adds table entries for conda-pypi-map channels with:

    • location
    • inline mapping
    • mapping-mode = "overlay" | "replace"
    • same-name-heuristic = true | false
  2. Keeps bare string entries as shorthand for an additive overlay location.

  3. Supports multiple PyPI names for one conda package (airflow = ["airflow", "apache-airflow"]).

  4. Makes false consistent at its scope:

    • conda-pypi-map = false disables all PyPI name derivation globally
    • <channel> = false disables all PyPI name derivation for that channel
    • mapping.package = false means that package is explicitly not on PyPI
  5. Makes the same-name heuristic explicit:

    • default enabled for conda-forge
    • default disabled for other channels
    • can be enabled for any configured channel
  6. Preserves the legacy behavior of conda-pypi-map = {} as a soft-deprecated spelling for “avoid default mapping lookups but keep conda-forge same-name guessing”. The explicit replacement is:

    [workspace]
    conda-pypi-map = { conda-forge = { mapping-mode = "replace" } }
  7. Caches URL mapping locations via standard HTTP cache semantics: remote fetches go through the existing http-cache middleware (CacheMode::Default), which honors the server's Cache-Control/ETag. A freshly fetched mapping is reused on later solves without network access, a stale one is revalidated cheaply, and a previously fetched copy is reused when a refresh fails (unless the server marked the response no-store/must-revalidate) so offline solves keep working. No bespoke per-entry TTL knob is needed.

  8. Adds pypi-conda-map to pixi-build-python, including target-specific per-key merge behavior.

  9. Improves diagnostics for offline/firewalled mapping failures and HTML/GitHub-blob mapping URLs.

  10. Cleans up pypi_mapping::resolvers naming (PrefixHash, PrefixCompressed, ProjectDefined, SameName).
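
Items 3 and 4 can be sketched as follows. This is an illustrative fragment, assuming the inline `mapping` syntax from the example earlier in this description; the channel name `internal` and the package name `not-on-pypi` are placeholders reused from that example, not real names.

```toml
[workspace.conda-pypi-map]
# Item 3: one conda package may map to several PyPI names.
# Item 4: `mapping.<package> = false` marks a single package as not on PyPI.
conda-forge = { mapping = { airflow = ["airflow", "apache-airflow"], not-on-pypi = false } }

# Item 4: `<channel> = false` disables all PyPI name derivation for that channel.
internal = false
```

The global form takes the place of the whole table, so it is shown as a separate manifest fragment:

```toml
[workspace]
# Item 4: disables all PyPI name derivation globally.
conda-pypi-map = false
```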

Behavior changes

| Existing manifest | Before | After | To get the old behavior |
| --- | --- | --- | --- |
| `conda-forge = "mapping.json"` | Exclusive: only packages in the file got PURLs. | Additive overlay: project entries win, misses use Pixi's default mapping data and then same-name if enabled. | `conda-forge = { location = "mapping.json", mapping-mode = "replace", same-name-heuristic = false }` |
| A mapping for channel A, while also using channel B | Configuring any project mapping suppressed the conda-forge same-name heuristic globally. | Channel B behaves as if no mapping were configured. | Configure B explicitly, e.g. `B = false` to disable it. |
| `conda-pypi-map = {}` | Avoided default mapping lookups while still allowing conda-forge same-name guessing. | Same behavior, but deprecated with a warning. | `conda-pypi-map = { conda-forge = { mapping-mode = "replace" } }` |
| `conda-pypi-map = false` | Disabled project/default lookups but still allowed conda-forge same-name guessing. | Hard-disable: no PyPI name derivation at all. | Use `{ conda-forge = { mapping-mode = "replace" } }` if you only wanted no network/default lookups. |
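
As a hedged illustration of the bare-string change, a manifest that relied on the old exclusive behavior could be migrated roughly like this. `mapping.json` is the placeholder file name used above, and the commented-out line shows the old spelling for comparison only.

```toml
[workspace.conda-pypi-map]
# Old spelling: previously exclusive, now an additive overlay.
# conda-forge = "mapping.json"

# Explicit spelling that keeps the old exclusive behavior:
# only entries in the file apply, with no default mapping data and no same-name guesses.
conda-forge = { location = "mapping.json", mapping-mode = "replace", same-name-heuristic = false }
```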

How Has This Been Tested?

Automated/unit/integration coverage includes:

  • manifest parsing and snapshot diagnostics for the new mapping-mode, same-name-heuristic, false/null/list mapping values, and empty-map deprecation
  • project mapping conversion and duplicate-channel validation
  • overlay/replacement mapping behavior
  • explicit package false preventing fallback
  • channel/global hard-disable behavior
  • legacy conda-pypi-map = {} behavior
  • same-name heuristic defaulting to conda-forge and explicit enablement for another channel
  • remote mapping caching: a second, independent solve reuses the on-disk HTTP cache without any network access, and an uncached remote mapping whose fetch fails is a hard error
  • HTML/GitHub-blob parse hint
  • pypi-conda-map override/skip/marker/merge behavior
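
The HTML/GitHub-blob hint covers a common mistake: a GitHub `blob` URL returns an HTML page, while the `raw.githubusercontent.com` host serves the file itself. A minimal sketch of the difference, in which `OWNER`, `REPO`, `REF`, and the file path are placeholders and not the actual location of any mapping:

```toml
[workspace.conda-pypi-map]
# Raw URL: serves the JSON file directly. Using a commit SHA as REF pins the mapping.
conda-forge = "https://raw.githubusercontent.com/OWNER/REPO/REF/path/to/mapping.json"

# Blob URL: serves an HTML page, so the mapping cannot be parsed as JSON.
# conda-forge = "https://github.com/OWNER/REPO/blob/REF/path/to/mapping.json"
```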

Commands run locally:

cargo test -p pypi_mapping
cargo test -p pixi_manifest conda_pypi_map
cargo test -p pixi_core conda_pypi_map
cargo test -p pixi --test integration_rust conda_pypi_map_tests

Schema files were regenerated from schema/model.py and validated with pixi run -e schema test-schema.

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.

Tools: Claude Code / pi coding agent

Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added sufficient tests to cover my changes
  • I have verified that changes that impact the JSON schema have been made in schema/model.py

tdejager added 10 commits June 10, 2026 12:25
…eplace modes, inline mappings and cache-ttl

- conda-pypi-map now accepts false (global disable), and per-channel
  values can be a bare string, false, or a table with location,
  inline mapping entries, mode = "extend"|"replace" and cache-ttl.
- BREAKING: bare location strings now use the additive extend mode
  (overlay over the prefix.dev chain) instead of the exclusive
  replace mode. The old behavior is available via mode = "replace".
- conda-pypi-map = {} is soft-deprecated in favor of false and emits
  a deprecation warning.
- pixi_core wires the manifest entries into the per-channel mapping
  configuration; inline keys are lowercased to match normalized conda
  names, and cache-ttl is validated to require an http(s) location.
- Move all mapping/purl tests from solve_group_tests.rs into a new
  conda_pypi_map_tests.rs integration module.
- Split CondaPypiMapEntry::Map into CondaPypiMapSpec with a dedicated
  MappingLocationSpec { location, cache_ttl } so the TTL is structurally
  tied to the location source it applies to.
- Clarify in the Disabled doc comments that the offline conda-forge
  verbatim fallback still applies when lookups are disabled.
- Deduplicate the offline help text into a shared MAPPING_OFFLINE_HELP
  const used by both the prefix.dev and project-defined fetch errors, and
  mention pointing at a custom mapping location (with cache-ttl) as an
  escape hatch.
- Document why the TTL cache cannot reuse the http-cache middleware
  (header-driven freshness, client-global max_ttl, no stale-on-error).
- Docs: add a parselmouth raw-URL pinning recipe (and a note that blob
  URLs serve HTML).
- pixi_toml: add a custom_error(message, span) constructor and use it for
  the conda-pypi-map validation errors.
- pixi_core: extract the conda-pypi-map manifest conversion out of
  workspace/mod.rs into a workspace::conda_pypi_map module with named,
  unit-testable functions (incl. the channel-membership validation).
- pixi_core: classify mapping locations with rattler_lock::UrlOrPath
  instead of hand-rolled starts_with checks; file:// urls normalize to
  paths and non-http(s) remote schemes are rejected with a clear error.
- pypi_mapping: make the per-record fallback policy explicit with a
  Fallback enum (PrefixThenVerbatim | Verbatim | None) instead of a
  mutable suppression flag.
- pixi-build-python: dedupe the requirement version conversion into
  convert_requirement_version, shared by the user-map and service paths.
- test: pin that a mapping for one channel no longer suppresses the
  verbatim fallback for records from other, unmapped channels (online).
- TTL cache: treat a future mtime (clock skew) as age zero instead of
  making the cached copy invisible to the freshness check and the stale
  fallback; write cache files atomically via tempfile + persist; unit
  tests for the age computation.
- pypi-conda-map: an invalid conda name in an override now falls through
  to the mapping service instead of silently dropping the dependency.
- Split the offline help text: failures fetching a user-configured
  location now suggest checking the URL / adding cache-ttl instead of
  the firewall-framed prefix.dev advice; clearer HTTP status error.
- Warn when a mapping location uses plain http://, since a tampered
  mapping influences dependency resolution.
- Encode the manifest-mode to MappingMode conversion in a documented
  convert_mode function (a From impl is impossible: neither crate
  depends on the other, so the orphan rule forces it into pixi_core).
- Error wording: cache-ttl duration errors show example values;
  cache-ttl-without-location message no longer implies location must be
  a URL; {} deprecation help reworded; stale Disabled doc hedge fixed;
  duplicated doc comment removed.
- Docs: warning box now also covers the verbatim-fallback scope change
  for unmapped channels; cache-ttl docs state the no-cache hard-failure;
  inline-mapping example no longer reuses 'pytorch' as both channel and
  package name.
- New tests: mixed-case inline keys, cache-ttl on a local path rejected,
  file:// table-form location works (pins UrlOrPath normalization),
  Skip entries with markers, vacuous purls assertion fixed, unit tests
  for parse_mapping_location/convert_entry; re-documented what the
  fresh-cache TTL test actually pins (cache layout + no network).
- typos: reword 'mis-mapped' in the conda/PyPI concepts page.
- basedpyright: no implicit string concatenation in the new
  schema/model.py field descriptions (schema output unchanged).
nichmor and others added 10 commits June 12, 2026 12:44
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@tdejager tdejager marked this pull request as ready for review June 12, 2026 14:07
@tdejager tdejager added the test:extra_slow Run the extra slow tests label Jun 12, 2026
tdejager and others added 2 commits June 12, 2026 16:11
…-pypi-map

# Conflicts:
#	Cargo.lock
#	crates/pixi_manifest/Cargo.toml
@ruben-arts ruben-arts added the breaking Breaks something in the api or config label Jun 15, 2026
@ruben-arts

Contributor

What is the reason for the addition of the cache-ttl on the remote mapping?

@ruben-arts

Contributor

I've done some user testing and it all seems to work as designed. I've got one question about the conda-pypi-map = false. I was expecting no mapping at all, e.g. if there is a pypi and conda map it's just going to get both packages.

I believe there is currently no way to get that except by making an empty map and assigning it as replace. Did this behavior already exist?

@nichmor

nichmor commented Jun 15, 2026

Contributor

What is the reason for the addition of the cache-ttl on the remote mapping?

We want to cache the requested mapping and not fetch it multiple times during the solve. For that reason, we thought it would be good to let users choose how "fresh" the mapping should be, so we added this config option.

@tdejager

Contributor Author

What is the reason for the addition of the cache-ttl on the remote mapping?

We want to cache the requested mapping and not fetch it multiple times during the solve. For that reason, we thought it would be good to let users choose how "fresh" the mapping should be, so we added this config option.

@ruben-arts for reference, the compressed conda-forge mapping on GitHub, which you might use, is about 1 MB :)

Rename the additive channel mode to `mapping-mode = "overlay"` and add a per-channel `same-name-heuristic` switch. The heuristic now defaults to enabled for conda-forge and disabled elsewhere, but can be explicitly enabled for any channel.

Treat `conda-pypi-map = false` and `<channel> = false` as hard disables, while preserving the legacy empty-map behavior as a deprecated no-default-lookup conda-forge same-name configuration. Allow mapping-mode-only entries to express empty replacement mappings explicitly.

Update schema, docs, and integration coverage for the revised hierarchy, and clean up pypi_mapping resolver naming.
@tdejager tdejager changed the title feat: make conda-pypi-map additive and add pypi-conda-map build overrides feat: make conda-pypi mappings configurable Jun 16, 2026
tdejager and others added 2 commits June 16, 2026 15:02
Remote mapping fetches already pass through the http-cache middleware
(CacheMode::Default), and the real mapping hosts (prefix.dev, parselmouth
on raw.githubusercontent.com) send Cache-Control + ETag. Rely on that
entirely instead of a bespoke per-entry cache-ttl plus a hand-rolled
mtime/stale-on-error file cache:

- Remove the cache-ttl manifest option, its parsing/validation and snapshots.
- Collapse the now single-field MappingLocationSpec to a plain String.
- Delete the custom project-defined file cache; fetch straight through the
  cache-aware client. The middleware handles freshness, ETag revalidation,
  and use-stale-on-error (conditional_fetch serves the cached copy when a
  refresh fails, unless the response is no-store/must-revalidate).
- Drop now-unused deps (humantime, filetime, rattler_digest dev-dep).

Verified that a second, independent client reuses the on-disk HTTP cache
without any network access (test_remote_mapping_reused_from_cache_offline).
@tdejager tdejager force-pushed the feat-additive-conda-pypi-map branch from 0db381f to 665ed56 Compare June 17, 2026 07:54
- test_mapping_location asserted same-name=true for a non-conda-forge
  channel (https://prefix.dev/test-channel) via extend(); the heuristic
  correctly defaults to false there, so build the expected value explicitly.
- Remove the now-unused tempfile dependency from pypi_mapping (flagged by
  cargo-shear after the custom stale-cache file writer was deleted).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tdejager tdejager force-pushed the feat-additive-conda-pypi-map branch from 665ed56 to d539cae Compare June 17, 2026 08:16
@tdejager tdejager requested a review from baszalmstra June 17, 2026 10:03