refactor(pt_expt): use model API for inference, consistent file naming #5354
iProzd merged 6 commits into deepmodeling:master
Conversation
…te metadata files

Python DeepEval now deserializes model.json and delegates all API calls (get_rcut, get_sel, get_dim_fparam, etc.) to the dpmodel instance instead of reading a flat metadata dict. This is consistent with how the other backends (.dp/.yaml, .pth) work.

File layout in .pt2/.pte archives is renamed for consistency:
- metadata.json: C++ runtime metadata (was model_def_script.json)
- model_def_script.json: training config (was model_params.json; now matches the .pth convention)
- output_keys.json: removed (merged into metadata.json)
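For illustration, since .pt2/.pte archives are zip files, the renamed runtime metadata can be pulled out without loading the model at all. This is a hypothetical sketch, not the deepmd-kit implementation; the `extra/metadata.json` member path and the keys shown in the test are assumptions about the layout described above.

```python
import json
import zipfile


def read_archive_metadata(archive_path: str) -> dict:
    """Read the flat C++ runtime metadata out of a .pt2/.pte archive.

    Illustrative helper: assumes the archive is a zip containing a member
    whose name ends in "metadata.json", per the renamed layout above.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # Locate the metadata member regardless of its directory prefix.
        name = next(n for n in zf.namelist() if n.endswith("metadata.json"))
        return json.loads(zf.read(name))
```

Because the metadata is plain JSON, a consumer that only needs rcut or the output keys can stay independent of torch entirely.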
…/.pt2

Add two tests exercising the .pte/.pt2 → change_bias → .pte/.pt2 round-trip, asserting that the training config (model_def_script) embedded by freeze is preserved in the output file.
💡 Codex Review
Reviewed commit: 9e87eaadac
No actionable comments were generated in the recent review.
Pre-merge checks: 3 passed.
Actionable comments posted: 1
🧹 Nitpick comments (1)
source/tests/pt_expt/test_change_bias.py (1)
292-296: Consider adding a comment explaining the device manipulation. The torch.set_default_device(None) / torch.set_default_device("cuda:9999999") pattern appears to work around AOTInductor compilation issues when a non-existent CUDA device is set as default. A brief comment explaining this would improve maintainability.

  # Freeze to .pt2
  pt2_path = os.path.join(self.tmpdir, "frozen_mds.pt2")
+ # Clear default device to avoid poisoning AOTInductor compilation
+ # (tests/pt/__init__.py sets it to "cuda:9999999" for CPU fallback).
  torch.set_default_device(None)

Also applies to: 306-310
Inline comment in deepmd/pt_expt/infer/deep_eval.py, around lines 696-698:
show() currently assumes model_params contains keys and raises KeyError when get_model_def_script() returns {}. Change the direct dictionary indexing (model_params["type_map"], model_params["descriptor"], model_params["fitting_net"]) to safe accessors such as model_params.get("type_map", {}), model_params.get("descriptor", {}), and model_params.get("fitting_net", {}), or guard with an "if key in model_params" check before use. Nested attribute reads should also use .get(..., {}) or default values, matching the existing model_params.get("info", {}) usage, so that single-task models exported without model_params do not raise KeyError.
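A minimal sketch of the suggested safe-access fix; the function and key names below are illustrative, not the actual deep_eval.py code.

```python
def collect_show_fields(model_params: dict) -> dict:
    """Gather displayable fields without assuming any key exists.

    Safe accessors mean an empty model_def_script ({}) yields defaults
    instead of raising KeyError on missing keys.
    """
    return {
        "type_map": model_params.get("type_map", []),
        "descriptor": model_params.get("descriptor", {}),
        "fitting_net": model_params.get("fitting_net", {}),
        "info": model_params.get("info", {}),
    }
```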
📒 Files selected for processing (8)
- .gitignore
- deepmd/pt_expt/entrypoints/compress.py
- deepmd/pt_expt/entrypoints/main.py
- deepmd/pt_expt/infer/deep_eval.py
- deepmd/pt_expt/utils/serialization.py
- source/api_cc/src/DeepPotPTExpt.cc
- source/tests/pt_expt/infer/test_deep_eval.py
- source/tests/pt_expt/test_change_bias.py
Codecov Report

Additional details and impacted files:
@@ Coverage Diff @@
## master #5354 +/- ##
==========================================
- Coverage 82.32% 82.32% -0.01%
==========================================
Files 809 810 +1
Lines 83161 83223 +62
Branches 4067 4067
==========================================
+ Hits 68462 68511 +49
- Misses 13484 13494 +10
- Partials    1215    1218    +3
==========================================

☔ View full report in Codecov by Sentry.
…ize_from_file

serialize_from_file now returns the training config under the 'model_def_script' key (renamed from 'model_params' in the previous commit). Update _load_model_params in finetune.py to match.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepmd/pt_expt/utils/finetune.py (1)
83-86: ⚠️ Potential issue | 🟡 Minor: Update stale remediation text to the new field name.
Line 85 still says model_params is embedded, but this path now depends on model_def_script, which makes the failure guidance confusing.
Suggested fix:
- "re-freeze it with the latest code so that model_params is embedded."
+ "re-freeze it with the latest code so that model_def_script is embedded."
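A hedged sketch of a guard that emits the corrected guidance; the function name and exact wording are illustrative, not the finetune.py implementation.

```python
def require_model_def_script(data: dict) -> dict:
    """Return the embedded training config, or fail with guidance that
    names the new field (model_def_script), not the stale model_params."""
    script = data.get("model_def_script")
    if not script:
        raise ValueError(
            "The pretrained model does not contain a training config; "
            "re-freeze it with the latest code so that model_def_script is embedded."
        )
    return script
```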
📒 Files selected for processing (1)
- deepmd/pt_expt/utils/finetune.py
model_def_script.json is always present in .pte/.pt2 archives, so the
fallback to {"type_map": ...} is unreachable.
Read the training config from data["model_def_script"] instead of a separate kwarg, matching the convention of the pt/jax/tf/pd backends. This fixes dp convert-backend for pt_expt, which passes the universal dict (already containing model_def_script) with only two positional args.
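The convention can be sketched as follows; the helper name is hypothetical, and a real backend would also serialize the model itself alongside these extra files.

```python
import json


def build_extra_files(data: dict) -> dict:
    """Illustrate the convention: the universal serialized dict already
    carries the training config under "model_def_script", so no separate
    keyword argument is needed when writing the archive."""
    return {
        "model_def_script.json": json.dumps(data.get("model_def_script", {})),
    }
```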
Actionable comments posted: 2
Inline comments in deepmd/pt_expt/utils/serialization.py:
- Around lines 234-241: serialize_from_file currently always treats extra_files["model_def_script.json"] as the training config, which mis-parses legacy archives where the real config is in "model_params.json" and "model_def_script.json" holds runtime metadata. Detect the legacy layout by checking for the presence of "model_params.json" (and/or inspecting the model_def_script contents for metadata markers) and, if present, load the config from "model_params.json" (converting with _json_to_numpy as needed) instead of model_def_script.json. If legacy archives are intentionally unsupported, fail fast with a clear error instead of silently writing metadata back as the training script. Apply the same fix in the other reader block covering lines 258-266.
- Around lines 168-177: Update the _collect_metadata() docstring to reflect the new .pte archive layout: metadata is now stored in "metadata.json" rather than "model_def_script.json", and "model_def_script.json" is reserved for the training config by _deserialize_to_file_pte(). The docstring should still note that C++ reads flat JSON fields from metadata.json for inference and that fitting_output_defs remains included so ModelOutputDef can be reconstructed without loading the full model.
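The suggested legacy-layout handling could look like this sketch. The file names come from the comment above; the function itself is illustrative and works on an in-memory dict of extra files rather than the real archive reader.

```python
import json


def read_training_config(extra_files: dict) -> dict:
    """Return the training config, tolerating the legacy archive layout.

    Legacy archives kept the training config in model_params.json and used
    model_def_script.json for runtime metadata, so check for the legacy
    marker file first.
    """
    if "model_params.json" in extra_files:  # legacy layout
        return json.loads(extra_files["model_params.json"])
    # New layout: model_def_script.json holds the training config.
    return json.loads(extra_files.get("model_def_script.json", "{}"))
```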
📒 Files selected for processing (3)
- deepmd/pt_expt/entrypoints/main.py
- deepmd/pt_expt/utils/serialization.py
- source/tests/pt_expt/infer/test_deep_eval.py
✅ Files skipped from review due to trivial changes (1)
- source/tests/pt_expt/infer/test_deep_eval.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt_expt/entrypoints/main.py
- show.py: use safe .get() for type_map/descriptor/fitting_net to avoid KeyError when get_model_def_script() returns {} (model frozen without training config); fall back to model.get_type_map() for the type map
- finetune.py: fix stale error message referencing "model_params" → "model_def_script"
- serialization.py: fix the _collect_metadata() docstring (metadata.json is used in both .pt2 and .pte, not just .pt2)
- test_change_bias.py: add comments explaining the set_default_device(None) workaround
Summary
Problem
Two inconsistencies in .pt2/.pte files:

1. Python reads a flat metadata dict instead of using the model API. Other backends (.dp/.yaml, .pth) deserialize the model and query it directly for get_rcut(), get_sel(), model_output_type(), etc. The .pt2/.pte backend was reading these from a metadata dict stored at export time, duplicating model logic.
2. Inconsistent file naming. model_def_script.json stored C++ runtime metadata in .pt2/.pte, but training config in .pth. The training config lived separately in model_params.json, and output_keys.json was a standalone file that logically belongs with the metadata.

Solution

Python inference: DeepEval now deserializes model.json into a dpmodel instance (self._dpmodel) at load time and delegates all API calls to it. _reconstruct_model_output_def() is removed.

File layout renamed so that each filename means the same thing across .pth and .pt2/.pte:
- metadata.json: C++ runtime metadata (was model_def_script.json)
- model_def_script.json: training config (was model_params.json; matches the .pth convention)
- output_keys.json: removed (merged into metadata.json)
- model.json: serialized model (unchanged)

C++ inference (DeepPotPTExpt.cc): Updated to read extra/metadata.json instead of extra/model_def_script.json, and to read output_keys from the metadata dict instead of a separate output_keys.json.

Why metadata.json still exists: C++ inference cannot deserialize model.json to call model API methods. The alternative, compiling methods like get_rcut() and get_sel() as additional AOTInductor entry points, was benchmarked and rejected: get_type_map() returns strings, and torch.export only supports tensor I/O, so encoding strings as int tensors adds complexity for no benefit. rcut, sel, and type_map never change after export; a flat JSON file is the simplest and fastest solution.

Other changes
- compress and change_bias entrypoints now preserve the training config through .pte/.pt2 round-trips
- .gitignore updated to exclude .pte/.pt2 model files
- _collect_metadata() drops model_output_type and sel_type (not used by C++; Python now gets them from the model)

Test plan
- source/tests/pt_expt/infer/test_deep_eval.py: 36/36 pass (.pte + .pt2), including test_model_api_delegation, test_get_model_def_script_with_params, test_get_model_def_script, test_pt2_has_metadata, test_dynamic_shapes
- source/tests/pt_expt/model/: 50/50 pass (frozen, compression, serialization)
- source/tests/pt_expt/test_change_bias.py: new tests for .pte/.pt2 model_def_script preservation (.pt2 models regenerated with the new metadata.json)

Summary by CodeRabbit
Bug Fixes
Refactor
Tests
Chores