
refactor(pt_expt): use model API for inference, consistent file naming#5354

Merged
iProzd merged 6 commits into deepmodeling:master from wanghan-iapcm:refact-pt2-pte
Apr 2, 2026

Conversation

@wanghan-iapcm (Collaborator) commented Mar 30, 2026

Summary

Problem

Two inconsistencies in .pt2/.pte files:

  1. Python reads a flat metadata dict instead of using the model API. Other backends (.dp/.yaml, .pth) deserialize the model and query it directly for get_rcut(), get_sel(), model_output_type(), etc. The .pt2/.pte backend was reading these from a metadata dict stored at export time, duplicating model logic.

  2. Inconsistent file naming. model_def_script.json stored C++ runtime metadata in .pt2/.pte, but training config in .pth. Training config lived separately in model_params.json. output_keys.json was a standalone file that logically belongs with metadata.

Solution

Python inference: DeepEval now deserializes model.json into a dpmodel instance (self._dpmodel) at load time and delegates all API calls to it. _reconstruct_model_output_def() is removed.
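
The delegation pattern can be sketched as follows; the class and method bodies here are illustrative stand-ins, not the actual DeepEval/dpmodel implementation:

```python
import json


class ModelStub:
    """Stand-in for the model deserialized from model.json."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def get_rcut(self) -> float:
        return self._data["rcut"]

    def get_type_map(self) -> list:
        return self._data["type_map"]


class EvalSketch:
    """Queries the deserialized model directly instead of a flat metadata dict."""

    def __init__(self, model_json: str) -> None:
        # Deserialize the model once at load time ...
        self._dpmodel = ModelStub(json.loads(model_json))

    # ... and delegate every API call to it, as the .dp/.yaml and .pth
    # backends already do.
    def get_rcut(self) -> float:
        return self._dpmodel.get_rcut()

    def get_type_map(self) -> list:
        return self._dpmodel.get_type_map()


ev = EvalSketch('{"rcut": 6.0, "type_map": ["O", "H"]}')
print(ev.get_rcut())       # 6.0
print(ev.get_type_map())   # ['O', 'H']
```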

File layout renamed so that each filename means the same thing across .pth and .pt2/.pte:

| File | Before | After |
|---|---|---|
| `model_def_script.json` | C++ metadata | Training config (matches `.pth`) |
| `metadata.json` | (did not exist) | C++ metadata + output_keys |
| `model_params.json` | Training config | Removed |
| `output_keys.json` | Output key list | Removed (merged into `metadata.json`) |
| `model.json` | Full serialized model | No change |

C++ inference (DeepPotPTExpt.cc): Updated to read extra/metadata.json instead of extra/model_def_script.json, and reads output_keys from the metadata dict instead of a separate output_keys.json.
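
For reference, `.pt2` files are zip archives, so the new layout can be inspected with standard tooling. The snippet below builds a stand-in archive in memory (a real `.pt2` also contains the compiled model; only the extra files are shown, and the field contents are illustrative):

```python
import io
import json
import zipfile

# Build a stand-in archive with the new extra-file layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "extra/metadata.json",
        json.dumps({"rcut": 6.0, "output_keys": ["energy"]}),
    )
    zf.writestr(
        "extra/model_def_script.json",
        json.dumps({"descriptor": {"type": "se_e2_a"}}),
    )

# Read output_keys from the unified metadata dict, as the C++ loader now
# does, instead of opening a separate output_keys.json.
with zipfile.ZipFile(buf) as zf:
    metadata = json.loads(zf.read("extra/metadata.json"))
print(metadata["output_keys"])  # ['energy']
```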

Why metadata.json still exists: C++ inference cannot deserialize model.json to call model API methods. The alternative — compiling methods like get_rcut(), get_sel() as additional AOTInductor entry points — was benchmarked and rejected:

  • Compilation overhead: ~12s per trivial constant-returning function (C++ codegen + compile + link). With ~8 methods, that adds ~1.5 min to freeze time.
  • String outputs: get_type_map() returns strings. torch.export only supports tensor I/O — encoding strings as int tensors adds complexity for no benefit.
  • These are constants: rcut, sel, type_map never change after export. A flat JSON file is the simplest and fastest solution.
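
A flat `metadata.json` along these lines is all the C++ runtime needs (the exact fields beyond those named above, and the values shown, are illustrative):

```json
{
  "rcut": 6.0,
  "sel": [46, 92],
  "type_map": ["O", "H"],
  "output_keys": ["energy", "force", "virial"]
}
```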

Other changes

  • compress and change_bias entrypoints now preserve training config through .pte/.pt2 round-trips
  • .gitignore updated to exclude .pte/.pt2 model files
  • _collect_metadata() drops model_output_type and sel_type (not used by C++; Python now gets them from the model)

Test plan

  • source/tests/pt_expt/infer/test_deep_eval.py — 36/36 pass (.pte + .pt2)
    • New: test_model_api_delegation, test_get_model_def_script_with_params
    • Updated: test_get_model_def_script, test_pt2_has_metadata, test_dynamic_shapes
  • source/tests/pt_expt/model/ — 50/50 pass (frozen, compression, serialization)
  • source/tests/pt_expt/test_change_bias.py — new tests for .pte/.pt2 model_def_script preservation
  • C++ tests — 3/3 suites pass (.pt2 models regenerated with new metadata.json)

Summary by CodeRabbit

  • Bug Fixes

    • Preserve and restore training configuration when freezing or modifying frozen models; clearer messages when training config is absent.
  • Refactor

    • Consolidated metadata layout inside exported model archives for more consistent loading across formats and runtimes.
  • Tests

    • Added and updated tests to validate config preservation, metadata delegation, and archive contents.
  • Chores

    • Extended ignore patterns to skip additional model file extensions.

Han Wang added 2 commits March 30, 2026 11:53
…te metadata files

Python DeepEval now deserializes model.json and delegates all API calls
(get_rcut, get_sel, get_dim_fparam, etc.) to the dpmodel instance instead
of reading a flat metadata dict. This is consistent with how other backends
(.dp/.yaml, .pth) work.

File layout in .pt2/.pte archives is renamed for consistency:
- metadata.json: C++ runtime metadata (was model_def_script.json)
- model_def_script.json: training config (was model_params.json, now
  matches .pth convention)
- output_keys.json removed (merged into metadata.json)
…/.pt2

Add two tests exercising the .pte/.pt2 → change_bias → .pte/.pt2
round-trip, asserting that training config (model_def_script) embedded
by freeze is preserved in the output file.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e87eaadac

@coderabbitai bot commented Mar 30, 2026

Note: Reviews paused

This branch is under active development, so CodeRabbit automatically paused this review to avoid an influx of comments on new commits (configurable via the reviews.auto_review.auto_pause_after_reviewed_commits setting). Use @coderabbitai resume to resume automatic reviews, or @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 274d426c-7c22-476f-9cf0-5d89f97005ab

📥 Commits

Reviewing files that changed from the base of the PR and between b878e73 and 179a217.

📒 Files selected for processing (4)
  • deepmd/entrypoints/show.py
  • deepmd/pt_expt/utils/finetune.py
  • deepmd/pt_expt/utils/serialization.py
  • source/tests/pt_expt/test_change_bias.py
✅ Files skipped from review due to trivial changes (1)
  • deepmd/pt_expt/utils/finetune.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt_expt/test_change_bias.py

📝 Walkthrough

Walkthrough

Consolidates pt_expt serialization: the training config moves to model_def_script.json, metadata is unified into metadata.json, freeze/change-bias now preserve model_def_script, and the Python/C++ loaders and tests are adjusted to read the new archive layout.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Ignore Patterns (`/.gitignore`) | Add `*.pte` and `*.pt2` to ignore frozen model files. |
| Serialization Infrastructure (`deepmd/pt_expt/utils/serialization.py`) | Unify extra-file layout: write `metadata.json` (includes output_keys) and `model_def_script.json`; remove the `model_params` parameter from deserialization APIs; `_collect_metadata()` no longer emits `model_output_type`/`sel_type`. |
| Entrypoints, freeze/change-bias (`deepmd/pt_expt/entrypoints/main.py`) | `freeze()` and `change_bias()` include `model_def_script` when calling `deserialize_to_file` so frozen outputs preserve training config. |
| Inference / DeepEval (`deepmd/pt_expt/infer/deep_eval.py`) | Load `model.json` via new `_init_from_model_json`; derive `_model_output_def`, rcut, and type_map from the deserialized model; store `_model_def_script` and return it from `get_model_def_script()`; remove reconstruction from legacy metadata. |
| Fine-tune utils (`deepmd/pt_expt/utils/finetune.py`) | `_load_model_params()` now returns `data["model_def_script"]` for `.pte`/`.pt2` inputs (no fallback to constructed model_params). |
| C++ API loader (`source/api_cc/src/DeepPotPTExpt.cc`) | Read a single `extra/metadata.json` entry (contains output_keys) instead of separate `model_def_script.json`/`output_keys.json`; update error messages and comments accordingly. |
| Tests, DeepEval (`source/tests/pt_expt/infer/test_deep_eval.py`) | Adjust expectations for `get_model_def_script()` (empty dict when absent); add round-trip tests for model_def_script in `.pte`/`.pt2`; add API delegation tests; update `.pt2` ZIP expectations. |
| Tests, Change-Bias (`source/tests/pt_expt/test_change_bias.py`) | Add tests asserting model_def_script is preserved, byte-for-byte, after freeze → change-bias for both `.pte` and `.pt2` outputs. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant Ser as Serializer
    participant File as Archive
    participant PyLoader
    participant CppLoader

    CLI->>Ser: freeze()/change-bias request (model, model_def_script)
    Ser->>File: write archive with metadata.json + model_def_script.json and model.json
    PyLoader->>File: torch.export.load / deserialize_to_file reads extra/metadata.json & model_def_script.json
    PyLoader->>PyLoader: init DeepEval from model.json; store _model_def_script
    CppLoader->>File: read extra/metadata.json for output_keys and metadata
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: refactoring inference to use the model API and normalizing file naming across the `.pte`/`.pt2` archive layout. |
| Docstring coverage | ✅ Passed | Docstring coverage is 82.98%, above the required threshold of 80.00%. |


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
source/tests/pt_expt/test_change_bias.py (1)

292-296: Consider adding a comment explaining the device manipulation.

The torch.set_default_device(None) / torch.set_default_device("cuda:9999999") pattern appears to work around AOTInductor compilation issues when a non-existent CUDA device is set as default. A brief comment explaining this would improve maintainability.

```diff
         # Freeze to .pt2
         pt2_path = os.path.join(self.tmpdir, "frozen_mds.pt2")
+        # Clear default device to avoid poisoning AOTInductor compilation
+        # (tests/pt/__init__.py sets it to "cuda:9999999" for CPU fallback).
         torch.set_default_device(None)
```

Also applies to: 306-310

Actionable comment in deepmd/pt_expt/infer/deep_eval.py (around lines 696-698):

show() currently assumes model_params contains the expected keys and raises KeyError when get_model_def_script() returns {}. Change the direct indexing (model_params["type_map"], model_params["descriptor"], model_params["fitting_net"]) to safe accessors such as model_params.get("type_map", {}), or guard with an "if key in model_params" check before use; nested reads should likewise use .get(..., {}) defaults, matching the existing model_params.get("info", {}) usage, so single-task models exported without model_params don't raise KeyError.

📥 Commits

Reviewing files that changed from the base of the PR and between 0828604 and 9e87eaa.

📒 Files selected for processing (8)
  • .gitignore
  • deepmd/pt_expt/entrypoints/compress.py
  • deepmd/pt_expt/entrypoints/main.py
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/utils/serialization.py
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/tests/pt_expt/infer/test_deep_eval.py
  • source/tests/pt_expt/test_change_bias.py

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 88.52459% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.32%. Comparing base (0828604) to head (179a217).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
deepmd/entrypoints/show.py 42.85% 4 Missing ⚠️
source/api_cc/src/DeepPotPTExpt.cc 33.33% 1 Missing and 1 partial ⚠️
deepmd/pt_expt/infer/deep_eval.py 96.66% 1 Missing ⚠️
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #5354      +/-   ##
==========================================
- Coverage   82.32%   82.32%   -0.01%
==========================================
  Files         809      810       +1
  Lines       83161    83223      +62
  Branches     4067     4067
==========================================
+ Hits        68462    68511      +49
- Misses      13484    13494      +10
- Partials     1215     1218       +3
```

☔ View full report in Codecov by Sentry.

…ize_from_file

serialize_from_file now returns training config under 'model_def_script' key
(renamed from 'model_params' in the previous commit). Update _load_model_params
in finetune.py to match.
@coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/pt_expt/utils/finetune.py (1)

83-86: ⚠️ Potential issue | 🟡 Minor

Update stale remediation text to the new field name.

Line 85 still says model_params is embedded, but this path now depends on model_def_script. This makes failure guidance confusing.

Suggested fix:

```diff
-            "re-freeze it with the latest code so that model_params is embedded."
+            "re-freeze it with the latest code so that model_def_script is embedded."
```


📥 Commits

Reviewing files that changed from the base of the PR and between 9e87eaa and 5e90b19.

📒 Files selected for processing (1)
  • deepmd/pt_expt/utils/finetune.py

Han Wang added 2 commits March 30, 2026 17:23

model_def_script.json is always present in .pte/.pt2 archives, so the
fallback to {"type_map": ...} is unreachable.

Read training config from data["model_def_script"] instead of a
separate kwarg, matching the convention of the pt/jax/tf/pd backends.
This fixes dp convert-backend for pt_expt, which passes the universal
dict (already containing model_def_script) with only 2 positional args.
@coderabbitai bot left a comment

Actionable comments posted: 2

Actionable comments in deepmd/pt_expt/utils/serialization.py:

- Around lines 234-241: serialize_from_file currently always treats extra_files["model_def_script.json"] as the training config, which mis-parses legacy archives where the real config is in model_params.json and model_def_script.json holds runtime metadata. Detect the legacy layout by checking for the presence of model_params.json (and/or inspecting the model_def_script contents for metadata markers) and, if present, load the config from model_params.json (converting with _json_to_numpy as needed). If legacy archives are intentionally unsupported, fail fast with a clear error instead of silently writing metadata back as the training script. The same fix applies to the second reader block around lines 258-266.
- Around lines 168-177: update the _collect_metadata() docstring to reflect the new .pte archive layout: metadata is now stored in metadata.json, with model_def_script.json reserved for the training config by _deserialize_to_file_pte(). The docstring should mention that C++ reads flat JSON fields from metadata.json for inference and that fitting_output_defs remains included so ModelOutputDef can be reconstructed without loading the full model.

📥 Commits

Reviewing files that changed from the base of the PR and between ff2d1ca and b878e73.

📒 Files selected for processing (3)
  • deepmd/pt_expt/entrypoints/main.py
  • deepmd/pt_expt/utils/serialization.py
  • source/tests/pt_expt/infer/test_deep_eval.py
✅ Files skipped from review due to trivial changes (1)
  • source/tests/pt_expt/infer/test_deep_eval.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pt_expt/entrypoints/main.py

- show.py: use safe .get() for type_map/descriptor/fitting_net to avoid
  KeyError when get_model_def_script() returns {} (model frozen without
  training config); fall back to model.get_type_map() for type-map
- finetune.py: fix stale error message referencing "model_params" → "model_def_script"
- serialization.py: fix _collect_metadata() docstring (metadata.json used
  in both .pt2 and .pte, not just .pt2)
- test_change_bias.py: add comments explaining set_default_device(None) workaround
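
The safe-access pattern from the show.py fix above can be sketched as follows (the dict contents are hypothetical; the point is that `.get()` with defaults tolerates a model frozen without training config):

```python
# get_model_def_script() may return {} for models frozen without a
# training config; simulate that case here.
model_params: dict = {}

# Direct indexing (model_params["type_map"]) would raise KeyError;
# .get() with a default does not.
type_map = model_params.get("type_map", [])
descriptor = model_params.get("descriptor", {})
fitting_net = model_params.get("fitting_net", {})

print(type_map)                           # []
print(descriptor.get("type", "unknown"))  # unknown
```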
@iProzd iProzd added this pull request to the merge queue Apr 2, 2026
Merged via the queue into deepmodeling:master with commit dd8e492 Apr 2, 2026
70 checks passed