
feat(pt_expt): add missing losses (spin, DOS, tensor, property)#5345

Open
wanghan-iapcm wants to merge 3 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-losses

Conversation

Collaborator

@wanghan-iapcm wanghan-iapcm commented Mar 26, 2026

Summary

  • Add EnergySpinLoss, DOSLoss, TensorLoss, PropertyLoss to dpmodel (array_api) and pt_expt (@torch_module wrapper) backends
  • Add serialize()/deserialize() to PT loss classes (ener_spin, dos, tensor, property) for cross-backend consistency testing
  • Add mae=True support to dpmodel EnergyLoss and EnergySpinLoss (extra MAE metrics for dp test)
  • Extend pt_expt get_loss() factory to handle all loss types
  • Add cross-backend consistency tests parameterized over loss_func and mae
  • Add pt_expt unit tests for all 4 new losses
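The serialize()/deserialize() round-trip used for cross-backend consistency testing can be sketched as follows. TensorLossDemo and its pref field are illustrative stand-ins, not deepmd's actual classes; only the @class/@version key convention is taken from the PR's diff hunks below.

```python
class TensorLossDemo:
    # Stand-in class: mirrors the serialize()/deserialize() shape described
    # in the PR summary, not deepmd's actual TensorLoss implementation.
    def __init__(self, pref: float = 1.0) -> None:
        self.pref = pref

    def serialize(self) -> dict:
        return {"@class": "TensorLoss", "@version": 1, "pref": self.pref}

    @classmethod
    def deserialize(cls, data: dict) -> "TensorLossDemo":
        data = data.copy()
        data.pop("@class", None)  # strip metadata before reconstructing
        data.pop("@version", None)
        return cls(**data)

# Round-trip: serialize from one backend, deserialize in another, compare.
clone = TensorLossDemo.deserialize(TensorLossDemo(pref=0.5).serialize())
assert clone.pref == 0.5
```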

Details

dpmodel losses (deepmd/dpmodel/loss/)

Array-API compatible implementations ported from deepmd/pt/loss/. Key adaptations:

  • Boolean fancy indexing replaced with mask multiplication + manual mean
  • torch.cumsum → xp.cumulative_sum
  • No input dict mutation (PT mutates model_pred in-place)
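The first adaptation can be sketched as follows, a minimal illustration with NumPy standing in for the array namespace; masked_mean is a hypothetical helper, not code from the PR:

```python
import numpy as np

def masked_mean(values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Boolean fancy indexing (values[mask].mean()) is not portable across
    # Array-API backends, so multiply by the mask and divide by the count.
    mask_f = mask.astype(values.dtype)
    total = np.sum(values * mask_f)
    count = np.sum(mask_f)
    return total / np.maximum(count, 1.0)  # guard against empty masks

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))
m = rng.random((4, 8)) > 0.5
# Agrees with the boolean-indexing form it replaces:
assert np.allclose(masked_mean(v, m), v[m].mean())
```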

pt_expt wrappers (deepmd/pt_expt/loss/)

Thin @torch_module wrappers inheriting from dpmodel classes, following existing EnergyLoss pattern.

Known limitations

  1. PropertyLoss normalization: out_std/out_bias must be provided explicitly (dpmodel losses can't access the model at forward time). Defaults to identity normalization if not provided.
  2. inference parameter: Not ported from PT losses — it only suppresses l2_* intermediate metrics and is never actually used (inference=True is never constructed anywhere in the codebase).
  3. denoise loss: Not ported (internal DPA pretraining, not user-facing).
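For limitation 1, a sketch of identity-defaulted normalization, assuming the usual (value - out_bias) / out_std convention; normalize is a hypothetical helper, not the PR's code:

```python
import numpy as np

def normalize(values: np.ndarray, out_bias=None, out_std=None) -> np.ndarray:
    # When out_bias/out_std are not provided, fall back to the identity
    # transform (bias 0, std 1), as described in limitation 1.
    task_dim = values.shape[-1]
    bias = np.zeros(task_dim) if out_bias is None else np.asarray(out_bias)
    std = np.ones(task_dim) if out_std is None else np.asarray(out_std)
    return (values - bias) / std
```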

Test plan

  • python -m pytest source/tests/pt_expt/loss/ -v — all 42 tests pass
  • python -m pytest source/tests/consistent/loss/ -v — all 342 tests pass (358 skipped for unavailable backends)
  • python -m pytest source/tests/universal/dpmodel/loss/ -v — existing tests still pass

Summary by CodeRabbit

  • New Features

    • New DOS, EnergySpin, Property, and Tensor loss types; optional MAE reporting available for losses.
    • Loss factory extended to select these loss types and wire model metadata.
    • Added serialization/deserialization for loss configurations and round-trip compatibility.
  • Integration

    • PyTorch-experimental wrappers added for the new losses.
  • Tests

    • Extensive cross-backend tests covering consistency, metrics, and serialization round-trips.
  • Chores

    • Public exports updated to include the new loss classes.

…mae support

Add EnergySpinLoss, DOSLoss, TensorLoss, PropertyLoss to dpmodel and
pt_expt backends. Add serialize/deserialize to PT losses. Add mae=True
support to dpmodel EnergyLoss and EnergySpinLoss for dp test metrics.
Extend pt_expt get_loss() factory for all loss types. Add cross-backend
consistency tests parameterized over loss_func and mae.
@wanghan-iapcm wanghan-iapcm requested a review from njzjz March 26, 2026 07:12
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 26, 2026
@dosubot dosubot bot added the new feature label Mar 26, 2026
Comment on lines +96 to +103
def call(
self,
learning_rate: float,
natoms: int,
model_dict: dict[str, Array],
label_dict: dict[str, Array],
mae: bool = False,
) -> tuple[Array, dict[str, Array]]:

Check warning

Code scanning / CodeQL: Signature mismatch in overriding method (Warning)

This method does not accept arbitrary keyword arguments, which the overridden NativeOP.call does. It requires at least 5 positional arguments, whereas the overridden NativeOP.call may be called with 1, and at most 6, whereas NativeOP.call may be called with arbitrarily many. Each flagged call correctly invokes the base method but does not match the signature of the overriding method.
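One conventional way to resolve this class of CodeQL warning, an option rather than necessarily what the PR will adopt, is to accept and discard **kwargs in the override so it stays call-compatible with the base method. NativeOP and EnergyLossSketch below are illustrative stand-ins:

```python
from typing import Any

class NativeOP:
    # Minimal stand-in for the base class CodeQL refers to.
    def call(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError

class EnergyLossSketch(NativeOP):
    def call(
        self,
        learning_rate: float,
        natoms: int,
        model_dict: dict,
        label_dict: dict,
        mae: bool = False,
        **kwargs: Any,  # accept arbitrary kwargs to match NativeOP.call
    ) -> tuple:
        del kwargs  # unused; present only for signature compatibility
        return (0.0, {})
```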
Comment on lines +98 to +105
def call(
self,
learning_rate: float,
natoms: int,
model_dict: dict[str, Array],
label_dict: dict[str, Array],
mae: bool = False,
) -> tuple[Array, dict[str, Array]]:

Check warning

Code scanning / CodeQL: Signature mismatch in overriding method (Warning)

This method does not accept arbitrary keyword arguments, which the overridden NativeOP.call does. It requires at least 5 positional arguments, whereas the overridden NativeOP.call may be called with 1, and at most 6, whereas NativeOP.call may be called with arbitrarily many. Each flagged call correctly invokes the base method but does not match the signature of the overriding method.
Comment on lines +70 to +77
def call(
self,
learning_rate: float,
natoms: int,
model_dict: dict[str, Array],
label_dict: dict[str, Array],
mae: bool = False,
) -> tuple[Array, dict[str, Array]]:

Check warning

Code scanning / CodeQL: Signature mismatch in overriding method (Warning)

This method does not accept arbitrary keyword arguments, which the overridden NativeOP.call does. It requires at least 5 positional arguments, whereas the overridden NativeOP.call may be called with 1, and at most 6, whereas NativeOP.call may be called with arbitrarily many. Each flagged call correctly invokes the base method but does not match the signature of the overriding method.
Comment on lines +69 to +76
def call(
self,
learning_rate: float,
natoms: int,
model_dict: dict[str, Array],
label_dict: dict[str, Array],
mae: bool = False,
) -> tuple[Array, dict[str, Array]]:

Check warning

Code scanning / CodeQL: Signature mismatch in overriding method (Warning)

This method does not accept arbitrary keyword arguments, which the overridden NativeOP.call does. It requires at least 5 positional arguments, whereas the overridden NativeOP.call may be called with 1, and at most 6, whereas NativeOP.call may be called with arbitrarily many. Each flagged call correctly invokes the base method but does not match the signature of the overriding method.
@github-actions github-actions bot added Python and removed Test CUDA Trigger test CUDA workflow labels Mar 26, 2026

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough


Adds five loss implementations (DOS, EnergySpin, Property, Tensor, Energy MAE reporting), re-exports them, extends Loss.call with an optional mae flag, adds serialization for new losses, provides PyTorch-experimental wrappers and training factory wiring, and adds cross-backend tests.

Changes

  • Package exports (deepmd/dpmodel/loss/__init__.py, deepmd/pt_expt/loss/__init__.py): re-exported the new loss classes (DOSLoss, EnergySpinLoss, PropertyLoss, TensorLoss) and updated __all__.
  • Core API (deepmd/dpmodel/loss/loss.py): abstract Loss.call signature extended with an optional mae: bool = False.
  • Energy loss (deepmd/dpmodel/loss/ener.py, deepmd/pt/loss/ener.py, source/tests/consistent/loss/test_ener.py, source/tests/pt_expt/loss/test_ener.py): EnergyLoss.call accepts mae and reports additional MAE metrics; tests updated to exercise MAE and the pt_expt backend.
  • DOS loss (deepmd/dpmodel/loss/dos.py, deepmd/pt/loss/dos.py, deepmd/pt_expt/loss/dos.py, source/tests/consistent/loss/test_dos.py, source/tests/pt_expt/loss/test_dos.py): new DOSLoss implementing dos/cdf and atom-dos/cdf terms, scheduling, masking/locality scaling, RMSE outputs, serialize/deserialize, plus PT and pt_expt wrappers and tests.
  • EnergySpin loss (deepmd/dpmodel/loss/ener_spin.py, deepmd/pt/loss/ener_spin.py, deepmd/pt_expt/loss/ener_spin.py, source/tests/consistent/loss/test_ener_spin.py, source/tests/pt_expt/loss/test_ener_spin.py): new EnergySpinLoss combining energy, real/magnetic forces, virial, and per-atom energy; scheduled prefactors, selectable loss func (mse/mae), optional atom-energy coefficient, serialization, PT wrappers, and tests.
  • Property loss (deepmd/dpmodel/loss/property.py, deepmd/pt/loss/property.py, deepmd/pt_expt/loss/property.py, source/tests/consistent/loss/test_property.py, source/tests/pt_expt/loss/test_property.py): new PropertyLoss supporting multiple loss funcs (smooth_mae, mae, mse, rmse, mape), optional output normalization, metrics, serialization, PT wrappers, and tests.
  • Tensor loss (deepmd/dpmodel/loss/tensor.py, deepmd/pt/loss/tensor.py, deepmd/pt_expt/loss/tensor.py, source/tests/consistent/loss/test_tensor.py, source/tests/pt_expt/loss/test_tensor.py): new TensorLoss with local/global terms, optional atomic weighting, masking/normalization, RMSE outputs, serialization, PT wrappers, and tests.
  • Training factory (deepmd/pt_expt/train/training.py): get_loss() extended to construct dos, ener_spin, tensor, and property losses using model metadata and starter_learning_rate.
  • Tests (source/tests/..., source/tests/pt_expt/..., many new/updated files): comprehensive cross-backend unit tests for DOS, EnergySpin, Property, and Tensor losses; energy tests updated to cover the mae flag and pt_expt backend integration.
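As a rough reference for the smooth_mae option listed under the Property loss cohort, here is a sketch assuming the common Huber-style smooth-L1 form with threshold beta; this is an assumption about the formula, not verified against the PR's code:

```python
import numpy as np

def smooth_mae(pred: np.ndarray, target: np.ndarray, beta: float = 1.0) -> float:
    # Quadratic for small residuals (|diff| < beta), linear beyond,
    # matching the usual smooth-L1 / Huber-style definition.
    diff = np.abs(pred - target)
    out = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return float(np.mean(out))
```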

Sequence Diagram(s)

sequenceDiagram
    participant Trainer as Training Loop
    participant Model as Model
    participant Loss as Loss Module
    participant Labels as LabelDict
    participant Metrics as Metrics Dict

    Trainer->>Model: forward(input)
    Model-->>Trainer: model_dict (predictions)
    Trainer->>Loss: call(learning_rate, natoms, model_dict, Labels, mae=True)
    Loss->>Loss: select array namespace (xp/jnp/torch)
    Loss->>Loss: reshape preds & labels, apply mask if present
    Loss->>Labels: read locality / atom weights (find_* / atom_weight)
    Loss->>Loss: compute term losses (dos/cdf/atomic/global/tensor/energy/forces/virial)
    Loss->>Loss: scale terms by scheduled prefactors (coef = lr / starter_lr)
    Loss->>Loss: aggregate weighted loss and compute rmse/mae metrics
    Loss-->>Trainer: return (loss, more_loss)
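The scheduled-prefactor step in the diagram commonly interpolates between start and limit prefactors with coef = lr / starter_lr; a sketch under that assumption (pref_start and pref_limit are illustrative names, not necessarily the PR's):

```python
def scheduled_pref(pref_start: float, pref_limit: float,
                   lr: float, starter_lr: float) -> float:
    # coef decays from 1 toward 0 as the learning rate decays, moving the
    # effective prefactor from pref_start toward pref_limit.
    coef = lr / starter_lr
    return pref_limit + (pref_start - pref_limit) * coef
```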

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Suggested labels

enhancement

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 37.19%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the title directly and concisely describes the primary change, adding four missing loss classes to the pt_expt module (spin, DOS, tensor, property), without unnecessary details.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (6)
deepmd/pt/loss/property.py (1)

226-239: Normalize out_bias / out_std in the serialized payload.

These fields are returned as-is right now. If either is a torch.Tensor, the payload stops being backend-neutral and becomes awkward to persist or compare. Converting tensor-like values to plain lists here keeps the schema stable.

💡 Proposed fix
     def serialize(self) -> dict:
         """Serialize the loss module."""
+        def _as_list(value: list | torch.Tensor | None) -> list | None:
+            if value is None:
+                return None
+            return torch.as_tensor(value).detach().cpu().tolist()
+
         return {
             "@class": "PropertyLoss",
             "@version": 1,
             "task_dim": self.task_dim,
             "var_name": self.var_name,
             "loss_func": self.loss_func,
             "metric": self.metric,
             "beta": self.beta,
-            "out_bias": self.out_bias,
-            "out_std": self.out_std,
+            "out_bias": _as_list(self.out_bias),
+            "out_std": _as_list(self.out_std),
             "intensive": self.intensive,
         }

Based on learnings, serialization of dictionaries that include torch.Tensor objects should use pickle.dumps/pickle.loads instead of JSON, as JSON cannot serialize tensors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/loss/property.py` around lines 226 - 239, The serialize method of
PropertyLoss returns out_bias and out_std as-is which can be torch.Tensors and
break backend-neutrality; update PropertyLoss.serialize to detect tensor- or
array-like values for the "out_bias" and "out_std" fields and convert them to
plain Python lists (e.g., call .tolist() for torch.Tensor / numpy.ndarray, leave
None or already-serializable types unchanged) before returning the dict so the
payload remains JSON-friendly and stable across backends.
deepmd/pt_expt/train/training.py (1)

78-113: Copy/filter locally in get_loss() instead of mutating caller-owned objects.

The new branches write derived fields back into loss_params, and the tensor branch mutates the sequence returned by _model.model_output_type() with pop(). That makes the factory depend on whether those objects are shared by the caller/model. Working on local copies avoids config bleed-through and accidental model-state mutation.

💡 Proposed fix
 def get_loss(
     loss_params: dict[str, Any],
     start_lr: float,
     _ntypes: int,
     _model: Any,
 ) -> EnergyLoss:
+    loss_params = dict(loss_params)
     loss_type = loss_params.get("type", "ener")
     if loss_type == "ener":
         loss_params["starter_learning_rate"] = start_lr
         return EnergyLoss(**loss_params)
     elif loss_type == "dos":
@@
     elif loss_type == "tensor":
-        model_output_type = _model.model_output_type()
-        if "mask" in model_output_type:
-            model_output_type.pop(model_output_type.index("mask"))
+        model_output_type = [
+            name for name in _model.model_output_type() if name != "mask"
+        ]
         tensor_name = model_output_type[0]
         loss_params["tensor_size"] = _model.model_output_def()[tensor_name].output_size
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/train/training.py` around lines 78 - 113, In get_loss(), avoid
mutating caller-owned objects: make a local shallow/deep copy of loss_params
(e.g., local_params = loss_params.copy() or deepcopy when nested) and operate on
that for adding starter_learning_rate, tensor_size, label_name, tensor_name,
task_dim, var_name, intensive, etc.; likewise, don't call pop() on the sequence
returned by _model.model_output_type() — copy it into a new list (e.g.,
model_output_type = list(_model.model_output_type())) and filter out "mask" via
a non-mutating approach (list comprehension) before reading tensor_name; finally
pass the local_params to the respective constructors (EnergyLoss, DOSLoss,
EnergySpinLoss, TensorLoss, PropertyLoss).
source/tests/pt_expt/loss/test_dos.py (1)

27-54: Add at least one masked DOS fixture.

_make_data() never populates model_pred["mask"], so these tests never hit the masked DOS/CDF reductions—the part of the port that replaced boolean indexing with mask-based averaging. One masked case here would cover the riskiest branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/loss/test_dos.py` around lines 27 - 54, The tests never
exercise masked reductions because _make_data() does not set model_pred["mask"];
update _make_data to add a boolean torch.tensor model_pred["mask"] with at least
one False element so the masked DOS/atom_dos branch runs (make the mask shape
match the atom_dos axes—e.g., (nframes, natoms) or a broadcastable shape—and use
dtype=torch.bool on the same device), and if the loss code expects a
corresponding label mask include any required label mask key as well so the
masked-path is exercised for atom_dos/dos reductions.
source/tests/pt_expt/loss/test_tensor.py (1)

132-132: Prefix unused variables with underscore to satisfy linter.

The static analysis correctly identifies unused variables. Per coding guidelines, ruff check must pass.

🔧 Suggested fix
-        l_dp, more_dp = dp_loss(learning_rate, natoms, model_pred_np, label_np)
+        l_dp, _more_dp = dp_loss(learning_rate, natoms, model_pred_np, label_np)

And at line 177:

-        l0, more0 = loss0(learning_rate, natoms, model_pred, label)
+        l0, _more0 = loss0(learning_rate, natoms, model_pred, label)

Also applies to: 177-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/loss/test_tensor.py` at line 132, The linter flags
unused variables returned from dp_loss (l_dp, more_dp); update the call sites
(the dp_loss(...) invocation at the shown locations) to prefix any unused return
names with an underscore (e.g., _l_dp, _more_dp or whichever specific
variable(s) are unused at each site) so the variables remain assigned but
satisfy ruff; apply the same change to the other occurrence around line 177
where dp_loss is called.
source/tests/pt_expt/loss/test_property.py (1)

113-113: Prefix unused variable with underscore.

🔧 Suggested fix
-        l_dp, more_dp = dp_loss(learning_rate, natoms, model_pred_np, label_np)
+        l_dp, _more_dp = dp_loss(learning_rate, natoms, model_pred_np, label_np)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/loss/test_property.py` at line 113, The tuple assignment
from dp_loss currently binds l_dp and more_dp but more_dp is not used; rename
the unused variable to _more_dp (e.g., change "l_dp, more_dp = dp_loss(...)" to
"l_dp, _more_dp = dp_loss(...)") so the intent is clear and linters won't flag
it; update the assignment in test_property.py where dp_loss is called and ensure
no other references expect "more_dp".
deepmd/dpmodel/loss/property.py (1)

47-58: Fail fast on invalid PropertyLoss configs.

A typo in loss_func, beta <= 0 for smooth_mae, or any non-positive out_std only blows up once call() hits the divisions at Lines 104-105 and Lines 110-113. Rejecting those cases in __init__ would turn late NaNs into immediate config errors.

Example guardrails
     ) -> None:
+        valid_loss_funcs = {"smooth_mae", "mae", "mse", "rmse", "mape"}
+        if loss_func not in valid_loss_funcs:
+            raise ValueError(
+                f"Invalid loss_func '{loss_func}'. Must be one of {sorted(valid_loss_funcs)}."
+            )
+        if loss_func == "smooth_mae" and beta <= 0:
+            raise ValueError("beta must be > 0 for smooth_mae")
+        if out_std is not None and any(v <= 0 for v in out_std):
+            raise ValueError("out_std must contain only positive values")
         if metric is None:
             metric = ["mae"]

Also applies to: 92-105, 108-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/loss/property.py` around lines 47 - 58, In
PropertyLoss.__init__, validate config arguments up front instead of letting
call() produce NaNs: check that loss_func is one of the supported strings used
by call (e.g., "smooth_mae", "mse", etc.), if loss_func == "smooth_mae" enforce
beta > 0, and validate out_std (and any provided out_bias) entries are all
positive non-zero numbers; if any check fails raise a ValueError with a clear
message. Also normalize/convert metric if needed and store validated values on
the instance so call() can safely assume loss_func, beta, out_std/out_bias are
valid and non-zero. Ensure you reference the same names used in the class
(PropertyLoss.__init__, PropertyLoss.call, loss_func, beta, out_std, out_bias)
so these guards are easy to locate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/dpmodel/loss/dos.py`:
- Around line 150-157: Update pyproject.toml to require array-api-compat >= 1.9
so the Array API cumulative_sum used in dpmodel.loss.dos (see usage:
xp.cumulative_sum in the cumulative_sum calls that reshape
model_dict["atom_dos"] and label_dict["atom_dos"]) is available at runtime;
change the dependency entry for "array-api-compat" to "array-api-compat >= 1.9"
(or equivalent spec) and run dependency install/lock to pick up the new version.

In `@deepmd/dpmodel/loss/ener_spin.py`:
- Line 255: The code currently computes more_loss["rmse"] = xp.sqrt(loss)
regardless of loss type, which mislabels an L1/MAE aggregate as RMSE; in the
function handling loss aggregation (use references to loss_func, loss, xp.sqrt
and more_loss["rmse"]) only compute and set more_loss["rmse"] when loss_func
indicates an MSE-based loss (e.g., "mse") and otherwise either omit the "rmse"
key or set a neutral/explicit MAE aggregate key (e.g., more_loss["mae"] or
more_loss["aggregate"]) so logs and dp test report the correct metric. Ensure
the gating condition checks loss_func before calling xp.sqrt and that downstream
consumers expect the alternative key when in MAE mode.
- Around line 176-206: Default find_force_mag to 0.0 and short-circuit the
magnetic-force branch when the effective weight or valid counts are zero: change
the default assignment of find_force_mag to 0.0 and after pref_fm = pref_fm *
find_force_mag, check if pref_fm == 0.0 or n_valid == 0 and return/skip all FM
reductions; in the MAE branch guard the per-frame division by per_frame_count_fm
by either filtering out frames with zero count (e.g. only sum per_frame_sum_fm /
per_frame_count_fm for frames where per_frame_count_fm>0) or skip the term if no
valid frames, and only then compute and add l2_force_mag_loss /
l1_force_mag_loss to loss and set entries in more_loss (affecting symbols:
find_force_mag, pref_fm, mask_mag, n_valid, l2_force_mag_loss,
l1_force_mag_loss, per_frame_count_fm, more_loss, display_if_exist).

In `@deepmd/dpmodel/loss/tensor.py`:
- Around line 173-183: The label key names are inconsistent: label_requirement
builds a DataRequirementItem with key "atomic_weight" but the call() method
reads label_dict["atom_weight"]; make them match by choosing one canonical key
and updating the other. Edit the DataRequirementItem entry in label_requirement
or the access in call() so both use the same string (referencing the
DataRequirementItem creation for "atomic_weight" and the call() lookup of
"atom_weight"), and ensure any downstream uses/defaults (e.g., default=1.0)
remain correct.
- Around line 149-162: The label_requirement property currently registers the
wrong prefix ("atomic_"+self.label_name) causing a mismatch with the call()
method which expects "atom_"+self.label_name; update the DataRequirementItem key
in label_requirement to use "atom_"+self.label_name (keep the other fields and
condition on self.has_local_weight, and leave label_name, tensor_size and the
DataRequirementItem construction unchanged) so the label key matches what call()
(and other loss classes like ener.py, dos.py) expects.

In `@deepmd/pt/loss/dos.py`:
- Around line 271-286: The serialize() for DOSLoss is missing the constructor
flag "inference", causing inference-only instances to lose that state on
round-trip; update DOSLoss.serialize to include "inference": self.inference (and
likewise add the same field in the adjacent serialize block at 288-294 if
present) so that deserialization/.__init__() receives the original inference
boolean and preserves which terms are active when reloading.

In `@deepmd/pt/loss/ener_spin.py`:
- Around line 412-430: The serialize() method in the EnergySpinLoss class does
not include the inference flag in its output dictionary, causing deserialized
objects with inference=True to lose this state and default back to
inference=False. Add the inference field to the dictionary returned by
serialize(), but only include it when the value is True to maintain backward
compatibility with existing serialized configs that use the default training
mode. Make sure to also apply this same fix to the serialize() method in
EnergySpinLoss (also mentioned at lines 432-438).

In `@deepmd/pt/loss/tensor.py`:
- Around line 212-231: serialize() for TensorLoss omits the inference flag so
deserialized instances default to False; update TensorLoss.serialize to include
the inference attribute (e.g., "inference": self.inference) and ensure
TensorLoss.deserialize accepts and passes that field through (preserve it in
data.copy() handling and cls(**data) usage) so the inference state round-trips
and assertions relying on has_local_weight/has_global_weight remain valid.

In `@source/tests/consistent/loss/test_ener.py`:
- Line 94: The tuple unpacking in the test (the assignment of self.param to
use_huber, enable_atom_ener_coeff, loss_func, f_use_norm, _mae) defines
variables that are not used; rename those unused slots by prefixing them with an
underscore (e.g., _use_huber, _enable_atom_ener_coeff) so ruff no longer flags
them, and apply the same change to the other unpackings in this file where
enable_atom_ener_coeff is unused (replace use_huber/enable_atom_ener_coeff with
_use_huber/_enable_atom_ener_coeff in the self.param unpack statements).
- Line 96: The test skip condition must also skip TF/Paddle when mae cases are
used; update the boolean expression (the return line shown) to include the mae
flag (e.g., return CommonTest.skip_tf or loss_func == "mae" or f_use_norm or
mae) so TF/Paddle runs are avoided for mae=True; ensure this aligns with how
TestEnerGF sets mae and that extract_ret()/CommonTest._compare_ret() remain
consistent with build_tf()/eval_pd() behavior.

In `@source/tests/pt_expt/loss/test_dos.py`:
- Line 125: The unused local variables returned from dp_loss are causing RUF059;
rename the unused bindings more_dp and more0 to _more_dp (or simply _ ) to mark
them as intentionally unused. Update the assignment statements that call dp_loss
(the one assigning l_dp, more_dp and the one assigning l_dp, more0) so the
second tuple element is renamed, and run ruff check/format to verify the warning
is resolved.

In `@source/tests/pt_expt/loss/test_ener_spin.py`:
- Line 144: The test assigns unused return values to locals that trigger Ruff
RUF059; rename the unused locals returned from dp_loss and any other unused
locals (e.g., more_dp and more0) to a conventional unused name (e.g., _more_dp
or simply _ ) so they are intentionally ignored and Ruff stops flagging them;
update the call site using dp_loss (and the other occurrence around the second
reported location) to use the new unused-name variables and run ruff check/ruff
format before committing.

---

Nitpick comments:
In `@deepmd/dpmodel/loss/property.py`:
- Around line 47-58: In PropertyLoss.__init__, validate config arguments up
front instead of letting call() produce NaNs: check that loss_func is one of the
supported strings used by call (e.g., "smooth_mae", "mse", etc.), if loss_func
== "smooth_mae" enforce beta > 0, and validate out_std (and any provided
out_bias) entries are all positive non-zero numbers; if any check fails raise a
ValueError with a clear message. Also normalize/convert metric if needed and
store validated values on the instance so call() can safely assume loss_func,
beta, out_std/out_bias are valid and non-zero. Ensure you reference the same
names used in the class (PropertyLoss.__init__, PropertyLoss.call, loss_func,
beta, out_std, out_bias) so these guards are easy to locate.

In `@deepmd/pt_expt/train/training.py`:
- Around line 78-113: In get_loss(), avoid mutating caller-owned objects: make a
local shallow/deep copy of loss_params (e.g., local_params = loss_params.copy()
or deepcopy when nested) and operate on that for adding starter_learning_rate,
tensor_size, label_name, tensor_name, task_dim, var_name, intensive, etc.;
likewise, don't call pop() on the sequence returned by
_model.model_output_type() — copy it into a new list (e.g., model_output_type =
list(_model.model_output_type())) and filter out "mask" via a non-mutating
approach (list comprehension) before reading tensor_name; finally pass the
local_params to the respective constructors (EnergyLoss, DOSLoss,
EnergySpinLoss, TensorLoss, PropertyLoss).

In `@deepmd/pt/loss/property.py`:
- Around line 226-239: The serialize method of PropertyLoss returns out_bias and
out_std as-is which can be torch.Tensors and break backend-neutrality; update
PropertyLoss.serialize to detect tensor- or array-like values for the "out_bias"
and "out_std" fields and convert them to plain Python lists (e.g., call
.tolist() for torch.Tensor / numpy.ndarray, leave None or already-serializable
types unchanged) before returning the dict so the payload remains JSON-friendly
and stable across backends.

In `@source/tests/pt_expt/loss/test_dos.py`:
- Around line 27-54: The tests never exercise masked reductions because
_make_data() does not set model_pred["mask"]; update _make_data to add a boolean
torch.tensor model_pred["mask"] with at least one False element so the masked
DOS/atom_dos branch runs (make the mask shape match the atom_dos axes—e.g.,
(nframes, natoms) or a broadcastable shape—and use dtype=torch.bool on the same
device), and if the loss code expects a corresponding label mask include any
required label mask key as well so the masked-path is exercised for atom_dos/dos
reductions.

In `@source/tests/pt_expt/loss/test_property.py`:
- Line 113: The tuple assignment from dp_loss currently binds l_dp and more_dp
but more_dp is not used; rename the unused variable to _more_dp (e.g., change
"l_dp, more_dp = dp_loss(...)" to "l_dp, _more_dp = dp_loss(...)") so the intent
is clear and linters won't flag it; update the assignment in test_property.py
where dp_loss is called and ensure no other references expect "more_dp".

In `@source/tests/pt_expt/loss/test_tensor.py`:
- Line 132: The linter flags unused variables returned from dp_loss (l_dp,
more_dp); update the call sites (the dp_loss(...) invocation at the shown
locations) to prefix any unused return names with an underscore (e.g., _l_dp,
_more_dp or whichever specific variable(s) are unused at each site) so the
variables remain assigned but satisfy ruff; apply the same change to the other
occurrence around line 177 where dp_loss is called.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7228a7bd-845d-4205-866b-dab0daaa93f0

📥 Commits

Reviewing files that changed from the base of the PR and between e97967b and aff5d11.

📒 Files selected for processing (26)
  • deepmd/dpmodel/loss/__init__.py
  • deepmd/dpmodel/loss/dos.py
  • deepmd/dpmodel/loss/ener.py
  • deepmd/dpmodel/loss/ener_spin.py
  • deepmd/dpmodel/loss/loss.py
  • deepmd/dpmodel/loss/property.py
  • deepmd/dpmodel/loss/tensor.py
  • deepmd/pt/loss/dos.py
  • deepmd/pt/loss/ener_spin.py
  • deepmd/pt/loss/property.py
  • deepmd/pt/loss/tensor.py
  • deepmd/pt_expt/loss/__init__.py
  • deepmd/pt_expt/loss/dos.py
  • deepmd/pt_expt/loss/ener_spin.py
  • deepmd/pt_expt/loss/property.py
  • deepmd/pt_expt/loss/tensor.py
  • deepmd/pt_expt/train/training.py
  • source/tests/consistent/loss/test_dos.py
  • source/tests/consistent/loss/test_ener.py
  • source/tests/consistent/loss/test_ener_spin.py
  • source/tests/consistent/loss/test_property.py
  • source/tests/consistent/loss/test_tensor.py
  • source/tests/pt_expt/loss/test_dos.py
  • source/tests/pt_expt/loss/test_ener_spin.py
  • source/tests/pt_expt/loss/test_property.py
  • source/tests/pt_expt/loss/test_tensor.py

@codecov

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 82.99065% with 91 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.35%. Comparing base (e97967b) to head (fb2027b).

Files with missing lines Patch % Lines
deepmd/pt_expt/train/training.py 0.00% 26 Missing ⚠️
deepmd/dpmodel/loss/dos.py 80.00% 22 Missing ⚠️
deepmd/dpmodel/loss/ener_spin.py 88.69% 19 Missing ⚠️
deepmd/dpmodel/loss/tensor.py 79.22% 16 Missing ⚠️
deepmd/dpmodel/loss/property.py 89.61% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5345      +/-   ##
==========================================
+ Coverage   82.28%   82.35%   +0.07%     
==========================================
  Files         797      806       +9     
  Lines       82100    82635     +535     
  Branches     4003     4003              
==========================================
+ Hits        67557    68058     +501     
- Misses      13336    13369      +33     
- Partials     1207     1208       +1     

☔ View full report in Codecov by Sentry.

The eval_pd methods in TestEner and TestEnerGF were missing the mae
parameter, causing a key mismatch when comparing Paddle output against
the reference (dpmodel), which includes mae_* keys.
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
source/tests/consistent/loss/test_ener.py (3)

100-100: ⚠️ Potential issue | 🟠 Major

Use _-prefixed names for the still-unused tuple slots.

skip_pd() never reads use_huber / enable_atom_ener_coeff, and setUp() never reads enable_atom_ener_coeff. Ruff is already flagging these unpackings.

💡 Proposed fix
-        (use_huber, enable_atom_ener_coeff, loss_func, f_use_norm, _mae) = self.param
+        (_use_huber, _enable_atom_ener_coeff, loss_func, f_use_norm, _mae) = self.param
...
-        (use_huber, enable_atom_ener_coeff, loss_func, f_use_norm, mae) = self.param
+        (use_huber, _enable_atom_ener_coeff, loss_func, f_use_norm, mae) = self.param
As per coding guidelines, `**/*.py`: Always run `ruff check .` and `ruff format .` before committing changes or CI will fail.

Also applies to: 119-119

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/consistent/loss/test_ener.py` at line 100, The tuple unpacking
in the test (the line assigning (use_huber, enable_atom_ener_coeff, loss_func,
f_use_norm, _mae) = self.param) exposes unused names that Ruff flags; change
unused slots to underscore-prefixed names (for example _use_huber and
_enable_atom_ener_coeff or simply _ , as appropriate) so only actually-used
variables (loss_func, f_use_norm, _mae) remain named; update the same pattern at
the other occurrence around line 119, and run ruff check . and ruff format .
before committing.

94-96: ⚠️ Potential issue | 🟠 Major

Skip TF when the extra mae_* metrics are enabled.

The non-TF paths now forward mae=self.mae, but build_tf() still cannot request those metrics. On TF-enabled runs this turns the consistency check into a key-set mismatch. Add `or mae` here; Paddle is already covered because eval_pd() now passes mae.

💡 Proposed fix
-        (use_huber, enable_atom_ener_coeff, loss_func, f_use_norm, _mae) = self.param
+        (_use_huber, _enable_atom_ener_coeff, loss_func, f_use_norm, mae) = self.param
         # Skip TF for MAE loss tests (not implemented in TF backend)
-        return CommonTest.skip_tf or loss_func == "mae" or f_use_norm
+        return CommonTest.skip_tf or loss_func == "mae" or f_use_norm or mae
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/consistent/loss/test_ener.py` around lines 94 - 96, The TF-skip
condition needs to also skip when extra MAE metrics are requested: update the
return expression in the test where (use_huber, enable_atom_ener_coeff,
loss_func, f_use_norm, _mae) = self.param (and where CommonTest.skip_tf is
checked) to include the mae flag (e.g., add "or mae" / "or self.mae" to the
predicate), so the clause becomes CommonTest.skip_tf or loss_func == "mae" or
f_use_norm or mae to avoid key-set mismatches with TF backends.

355-355: ⚠️ Potential issue | 🟠 Major

TestEnerGF also needs to skip TF now that every comparable backend forces mae=True.

extract_ret() now compares mae_*, and each non-TF evaluator in this class opts into those metrics. build_tf() still has no matching path, so TF-enabled environments will compare different result keys unless this class skips TF entirely or wires mae=True through the TF backend too.

💡 Proposed fix
-    skip_tf = CommonTest.skip_tf
+    skip_tf = True  # TF loss path does not emit mae_* metrics

Also applies to: 457-457, 469-469, 497-497, 515-515, 535-535, 547-547

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/consistent/loss/test_ener.py` at line 355, TestEnerGF must skip
TensorFlow runs because extract_ret() now expects mae_* keys that build_tf()
does not emit; update the test class to skip TF by setting the TF-skip
flag (e.g., set skip_tf = True, as in the proposed fix) in TestEnerGF so
that TF backends are not executed; refer to TestEnerGF, extract_ret(), and
build_tf() when making the change and apply the same TF-skip update at the
other occurrences mentioned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@source/tests/consistent/loss/test_ener.py`:
- Line 100: The tuple unpacking in the test (the line assigning (use_huber,
enable_atom_ener_coeff, loss_func, f_use_norm, _mae) = self.param) exposes
unused names that Ruff flags; change unused slots to underscore-prefixed names
(for example _use_huber and _enable_atom_ener_coeff or simply _ , as
appropriate) so only actually-used variables (loss_func, f_use_norm, _mae)
remain named; update the same pattern at the other occurrence around line 119,
and run ruff check . and ruff format . before committing.
- Around line 94-96: The TF-skip condition needs to also skip when extra MAE
metrics are requested: update the return expression in the test where
(use_huber, enable_atom_ener_coeff, loss_func, f_use_norm, _mae) = self.param
(and where CommonTest.skip_tf is checked) to include the mae flag (e.g., add "or
mae" / "or self.mae" to the predicate), so the clause becomes CommonTest.skip_tf
or loss_func == "mae" or f_use_norm or mae to avoid key-set mismatches with TF
backends.
- Line 355: TestEnerGF must skip TensorFlow runs because extract_ret() now
expects mae_* keys that build_tf() does not emit; update the test class to
skip TF by setting the TF-skip flag (e.g., set skip_tf = True, as in the
proposed fix) in TestEnerGF so that TF backends are not executed; refer to
TestEnerGF, extract_ret(), and build_tf() when making the change and apply
the same TF-skip update at the other occurrences mentioned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5aab57a0-c401-4db3-8577-85db0d5538e3

📥 Commits

Reviewing files that changed from the base of the PR and between aff5d11 and ca8418b.

📒 Files selected for processing (1)
  • source/tests/consistent/loss/test_ener.py

- Change find_* label defaults from 1.0 to 0.0 in dpmodel losses
  (ener_spin, dos, tensor) to match PT behavior: missing labels should
  zero out the contribution, not assume they exist.
- Prefix unused unpacked variables with _ to satisfy ruff RUF059.
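A toy illustration of why the find_* default matters (hypothetical names; the real find_* flags come from the data system via DataRequirementItem defaults):

```python
def loss_term(find_flag, pref, raw_loss):
    """find_* is 1.0 when the label is present and falls back to the
    DataRequirementItem default when it is missing; with the new 0.0
    default, a missing label zeroes its contribution instead of
    being counted as if it existed."""
    return find_flag * pref * raw_loss


present = loss_term(1.0, 2.0, 5.0)      # label present: contributes
missing_old = loss_term(1.0, 2.0, 5.0)  # old default 1.0: wrongly contributes
missing_new = loss_term(0.0, 2.0, 5.0)  # new default 0.0: zeroed out
```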
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
deepmd/dpmodel/loss/tensor.py (1)

149-183: ⚠️ Potential issue | 🔴 Critical

Align label_requirement keys with call() lookups (atom_*).

Line [156] and Line [176] use atomic_* keys, but call() reads atom_* keys at Line [93] and Line [83]. This mismatch can suppress local loss unexpectedly and break atomic-weighted execution.

🐛 Proposed fix
         if self.has_local_weight:
             label_requirement.append(
                 DataRequirementItem(
-                    "atomic_" + self.label_name,
+                    "atom_" + self.label_name,
                     ndof=self.tensor_size,
                     atomic=True,
                     must=False,
                     high_prec=False,
                 )
             )
@@
         if self.enable_atomic_weight:
             label_requirement.append(
                 DataRequirementItem(
-                    "atomic_weight",
+                    "atom_weight",
                     ndof=1,
                     atomic=True,
                     must=False,
                     high_prec=False,
                     default=1.0,
                 )
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/loss/tensor.py` around lines 149 - 183, label_requirement
currently registers keys with the "atomic_" prefix ("atomic_"+self.label_name
and "atomic_weight") but call() expects "atom_*" keys; update the
DataRequirementItem keys in the label_requirement property so they use
"atom_"+self.label_name for the local label entry and "atom_weight" for the
atomic weight entry (leave other fields the same) so the entries created by
label_requirement match the lookups performed by call(); reference symbols:
label_requirement, call(), DataRequirementItem, self.label_name,
has_local_weight, enable_atomic_weight.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/dpmodel/loss/ener_spin.py`:
- Around line 137-148: The code overwrites more_loss["mae_e"] with a differently
normalized value when loss_func == "mae" and mae is True; to fix, ensure
consistent MAE reporting by either not reassigning more_loss["mae_e"] in the if
mae: block when loss_func == "mae" (add guard: if not (self.loss_func == "mae")
before setting more_loss["mae_e"]) or compute a single base value (e.g.,
base_mae = xp.mean(abs_diff_e) or xp.mean(xp.abs(energy_pred - energy_label))
once) and reuse it to set both more_loss["mae_e"] (with or without atom_norm
consistently) and more_loss["mae_e_all"]; reference symbols: self.loss_func,
mae, more_loss["mae_e"], abs_diff_e, atom_norm, and display_if_exist to locate
and adjust the assignments.

In `@deepmd/dpmodel/loss/tensor.py`:
- Line 1: Run ruff locally and fix all lint issues in
deepmd/dpmodel/loss/tensor.py and other Python files: execute `ruff check .` to
list violations, run `ruff format .` to auto-fix formatting, then manually
correct remaining offenses (e.g., unused imports, line-length, naming, or type
issues) in the reported functions/classes in this module and others, and finally
re-run `ruff check .` and `ruff format .` to ensure no violations remain before
committing.

---

Duplicate comments:
In `@deepmd/dpmodel/loss/tensor.py`:
- Around line 149-183: label_requirement currently registers keys with the
"atomic_" prefix ("atomic_"+self.label_name and "atomic_weight") but call()
expects "atom_*" keys; update the DataRequirementItem keys in the
label_requirement property so they use "atom_"+self.label_name for the local
label entry and "atom_weight" for the atomic weight entry (leave other fields
the same) so the entries created by label_requirement match the lookups
performed by call(); reference symbols: label_requirement, call(),
DataRequirementItem, self.label_name, has_local_weight, enable_atomic_weight.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c951210d-cc65-4df2-963e-3474242d33f2

📥 Commits

Reviewing files that changed from the base of the PR and between ca8418b and fb2027b.

📒 Files selected for processing (9)
  • deepmd/dpmodel/loss/dos.py
  • deepmd/dpmodel/loss/ener_spin.py
  • deepmd/dpmodel/loss/tensor.py
  • source/tests/consistent/loss/test_ener.py
  • source/tests/pt_expt/loss/test_dos.py
  • source/tests/pt_expt/loss/test_ener.py
  • source/tests/pt_expt/loss/test_ener_spin.py
  • source/tests/pt_expt/loss/test_property.py
  • source/tests/pt_expt/loss/test_tensor.py
✅ Files skipped from review due to trivial changes (3)
  • source/tests/pt_expt/loss/test_ener.py
  • source/tests/pt_expt/loss/test_dos.py
  • source/tests/pt_expt/loss/test_property.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • source/tests/consistent/loss/test_ener.py
  • source/tests/pt_expt/loss/test_tensor.py
  • deepmd/dpmodel/loss/dos.py
