
feat: Add GroupStandardScaler for scaling variables relative to a giv…#915

Open
ankitlade12 wants to merge 5 commits into feature-engine:main from ankitlade12:feat/group-standard-scaler

Conversation

@ankitlade12
Contributor

@ankitlade12 ankitlade12 commented Mar 10, 2026

Description

This PR introduces the GroupStandardScaler to the feature_engine.scaling module.

Currently, the existing scalers like StandardScaler scale a numerical feature globally, across the entire dataset. However, it is an extremely common pattern in data science to scale a feature relative to its group (e.g., standardizing house_price relative to its neighborhood, or scaling a student's exam_score relative to their class_id).

The GroupStandardScaler resolves this by taking both variables and reference variables (the grouping keys). During fit, it learns the mean and standard deviation for each numerical variable per group. During transform, it scales the variables using their respective group parameters. It gracefully handles unseen groups during transform by falling back to the global mean and standard deviation.
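For illustration, the fit/transform behaviour described above, including the fallback to the global statistics for unseen groups, can be sketched in plain pandas (a minimal sketch of the idea, not the actual implementation; the function names are hypothetical):

```python
import pandas as pd

def fit_group_stats(df, variable, group):
    # Learn per-group mean/std plus the global statistics for fallback.
    stats = df.groupby(group)[variable].agg(["mean", "std"])
    global_stats = (df[variable].mean(), df[variable].std())
    return stats, global_stats

def transform_with_fallback(df, variable, group, stats, global_stats):
    # Map each row's group to its learned mean/std; groups unseen during
    # fit map to NaN and fall back to the global mean/std.
    out = df.copy()
    means = df[group].map(stats["mean"]).fillna(global_stats[0])
    stds = df[group].map(stats["std"]).fillna(global_stats[1])
    out[variable] = (df[variable] - means) / stds
    return out
```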

Changes:

  • Added GroupStandardScaler class in feature_engine/scaling/group_standard.py.
  • Exported GroupStandardScaler in feature_engine/scaling/__init__.py.
  • Included rigorous tests for single-reference scaling, missing values handling, unseen groups fallback, and parameter validation.
  • Added full API documentation in docs/api_doc/scaling/GroupStandardScaler.rst.
  • Added User Guide explanations and examples in docs/user_guide/scaling/GroupStandardScaler.rst.

Examples:

import pandas as pd
from feature_engine.scaling import GroupStandardScaler

df = pd.DataFrame({
    "House_Price": [100000, 150000, 120000, 500000, 550000, 480000],
    "Neighborhood": ["A", "A", "A", "B", "B", "B"]
})

scaler = GroupStandardScaler(
    variables=["House_Price"],   # numerical variables to scale
    reference=["Neighborhood"],  # grouping key(s) to scale within
)

scaler.fit(df)
df_scaled = scaler.transform(df)
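For reference, when fit and transform use the same data, the result of group-wise standardization can be reproduced with plain pandas; this sketch assumes the scaler uses pandas' default sample standard deviation (ddof=1):

```python
import pandas as pd

df = pd.DataFrame({
    "House_Price": [100000, 150000, 120000, 500000, 550000, 480000],
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
})

# Group-wise z-scoring: each price is standardized against its own
# neighborhood's mean and (sample) standard deviation.
scaled = df.groupby("Neighborhood")["House_Price"].transform(
    lambda x: (x - x.mean()) / x.std()
)
```

After this, each neighborhood's scaled prices have mean 0 and standard deviation 1, so values are comparable across neighborhoods.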

Checklist:

  • I have read the contribution guidelines.
  • I have tested my code locally.
  • I have added documentation for my new feature.
  • I have added unit tests for my changes.

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.29%. Comparing base (f72a2b7) to head (0b266f8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #915      +/-   ##
==========================================
+ Coverage   98.27%   98.29%   +0.02%     
==========================================
  Files         116      117       +1     
  Lines        4978     5048      +70     
  Branches      795      806      +11     
==========================================
+ Hits         4892     4962      +70     
  Misses         55       55              
  Partials       31       31              

☔ View full report in Codecov by Sentry.
@solegalli
Collaborator

Hi @ankitlade12

Thanks a lot for this contribution. Would you by any chance have a reference where this type of transformation is applied?

I am not sure it belongs in the scaling module. Scaling is usually done to change the scale of a variable without really affecting the overall shape of its distribution, but this transformation will indeed change its shape, so it does not belong in the scaling module.

It's also not a variance-stabilizing transformation, so it is not suitable for the transformation module either. I don't really know where it should be placed.

If you could send a few references, or share more about how, when, and how frequently this is used, maybe we can take it from there?

@ankitlade12
Copy link
Copy Markdown
Contributor Author

Hi @solegalli, thanks for the thoughtful feedback!

References / use cases:

This technique is commonly known as within-group standardization (or group-wise z-scoring) and appears frequently across several domains:

  • Econometrics & panel data: Within-group centering/scaling is a standard preprocessing step in fixed-effects and multilevel/hierarchical models to separate within-group variation from between-group variation. See Gelman & Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models (Ch. 12–13) for a discussion of centering predictors within groups.
  • Education & psychometrics: Standardizing student scores relative to their school, cohort, or test form is routine practice to make scores comparable across groups (e.g., equating exam difficulty across sessions).
  • Sports analytics: Player performance metrics are regularly standardized relative to position or league to enable fair cross-group comparisons.
  • Healthcare / clinical trials: Lab values are often standardized relative to site or demographic group to account for systematic between-group differences before modeling.
  • General ML pipelines: Any time you have grouped/hierarchical data and want to remove between-group scale differences before feeding features into a model, this is the standard approach. It's the preprocessing counterpart to scikit-learn's GroupKFold.

On module placement:

You raise a valid point — this transformation does change the marginal distribution shape by removing between-group variation, which makes it different from a global scaler like StandardScaler.

That said, the core mechanic is still mean-centering and dividing by standard deviation — it's standardization conditioned on a grouping variable. I'd argue it's closest in spirit to scaling, but I'm open to alternatives. A few options:

  1. Keep it in scaling — the operation is standardization, just group-conditional. Users looking for "scaling by group" would naturally look here.
  2. A new submodule like feature_engine.group_transforms — if you anticipate other group-conditional operations (group-wise min-max, group-wise robust scaling, etc.), this could be a clean home.
  3. Place it in transformation — though I agree it's not a stabilizing transformation, so this feels like a weaker fit.
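To illustrate option 2: other group-conditional scalers mentioned there would follow the same pattern as GroupStandardScaler. Here is a hypothetical sketch (not part of this PR) of group-wise min-max scaling in plain pandas:

```python
import pandas as pd

def group_minmax(df, variable, group):
    # Hypothetical group-wise min-max scaling: each value is rescaled
    # to [0, 1] relative to the min/max of its own group.
    g = df.groupby(group)[variable]
    return (df[variable] - g.transform("min")) / (
        g.transform("max") - g.transform("min")
    )
```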

I'm happy to move the class to whichever module you think is best. What's your preference?

