feat: Add GroupStandardScaler for scaling variables relative to a giv… #915
ankitlade12 wants to merge 5 commits into feature-engine:main
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #915      +/-   ##
==========================================
+ Coverage   98.27%   98.29%   +0.02%
==========================================
  Files         116      117       +1
  Lines        4978     5048      +70
  Branches      795      806      +11
==========================================
+ Hits         4892     4962      +70
  Misses         55       55
  Partials       31       31
…, std edge cases, get_feature_names_out, _more_tags)
Hi @ankitlade12, thanks a lot for this contribution. Would you by any chance have a reference where this type of transformation is applied? I am not sure it belongs in the scaling module. Scaling is usually done to change the scale without really affecting the overall shape of the variable's distribution, but this transformation will indeed change its shape, so it does not belong in the scaling module. It's also not a variance-stabilizing transformation, so it is not suitable for the transformation module either. I don't really know where it should be placed. If you could send a few references or share more about how, when, and how frequently this is used, maybe we can take it from there?
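To make the shape argument concrete, here is a minimal sketch (assuming pandas; the data are made up for illustration) comparing global and within-group standardization. Global standardization applies one affine map with a positive scale to every row, so it preserves the ordering of the values. Within-group standardization applies a different map to each group, so values from different groups can swap order, which is one way the pooled distribution's shape can change.

```python
import pandas as pd

# Made-up toy data: two groups sitting on very different levels.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 9.0, 10.0, 11.0, 12.0],
})

# Global standardization: the same (x - mean) / std for every row.
global_z = (df["x"] - df["x"].mean()) / df["x"].std()

# Within-group standardization: each group uses its own mean and std.
grouped = df.groupby("group")["x"]
group_z = (df["x"] - grouped.transform("mean")) / grouped.transform("std")

# One affine map with positive scale keeps the original ordering...
print(global_z.rank().tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# ...but per-group maps reorder values: x=10 now ranks below x=1.
print(group_z.rank().tolist())   # [2.0, 3.0, 6.0, 1.0, 4.0, 5.0]
```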
Hi @solegalli, thanks for the thoughtful feedback! References / use cases: This technique is commonly known as within-group standardization (or group-wise z-scoring) and appears frequently across several domains:
On module placement: you raise a valid point. This transformation does change the shape of the marginal distribution by removing between-group variation, which makes it different from a global scaler. That said, the core mechanic is still mean-centering and dividing by the standard deviation: it's standardization conditioned on a grouping variable. I'd argue it's closest in spirit to scaling, but I'm open to alternatives. A few options:
I'm happy to move the class to whichever module you think is best. What's your preference?
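The mechanic discussed in this exchange, standardization conditioned on a grouping variable, computes z = (x - mean_g) / std_g, where mean_g and std_g are the mean and standard deviation of the group that the row belongs to. A minimal sketch, assuming pandas and made-up data; the helper `within_group_zscore` is hypothetical and is not the PR's code. It also shows the "removes between-group variation" point: after the transformation every group has mean 0 and standard deviation 1.

```python
import pandas as pd


def within_group_zscore(df: pd.DataFrame, column: str, group: str) -> pd.Series:
    """Hypothetical helper: (x - group mean) / group std for each row."""
    grouped = df.groupby(group)[column]
    # transform() broadcasts each group's statistic back onto that group's rows.
    return (df[column] - grouped.transform("mean")) / grouped.transform("std")


# Made-up example data.
df = pd.DataFrame({
    "neighborhood": ["north", "north", "north", "south", "south", "south"],
    "house_price": [200.0, 250.0, 300.0, 500.0, 600.0, 700.0],
})
df["price_z"] = within_group_zscore(df, "house_price", "neighborhood")

# Both neighborhoods now have mean 0 and std 1: the level difference is gone.
print(df.groupby("neighborhood")["price_z"].agg(["mean", "std"]))
```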
Description
This PR introduces the GroupStandardScaler to the feature_engine.scaling module.
Currently, native scalers like StandardScaler scale a numerical feature globally across the entire dataset. However, it is a common pattern in data science to scale a feature relative to its group (e.g., standardizing house_price relative to its neighborhood, or scaling a student's exam_score relative to their class_id).
The GroupStandardScaler addresses this by taking both variables and reference variables (the grouping keys). During fit, it learns the mean and standard deviation of each numerical variable per group. During transform, it scales each variable using the parameters of the row's group. Groups not seen during fit are handled by falling back to the global mean and standard deviation.
Changes:
- GroupStandardScaler class in feature_engine/scaling/group_standard.py.
- GroupStandardScaler in feature_engine/scaling/__init__.py.
- docs/api_doc/scaling/GroupStandardScaler.rst.
- docs/user_guide/scaling/GroupStandardScaler.rst.
Examples:
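The fit/transform behavior described in this PR (per-group statistics learned during fit, global fallback for unseen groups during transform) can be sketched as follows. This is an illustrative sketch assuming pandas, not the PR's implementation: `fit_group_stats` and `transform_with_fallback` are hypothetical helpers, and the handling of one-row groups and zero standard deviations is an assumption, since the description does not say how those cases are treated.

```python
import pandas as pd


def fit_group_stats(X: pd.DataFrame, variables: list, group: str) -> dict:
    """Hypothetical helper: learn per-group and global mean/std per variable."""
    grouped = X.groupby(group)[variables]
    return {
        "group_mean": grouped.mean(),
        "group_std": grouped.std(),  # NaN for one-row groups (ddof=1)
        "global_mean": X[variables].mean(),
        "global_std": X[variables].std(),
    }


def transform_with_fallback(X: pd.DataFrame, variables: list, group: str,
                            stats: dict) -> pd.DataFrame:
    """Hypothetical helper: standardize with each row's group statistics."""
    X = X.copy()
    # One row of statistics per input row; unseen groups come back as NaN.
    mean = stats["group_mean"].reindex(index=X[group])
    std = stats["group_std"].reindex(index=X[group])
    # Assumption: any missing group statistic (unseen group, or an undefined
    # std for a one-row group) falls back to the global statistic.
    mean = mean.fillna(stats["global_mean"])
    std = std.fillna(stats["global_std"])
    # Assumption: a zero or still-undefined std becomes 1, so values are only centered.
    std = std.replace(0, 1.0).fillna(1.0)
    X[variables] = (X[variables].to_numpy() - mean.to_numpy()) / std.to_numpy()
    return X


# Made-up data; class "C" never appears during fit.
train = pd.DataFrame({
    "class_id": ["A", "A", "A", "B", "B", "B"],
    "exam_score": [60.0, 70.0, 80.0, 80.0, 90.0, 100.0],
})
test = pd.DataFrame({"class_id": ["A", "C", "C"], "exam_score": [85.0, 80.0, 100.0]})

stats = fit_group_stats(train, ["exam_score"], "class_id")
scaled = transform_with_fallback(test, ["exam_score"], "class_id", stats)
print(scaled["exam_score"].round(3).tolist())  # [1.5, 0.0, 1.414]
```

Reindexing the learned statistics by the group column keeps the arithmetic vectorized. The "A" row uses class A's mean (70) and std (10), while the "C" rows use the global mean (80) and std (sqrt(200)).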
Checklist: