[ENH] defer trailing underscore attribute assignment in fit() for imputers and discretisers#917
[ENH] defer trailing underscore attribute assignment in fit() for imputers and discretisers#917direkkakkar319-ops wants to merge 21 commits intofeature-engine:mainfrom
Conversation
…ters and discretisers (closes feature-engine#586)
|
Before the test cases were passing locally but the CI checks were failing. the 3 CI chesks that were failing were:-
As |
|
After the fix, since no trailing underscore attributes are set until the very end, a failed |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #917 +/- ##
==========================================
- Coverage 98.27% 98.25% -0.03%
==========================================
Files 116 116
Lines 4978 5033 +55
Branches 795 797 +2
==========================================
+ Hits 4892 4945 +53
- Misses 55 56 +1
- Partials 31 32 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
solegalli
left a comment
There was a problem hiding this comment.
These changes are neat! Thank you so much!
Could you finish updating all discretisers and all imputers so we can merge?
For the next PR, could you please make 1 PR per module? So, for example, 1 PR for encoders, a different PR for creation and so on?
Thanks a lot! I look forward to the changes.
|
|
||
| return X, variables_ | ||
|
|
||
| def fit(self, X: pd.DataFrame) -> pd.DataFrame: |
There was a problem hiding this comment.
I can see that the logic that is here and we were calling with super().fit(), you are now not using and passing it on to the transformer, which makes sense for the requested change.
So i think we should remove the fit method from base numerical altogether. Like this, we ensure we don't use it as legacy anywhere else in the source code.
|
Hey @solegalli ,thank you for the kind words and the feedback! |
|
I'll update this PR shortly. Thanks again for the guidance |
Changes made
2 Deferred Trailing Underscore Attribute Assignment - Updated the
Files changed:-
tests |
solegalli
left a comment
There was a problem hiding this comment.
Thanks a lot for the quick turnaround!
I wasn't sure if you finished working on this PR, but I had a look anyways.
It's looking great.
We need to add tests for all the modules included in this PR. Currently, we need to add the test to the following:
- Add test for the transformers in the creation module (like this, we ensure all transformers in creation are now OK)
- Add test for the transformers in the transformation module
- Add test for the scaler
I also updated the issue, to keep track of which modules have been updated up to now.
Thank you!
|
HI, @solegalli made the 3 tests i was asked to , would appreciate you review |






Description
Closes #586
This PR ensures that all trailing underscore attributes (
variables_,imputer_dict_,binner_dict_) in imputation and discretisation transformers are only assigned after all fit logic has successfully completed, following sklearn convention.Problem
Previously, attributes like
variables_were set early infit(), before the remaining logic ran. If an error occurred midway through fitting, the transformer was left in a partially fitted state — meaningtransform()would not raiseNotFittedErroras expected.For example:
Tests
Added
test_raises_non_fitted_error_when_error_during_fittotests/test_imputation/test_check_estimator_imputers.py, following the same pattern already established intests/test_encoding/test_check_estimator_encoders.py.First attempt — 4 test cases failed
When the test was first added, it used a numerical-only DataFrame as the failure trigger for all estimators. This worked for
MeanMedianImputer,EndTailImputer, andArbitraryNumberImputer, but 4 cases did not raise becauseCategoricalImputer(ignore_format=True),AddMissingIndicator,RandomSampleImputer, andDropMissingDataaccept all variable types and fitted successfully on that DataFrame.Fix — different triggers per estimator
The test was updated to use the correct failure trigger per estimator type:
MeanMedianImputer,EndTailImputer,ArbitraryNumberImputerfind_numerical_variables()raisesTypeErrorCategoricalImputerignore_format=False+ numerical-only df → raisesTypeErrorAddMissingIndicator,RandomSampleImputer,DropMissingDatacheck_X()raisesValueErrorType of Change
All the test cases are passing locally
