Skip to content

[ENH] defer trailing underscore attribute assignment in fit() for imputers and discretisers#917

Open
direkkakkar319-ops wants to merge 21 commits intofeature-engine:mainfrom
direkkakkar319-ops:issue#586
Open

[ENH] defer trailing underscore attribute assignment in fit() for imputers and discretisers#917
direkkakkar319-ops wants to merge 21 commits intofeature-engine:mainfrom
direkkakkar319-ops:issue#586

Conversation

@direkkakkar319-ops
Copy link
Copy Markdown

@direkkakkar319-ops direkkakkar319-ops commented Mar 21, 2026

Description

Closes #586

This PR ensures that all trailing underscore attributes (variables_, imputer_dict_, binner_dict_) in imputation and discretisation transformers are only assigned after all fit logic has successfully completed, following sklearn convention.

Problem

Previously, attributes like variables_ were set early in fit(), before the remaining logic ran. If an error occurred midway through fitting, the transformer was left in a partially fitted state — meaning transform() would not raise NotFittedError as expected.

For example:

def fit(self, X, y=None):
    self.variables_ = find_numerical_variables(X)  # ← set too early
    self.imputer_dict_ = X[self.variables_].mean()  # ← if this fails, variables_ already set

Tests

Added test_raises_non_fitted_error_when_error_during_fit to tests/test_imputation/test_check_estimator_imputers.py, following the same pattern already established in tests/test_encoding/test_check_estimator_encoders.py.

First attempt — 4 test cases failed

When the test was first added, it used a numerical-only DataFrame as the failure trigger for all estimators. This worked for MeanMedianImputer, EndTailImputer, and ArbitraryNumberImputer, but 4 cases did not raise because CategoricalImputer(ignore_format=True), AddMissingIndicator, RandomSampleImputer, and DropMissingData accept all variable types and fitted successfully on that DataFrame.

Screenshot 2026-03-22 000733

Fix — different triggers per estimator

The test was updated to use the correct failure trigger per estimator type:

Estimator Trigger
MeanMedianImputer, EndTailImputer, ArbitraryNumberImputer Categorical-only df → find_numerical_variables() raises TypeError
CategoricalImputer Reset to ignore_format=False + numerical-only df → raises TypeError
AddMissingIndicator, RandomSampleImputer, DropMissingData Empty df → check_X() raises ValueError

Type of Change

  • Bug fix
  • New feature (non-breaking)
  • Breaking change
  • Documentation update

All the test cases are passing locally
Screenshot 2026-03-22 001213

@direkkakkar319-ops direkkakkar319-ops changed the title added test case `def test_raises_non_fitted_error_when_error_during_f… fix: ensure trailing underscore attributes are set only after fit() succeeds in imputers and discretisers Mar 22, 2026
@direkkakkar319-ops direkkakkar319-ops changed the title fix: ensure trailing underscore attributes are set only after fit() succeeds in imputers and discretisers fix: defer trailing underscore attribute assignment in fit() for imputers and discretisers (closes #586) Mar 22, 2026
@direkkakkar319-ops
Copy link
Copy Markdown
Author

direkkakkar319-ops commented Mar 22, 2026

Before the test cases were passing locally but the CI checks were failing.

the 3 CI chesks that were failing were:-

  1. test_failure_engine_py39
  2. test_feature-engine_py311_sklearn160
  3. est_feature-engine_py312_pandas300

As self.variables_ was set early in fit(). On older sklearn versions, check_is_fitted() found it and considered the transformer fitted even though fit() had failed — so transform()` didn't raise NotFittedError as expected.
This was the reason for the 2 CI checks failing

@direkkakkar319-ops
Copy link
Copy Markdown
Author

After the fix, since no trailing underscore attributes are set until the very end, a failed fit() leaves the object completely clean,check_is_fitted() finds nothing and correctly raises ERROR:NotFittedError

@direkkakkar319-ops
Copy link
Copy Markdown
Author

I haved also ran tests locally, all of them are passing
image

@direkkakkar319-ops direkkakkar319-ops marked this pull request as ready for review March 22, 2026 10:06
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 98.73418% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.25%. Comparing base (f72a2b7) to head (0c681f5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
feature_engine/transformation/log.py 88.23% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #917      +/-   ##
==========================================
- Coverage   98.27%   98.25%   -0.03%     
==========================================
  Files         116      116              
  Lines        4978     5033      +55     
  Branches      795      797       +2     
==========================================
+ Hits         4892     4945      +53     
- Misses         55       56       +1     
- Partials       31       32       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @direkkakkar319-ops

These changes are neat! Thank you so much!

Could you finish updating all discretisers and all imputers so we can merge?

For the next PR, could you please make 1 PR per module? So, for example, 1 PR for encoders, a different PR for creation and so on?

Thanks a lot! I look forward to the changes.


return X, variables_

def fit(self, X: pd.DataFrame) -> pd.DataFrame:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that the logic that is here and we were calling with super().fit(), you are now not using and passing it on to the transformer, which makes sense for the requested change.

So i think we should remove the fit method from base numerical altogether. Like this, we ensure we don't use it as legacy anywhere else in the source code.

@direkkakkar319-ops
Copy link
Copy Markdown
Author

Hey @solegalli ,thank you for the kind words and the feedback!
I'll finish updating all the discretisers and imputers right away so we can get this merged.

@direkkakkar319-ops
Copy link
Copy Markdown
Author

I'll update this PR shortly. Thanks again for the guidance

@direkkakkar319-ops
Copy link
Copy Markdown
Author

Changes made

  1. Removed fit() from the Base Class . - The fit() method was completely removed from base_numerical.py .

2 Deferred Trailing Underscore Attribute Assignment - Updated the fit() method in all imputer and discretiser classes to use local variables for internal calculations. Attributes like variables_ , imputer_dict_ , and `binner_dict_ are now assigned only as the final step of the method.

  1. Refactored Subclasses to Implement Local fit() - Updated several transformation and scaling classes to implement their own fit() logic instead of calling super().fit() .

  2. Updated Mixins and Shared Logic - Refactored the FitFromDictMixin in mixins.py to defer the assignment of self.variables_ .

  3. Strengthened the Test Suite - addd test_raises_non_fitted_error_when_error_during_fit to the discretisation test suite and updated existing imputer tests. I also updated the MockClass in base transformer tests to reflect the new architecture.

Files changed:-

  1. Imputation : categorical.py, missing_indicator.py , random_sample.py , drop_missing_data.py , arbitrary_number.py .
  2. Discretisation : geometric_width.py , decision_tree.py , arbitrary.py .
  3. Transformation : log.py, yeojohnson.py , power.py , boxcox.py , arcsin.py , arcsinh.py .
  4. Others : mean_normalization.py , cyclical_features.py .

tests

The test cases are passing locally
image

@solegalli solegalli changed the title fix: defer trailing underscore attribute assignment in fit() for imputers and discretisers (closes #586) [ENH] defer trailing underscore attribute assignment in fit() for imputers and discretisers Mar 27, 2026
Copy link
Copy Markdown
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @direkkakkar319-ops

Thanks a lot for the quick turnaround!

I wasn't sure if you finished working on this PR, but I had a look anyways.

It's looking great.

We need to add tests for all the modules included in this PR. Currently, we need to add the test to the following:

  • Add test for the transformers in the creation module (like this, we ensure all transformers in creation are now OK)
  • Add test for the transformers in the transformation module
  • Add test for the scaler

I also updated the issue, to keep track of which modules have been updated up to now.

Thank you!

@direkkakkar319-ops
Copy link
Copy Markdown
Author

tests added


  • Added test for the transformers in the creation module
    locally tests cases are passing
image ---- * Add test for the `transformers` in the `transformation` module locally tests are passing for this change image ---- * Add test for the `scaler` locally test casses passed image

locally also the tests were ran they also pass
image

@direkkakkar319-ops
Copy link
Copy Markdown
Author

HI, @solegalli made the 3 tests i was asked to , would appreciate you review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ensure that all attributes with trailing underscore in fit are leared after the entire logic has succeded

2 participants