Skip to content

Fix UpliftRandomForest predict shape mismatch with multiple treatments (#569)#884

Open
jeongyoonlee wants to merge 5 commits intomasterfrom
fix/569-uplift-predict-shape
Open

Fix UpliftRandomForest predict shape mismatch with multiple treatments (#569)#884
jeongyoonlee wants to merge 5 commits intomasterfrom
fix/569-uplift-predict-shape

Conversation

@jeongyoonlee
Copy link
Collaborator

Summary

  • Fixes shape mismatch when calling UpliftRandomForestClassifier.predict() #569
  • Bootstrap sampling can exclude entire treatment groups from a tree's training data, causing individual trees to produce prediction arrays of different widths
  • When summing predictions across trees, this caused ValueError: operands could not be broadcast together with shapes (N,4) (N,3)
  • Added _align_tree_predict() module-level function that maps each tree's predictions to the forest-level class ordering, filling zeros for missing groups
  • Works with joblib parallel execution (module-level function, not a closure)

Test plan

  • pytest tests/test_uplift_trees.py — 23 passed

🤖 Generated with Claude Code

jeongyoonlee and others added 2 commits March 6, 2026 13:54
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#569)

Bootstrap sampling can exclude entire treatment groups from a tree's
training data, causing individual trees to produce prediction arrays
of different widths. When summing predictions across trees, this
causes a ValueError for shape mismatch.

Added _align_tree_predict() that maps each tree's predictions to the
forest-level class ordering, filling zeros for missing treatment
groups. This is a module-level function (not a closure) so it works
with joblib's parallel pickling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 7, 2026 00:15
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes UpliftRandomForestClassifier.predict() failing with shape-mismatch errors when bootstrap sampling yields trees trained without all treatment groups, by aligning per-tree prediction outputs to the forest-level class ordering.

Changes:

  • Add _align_tree_predict() helper to map each tree’s predictions into the forest’s classes_ layout, filling missing treatment columns with zeros.
  • Use _align_tree_predict() in both parallel (joblib) and non-parallel prediction paths.
  • Update .gitignore entries for Claude-related worktree artifacts.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

File Description
causalml/inference/tree/uplift.pyx Adds per-tree prediction alignment to prevent ensemble summation broadcast errors when some trees lack treatment groups.
.gitignore Adjusts ignored Claude/worktree artifact paths (currently with a duplicated entry).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +74 to +76
for tree_idx, cls in enumerate(tree.classes_):
if cls in forest_classes:
forest_idx = forest_classes.index(cls)
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_align_tree_predict does an O(k^2) alignment by repeatedly checking cls in forest_classes and calling forest_classes.index(cls) inside the loop. This can become a noticeable overhead when there are many treatment groups and/or many trees. Consider precomputing a {class_label: forest_idx} mapping once per forest predict call (and passing it into the helper) or building it once per helper invocation, then doing direct lookups; this also makes the logic clearer and avoids repeated linear scans.

Suggested change
for tree_idx, cls in enumerate(tree.classes_):
if cls in forest_classes:
forest_idx = forest_classes.index(cls)
# Precompute mapping from class label to forest index to avoid repeated
# linear scans over forest_classes.
class_to_forest_idx = {cls: idx for idx, cls in enumerate(forest_classes)}
for tree_idx, cls in enumerate(tree.classes_):
forest_idx = class_to_forest_idx.get(cls)
if forest_idx is not None:

Copilot uses AI. Check for mistakes.
raw = tree.predict(X=X)
if len(tree.classes_) == len(forest_classes):
return raw
aligned = np.zeros((raw.shape[0], len(forest_classes)))
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aligned = np.zeros((raw.shape[0], len(forest_classes))) will always default to float64, potentially changing dtype vs the tree output and doing unnecessary upcasting. Consider setting dtype=raw.dtype so the aligned array matches the per-tree prediction dtype.

Suggested change
aligned = np.zeros((raw.shape[0], len(forest_classes)))
aligned = np.zeros((raw.shape[0], len(forest_classes)), dtype=raw.dtype)

Copilot uses AI. Check for mistakes.
Comment on lines +2568 to +2574
if self.n_jobs != 1:
y_pred_ensemble = sum(
Parallel(n_jobs=self.n_jobs, prefer=self.joblib_prefer)
(delayed(tree.predict)(X=X) for tree in self.uplift_forest)
(delayed(_align_tree_predict)(tree, X, self.classes_) for tree in self.uplift_forest)
) / len(self.uplift_forest)
else:
y_pred_ensemble = sum([tree.predict(X=X) for tree in self.uplift_forest]) / len(self.uplift_forest)
y_pred_ensemble = sum([_align_tree_predict(tree, X, self.classes_) for tree in self.uplift_forest]) / len(self.uplift_forest)
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no targeted regression test ensuring that UpliftRandomForestClassifier.predict() works when bootstrap sampling causes at least one tree to be fit without some treatment groups (the scenario from #569). Current tests use large, balanced samples (see tests/const.py), which makes it unlikely for bootstraps to drop a group. Consider adding a small multi-treatment dataset (or a deterministic bootstrap seed) that reproduces the missing-group case and asserts predict() succeeds for both n_jobs=1 and n_jobs>1.

Copilot uses AI. Check for mistakes.
- Use dict for O(1) class-to-index mapping instead of repeated list scans
- Preserve dtype with dtype=raw.dtype in aligned array
- Add test_UpliftRandomForestClassifier_predict_shape_with_sparse_groups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jeongyoonlee
Copy link
Collaborator Author

Thanks for the review. All three suggestions have been addressed:

  1. Dict lookup: Replaced repeated list.index() with a {cls: idx} dict for O(1) lookups.
  2. Dtype preservation: Added dtype=raw.dtype to the aligned array.
  3. Regression test: Added test_UpliftRandomForestClassifier_predict_shape_with_sparse_groups using a small dataset (50 control, 5+5 treatment) with min_samples_treatment=0 to force bootstrap sampling to miss groups. Verifies shape, no NaN, and consistency between single-threaded and parallel predictions.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +73 to +78
aligned = np.zeros((raw.shape[0], len(forest_classes)), dtype=raw.dtype)
class_to_forest_idx = {cls: idx for idx, cls in enumerate(forest_classes)}
for tree_idx, cls in enumerate(tree.classes_):
forest_idx = class_to_forest_idx.get(cls)
if forest_idx is not None:
aligned[:, forest_idx] = raw[:, tree_idx]
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class_to_forest_idx is rebuilt for every tree prediction call. Consider constructing this mapping once in UpliftRandomForestClassifier.predict() and passing it (or passing precomputed forest indices for tree.classes_) to reduce overhead when n_estimators or number of treatment groups is large.

Copilot uses AI. Check for mistakes.
- Build class_to_forest_idx dict once in predict() instead of per tree
- Use model.n_jobs instead of parallel_backend for parallel test
- Assert that sparse-group condition actually occurred in test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jeongyoonlee
Copy link
Collaborator Author

Addressed the new feedback:

  1. class_to_forest_idx rebuilt per tree: Now constructed once in predict() and passed to _align_tree_predict().

  2. parallel_backend doesn't control n_jobs: Fixed the test to use model.n_jobs = 1 / model.n_jobs = 2 instead of parallel_backend.

  3. Probabilistic test: Added an assertion that at least one fitted tree has fewer classes than the forest, confirming the sparse-group condition is actually exercised.

  4. Zero-fill biasing the ensemble: This is a valid theoretical concern. However, in practice the zero-fill approach is consistent with standard random forest behavior — a tree that never saw a treatment group has no information about that group's uplift, and zero (no effect) is the most conservative default. Using per-class nanmean would require tracking which trees contributed to each class, adding complexity to the parallel aggregation path. The current approach is simple, correct for the crash fix, and matches user expectations. If per-class weighted averaging is desired, it would be better as a separate enhancement with its own benchmarking.

With only 1 sample per minority treatment group out of 102 total,
bootstrap sampling will miss them in most trees, making the test
deterministic regardless of seed or CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jeongyoonlee
Copy link
Collaborator Author

Fixed the probabilistic test failure: reduced minority treatment groups from 5 samples each to 1 sample each (out of 102 total). With bootstrap sampling drawing n=102 with replacement, each tree has a ~37% chance of missing a 1-sample group. Across 10 trees and 2 minority groups, the probability of the sparse-group condition not occurring is ~0.01%, making the test effectively deterministic.

@jeongyoonlee jeongyoonlee added the bug Something isn't working label Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

shape mismatch when calling UpliftRandomForestClassifier.predict()

2 participants