Fix UpliftRandomForest predict shape mismatch with multiple treatments (#569) by jeongyoonlee · Pull Request #884 · uber/causalml

jeongyoonlee · 2026-03-07T00:15:11Z

Summary

Fixes shape mismatch when calling UpliftRandomForestClassifier.predict() #569
Bootstrap sampling can exclude entire treatment groups from a tree's training data, causing individual trees to produce prediction arrays of different widths
When summing predictions across trees, this caused ValueError: operands could not be broadcast together with shapes (N,4) (N,3)
Added _align_tree_predict() module-level function that maps each tree's predictions to the forest-level class ordering, filling zeros for missing groups
Works with joblib parallel execution (module-level function, not a closure)

Test plan

pytest tests/test_uplift_trees.py — 23 passed

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

#569) Bootstrap sampling can exclude entire treatment groups from a tree's training data, causing individual trees to produce prediction arrays of different widths. When summing predictions across trees, this causes a ValueError for shape mismatch. Added _align_tree_predict() that maps each tree's predictions to the forest-level class ordering, filling zeros for missing treatment groups. This is a module-level function (not a closure) so it works with joblib's parallel pickling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Fixes UpliftRandomForestClassifier.predict() failing with shape-mismatch errors when bootstrap sampling yields trees trained without all treatment groups, by aligning per-tree prediction outputs to the forest-level class ordering.

Changes:

Add _align_tree_predict() helper to map each tree’s predictions into the forest’s classes_ layout, filling missing treatment columns with zeros.
Use _align_tree_predict() in both parallel (joblib) and non-parallel prediction paths.
Update .gitignore entries for Claude-related worktree artifacts.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`causalml/inference/tree/uplift.pyx`	Adds per-tree prediction alignment to prevent ensemble summation broadcast errors when some trees lack treatment groups.
`.gitignore`	Adjusts ignored Claude/worktree artifact paths (currently with a duplicated entry).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-07T00:18:12Z

causalml/inference/tree/uplift.pyx

+    for tree_idx, cls in enumerate(tree.classes_):
+        if cls in forest_classes:
+            forest_idx = forest_classes.index(cls)


_align_tree_predict does an O(k^2) alignment by repeatedly checking cls in forest_classes and calling forest_classes.index(cls) inside the loop. This can become a noticeable overhead when there are many treatment groups and/or many trees. Consider precomputing a {class_label: forest_idx} mapping once per forest predict call (and passing it into the helper) or building it once per helper invocation, then doing direct lookups; this also makes the logic clearer and avoids repeated linear scans.

Suggested change

for tree_idx, cls in enumerate(tree.classes_):

if cls in forest_classes:

forest_idx = forest_classes.index(cls)

# Precompute mapping from class label to forest index to avoid repeated

# linear scans over forest_classes.

class_to_forest_idx = {cls: idx for idx, cls in enumerate(forest_classes)}

for tree_idx, cls in enumerate(tree.classes_):

forest_idx = class_to_forest_idx.get(cls)

if forest_idx is not None:

Copilot · 2026-03-07T00:18:13Z

causalml/inference/tree/uplift.pyx

+    raw = tree.predict(X=X)
+    if len(tree.classes_) == len(forest_classes):
+        return raw
+    aligned = np.zeros((raw.shape[0], len(forest_classes)))


aligned = np.zeros((raw.shape[0], len(forest_classes))) will always default to float64, potentially changing dtype vs the tree output and doing unnecessary upcasting. Consider setting dtype=raw.dtype so the aligned array matches the per-tree prediction dtype.

Suggested change

aligned = np.zeros((raw.shape[0], len(forest_classes)))

aligned = np.zeros((raw.shape[0], len(forest_classes)), dtype=raw.dtype)

Copilot · 2026-03-07T00:18:13Z

causalml/inference/tree/uplift.pyx

        if self.n_jobs != 1:
            y_pred_ensemble = sum(
                Parallel(n_jobs=self.n_jobs, prefer=self.joblib_prefer)
-                (delayed(tree.predict)(X=X) for tree in self.uplift_forest)
+                (delayed(_align_tree_predict)(tree, X, self.classes_) for tree in self.uplift_forest)
            ) / len(self.uplift_forest)
        else:
-            y_pred_ensemble = sum([tree.predict(X=X) for tree in self.uplift_forest]) / len(self.uplift_forest)
+            y_pred_ensemble = sum([_align_tree_predict(tree, X, self.classes_) for tree in self.uplift_forest]) / len(self.uplift_forest)


There’s no targeted regression test ensuring that UpliftRandomForestClassifier.predict() works when bootstrap sampling causes at least one tree to be fit without some treatment groups (the scenario from #569). Current tests use large, balanced samples (see tests/const.py), which makes it unlikely for bootstraps to drop a group. Consider adding a small multi-treatment dataset (or a deterministic bootstrap seed) that reproduces the missing-group case and asserts predict() succeeds for both n_jobs=1 and n_jobs>1.

- Use dict for O(1) class-to-index mapping instead of repeated list scans - Preserve dtype with dtype=raw.dtype in aligned array - Add test_UpliftRandomForestClassifier_predict_shape_with_sparse_groups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jeongyoonlee · 2026-03-07T00:32:34Z

Thanks for the review. All three suggestions have been addressed:

Dict lookup: Replaced repeated list.index() with a {cls: idx} dict for O(1) lookups.
Dtype preservation: Added dtype=raw.dtype to the aligned array.
Regression test: Added test_UpliftRandomForestClassifier_predict_shape_with_sparse_groups using a small dataset (50 control, 5+5 treatment) with min_samples_treatment=0 to force bootstrap sampling to miss groups. Verifies shape, no NaN, and consistency between single-threaded and parallel predictions.

Copilot

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

tests/test_uplift_trees.py

causalml/inference/tree/uplift.pyx

Copilot · 2026-03-07T00:37:07Z

causalml/inference/tree/uplift.pyx

+    aligned = np.zeros((raw.shape[0], len(forest_classes)), dtype=raw.dtype)
+    class_to_forest_idx = {cls: idx for idx, cls in enumerate(forest_classes)}
+    for tree_idx, cls in enumerate(tree.classes_):
+        forest_idx = class_to_forest_idx.get(cls)
+        if forest_idx is not None:
+            aligned[:, forest_idx] = raw[:, tree_idx]


class_to_forest_idx is rebuilt for every tree prediction call. Consider constructing this mapping once in UpliftRandomForestClassifier.predict() and passing it (or passing precomputed forest indices for tree.classes_) to reduce overhead when n_estimators or number of treatment groups is large.

tests/test_uplift_trees.py

- Build class_to_forest_idx dict once in predict() instead of per tree - Use model.n_jobs instead of parallel_backend for parallel test - Assert that sparse-group condition actually occurred in test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jeongyoonlee · 2026-03-07T00:43:29Z

Addressed the new feedback:

class_to_forest_idx rebuilt per tree: Now constructed once in predict() and passed to _align_tree_predict().
parallel_backend doesn't control n_jobs: Fixed the test to use model.n_jobs = 1 / model.n_jobs = 2 instead of parallel_backend.
Probabilistic test: Added an assertion that at least one fitted tree has fewer classes than the forest, confirming the sparse-group condition is actually exercised.
Zero-fill biasing the ensemble: This is a valid theoretical concern. However, in practice the zero-fill approach is consistent with standard random forest behavior — a tree that never saw a treatment group has no information about that group's uplift, and zero (no effect) is the most conservative default. Using per-class nanmean would require tracking which trees contributed to each class, adding complexity to the parallel aggregation path. The current approach is simple, correct for the crash fix, and matches user expectations. If per-class weighted averaging is desired, it would be better as a separate enhancement with its own benchmarking.

With only 1 sample per minority treatment group out of 102 total, bootstrap sampling will miss them in most trees, making the test deterministic regardless of seed or CI environment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jeongyoonlee · 2026-03-07T02:22:46Z

Fixed the probabilistic test failure: reduced minority treatment groups from 5 samples each to 1 sample each (out of 102 total). With bootstrap sampling drawing n=102 with replacement, each tree has a ~37% chance of missing a 1-sample group. Across 10 trees and 2 minority groups, the probability of the sparse-group condition not occurring is ~0.01%, making the test effectively deterministic.

jeongyoonlee and others added 2 commits March 6, 2026 13:54

Add .worktrees/ to .gitignore

d688549

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 7, 2026 00:15

Copilot started reviewing on behalf of jeongyoonlee March 7, 2026 00:15 View session

Copilot AI reviewed Mar 7, 2026

View reviewed changes

jeongyoonlee requested review from alexander-pv, Copilot, huigangchen, paullo0106, ppstacy, ras44, t-tte, vincewu51 and zhenyuz0500 March 7, 2026 00:32

Copilot started reviewing on behalf of jeongyoonlee March 7, 2026 00:33 View session

Copilot AI reviewed Mar 7, 2026

View reviewed changes

jeongyoonlee added the bug Something isn't working label Mar 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix UpliftRandomForest predict shape mismatch with multiple treatments (#569)#884

Fix UpliftRandomForest predict shape mismatch with multiple treatments (#569)#884
jeongyoonlee wants to merge 5 commits intomasterfrom
fix/569-uplift-predict-shape

jeongyoonlee commented Mar 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

jeongyoonlee commented Mar 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Uh oh!

Uh oh!

jeongyoonlee commented Mar 7, 2026

Uh oh!

jeongyoonlee commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	aligned = np.zeros((raw.shape[0], len(forest_classes)))
	aligned = np.zeros((raw.shape[0], len(forest_classes)), dtype=raw.dtype)

Conversation

jeongyoonlee commented Mar 7, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

jeongyoonlee commented Mar 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jeongyoonlee commented Mar 7, 2026

Uh oh!

jeongyoonlee commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants