Feature request: sample weights support in RandomForest (and other tree-based models)

## Feature Request

Support for `sample_weights: Vec<f64>` in `RandomForestRegressor::fit()` (and ideally `RandomForestClassifier` and the underlying `DecisionTree` models as well).

## Use Case

I'm training a RandomForest on time-series data where recent observations should be weighted more heavily than older ones (exponential decay: `weight = 0.9^months_ago`). This is a common pattern in scikit-learn:

```python
model.fit(X, y, sample_weight=weights)
```

Without sample weights, there's no way to express "this training example matters more than that one" — which is important for recency weighting, class imbalance correction, and importance sampling.
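
For concreteness, here is a minimal sketch of how the recency weights above could be computed on the caller's side. The function name `recency_weights` and the `months_ago` slice are illustrative assumptions, not part of smartcore.

```rust
/// Exponential-decay recency weights: weight = decay^months_ago.
/// `months_ago[i]` is the age of training row `i` in months (0 = newest).
/// Hypothetical helper for illustration only.
fn recency_weights(months_ago: &[u32], decay: f64) -> Vec<f64> {
    months_ago
        .iter()
        .map(|&m| decay.powi(m as i32))
        .collect()
}
```

With `decay = 0.9`, a row from the current month gets weight 1.0 and a row from two months ago gets 0.81.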

## Current State

Looking at the source code, the internal plumbing is close to supporting this:

- `BaseForestRegressor::sample_with_replacement()` does uniform bootstrap sampling — this could be extended to weighted sampling
- `BaseTreeRegressor::fit_weak_learner()` already accepts `samples: Vec<usize>` (bootstrap counts) and uses them as integer multipliers in split statistics:
  ```rust
  sum += *sample_i as f64 * y_m.get(i).to_f64().unwrap();
  ```
- Generalizing `samples` from `Vec<usize>` (integer counts) to `Vec<f64>` (continuous weights) in the tree splitter would enable this
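
To illustrate the last point, here is a rough sketch of the same accumulation with `f64` weights in place of integer counts. The function names are hypothetical and the code works on plain slices rather than smartcore's array traits; the existing bootstrap counts become the special case of integer-valued weights.

```rust
/// Weighted mean of the targets; `weights[i]` plays the role that
/// `samples[i]` plays today. Returns `None` if the total weight is zero.
/// Hypothetical helper, not smartcore code.
fn weighted_mean(y: &[f64], weights: &[f64]) -> Option<f64> {
    assert_eq!(y.len(), weights.len(), "one weight per row");
    let mut sum = 0.0;
    let mut total = 0.0;
    for (&y_i, &w_i) in y.iter().zip(weights) {
        sum += w_i * y_i; // same shape as `sum += *sample_i as f64 * y_i`
        total += w_i;
    }
    if total > 0.0 {
        Some(sum / total)
    } else {
        None
    }
}

/// Bootstrap counts expressed as weights, so both cases share one code path.
fn counts_as_weights(samples: &[usize]) -> Vec<f64> {
    samples.iter().map(|&c| c as f64).collect()
}
```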

## Proposed API

Option A — Add to parameters struct:
```rust
RandomForestRegressorParameters {
    // ... existing fields ...
    sample_weights: Option<Vec<f64>>,
}
```

Option B — Extend the fit signature (breaking change):
```rust
pub fn fit(x: &X, y: &Y, parameters: P, sample_weights: Option<&[f64]>) -> Result<Self, Failed>
```

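Option A leaves the `fit` signature unchanged, so it is backwards-compatible for code that builds the parameters from `Default::default()` (exhaustive struct literals would still need the new field). It is probably preferable.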
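
A rough sketch of what Option A might look like from the caller's side, including the validation `fit` would need. `ForestParamsSketch`, `with_sample_weights` and `check_sample_weights` are hypothetical names, and the `String` error stands in for whatever error type the crate would actually return.

```rust
/// Stand-in for the real parameters struct; only the proposed field is shown.
#[derive(Debug, Clone, Default)]
struct ForestParamsSketch {
    // ... existing fields ...
    sample_weights: Option<Vec<f64>>,
}

impl ForestParamsSketch {
    /// Builder-style setter (hypothetical).
    fn with_sample_weights(mut self, weights: Vec<f64>) -> Self {
        self.sample_weights = Some(weights);
        self
    }

    /// Checks the weights against the number of training rows.
    fn check_sample_weights(&self, n_rows: usize) -> Result<(), String> {
        let weights = match &self.sample_weights {
            Some(w) => w,
            None => return Ok(()), // no weights: unweighted fit as today
        };
        if weights.len() != n_rows {
            return Err(format!("expected {} weights, got {}", n_rows, weights.len()));
        }
        if weights.iter().any(|w| !w.is_finite() || *w < 0.0) {
            return Err("weights must be finite and non-negative".to_string());
        }
        if weights.iter().sum::<f64>() <= 0.0 {
            return Err("at least one weight must be positive".to_string());
        }
        Ok(())
    }
}
```

Leaving the field as `None` by default keeps existing callers on the unweighted path.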

## Scope

Two pieces:
1. **Weighted bootstrap sampling** in `BaseForestRegressor` — sample with probability proportional to weights instead of uniformly
2. **Weighted split statistics** in `BaseTreeRegressor` — use float weights instead of integer counts when computing mean/variance for split criteria
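
For the first piece, one possible approach is inverse-CDF sampling over the cumulative weights. The sketch below is an assumption about how it could be done, not existing smartcore code: `weighted_bootstrap_counts` is a hypothetical name, and the uniform draws in `[0, 1)` are passed in by the caller so the example does not depend on a particular RNG crate.

```rust
/// Draws `uniforms.len()` rows with replacement, each with probability
/// proportional to its weight, and returns how often each row was drawn.
/// `uniforms` must contain values in [0, 1) from the caller's RNG.
fn weighted_bootstrap_counts(weights: &[f64], uniforms: &[f64]) -> Vec<usize> {
    // Running totals: cumulative[i] = weights[0] + ... + weights[i].
    let mut cumulative = Vec::with_capacity(weights.len());
    let mut total = 0.0;
    for &w in weights {
        total += w;
        cumulative.push(total);
    }

    let mut counts = vec![0usize; weights.len()];
    if total <= 0.0 {
        return counts; // nothing can be drawn
    }
    for &u in uniforms {
        let target = u * total;
        // First row whose cumulative weight exceeds the target.
        let idx = cumulative.partition_point(|&c| c <= target);
        // Clamp guards against rounding at the upper end.
        counts[idx.min(weights.len() - 1)] += 1;
    }
    counts
}
```

The returned counts have the same shape as the `samples` vector that `fit_weak_learner()` already takes, and rows with zero weight are never drawn.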

## scikit-learn Reference

For reference, scikit-learn's implementation:
- Passes `sample_weight` through to each tree's `fit()` 
- Uses weights in bootstrap sampling (weighted random draw with replacement)
- Uses weights in impurity calculations (weighted mean, weighted variance)
- Docs: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor.fit
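
As an illustration of the weighted impurity idea (my own sketch, not scikit-learn's code), the weighted mean and weighted variance of a node's targets could be computed as below. `weighted_mean_variance` is a hypothetical name.

```rust
/// Weighted mean and weighted variance of the targets in a node.
/// Returns `None` if the lengths differ or the total weight is not positive.
fn weighted_mean_variance(y: &[f64], weights: &[f64]) -> Option<(f64, f64)> {
    let total: f64 = weights.iter().sum();
    if y.len() != weights.len() || total <= 0.0 {
        return None;
    }
    // mean = sum(w_i * y_i) / sum(w_i)
    let mean = y
        .iter()
        .zip(weights)
        .map(|(y_i, w_i)| w_i * y_i)
        .sum::<f64>()
        / total;
    // variance = sum(w_i * (y_i - mean)^2) / sum(w_i)
    let variance = y
        .iter()
        .zip(weights)
        .map(|(y_i, w_i)| w_i * (y_i - mean).powi(2))
        .sum::<f64>()
        / total;
    Some((mean, variance))
}
```

With all weights equal to 1 this reduces to the ordinary mean and population variance.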

Sample weighting is a widely used feature of scikit-learn's RandomForest, and supporting it would make smartcore a much more viable alternative for real-world ML pipelines.

Thank you for maintaining this crate — the WASM-first posture is exactly what drew me to it!
