
Implement Daily Civic Intelligence Refinement Engine#476

Open
RohanExploit wants to merge 3 commits into main from civic-intelligence-engine-refinement-18289678482842708093

Conversation


@RohanExploit RohanExploit commented Feb 25, 2026

Implements the Daily Civic Intelligence Refinement Engine that runs daily to analyze civic issues, detect trends, and optimize system parameters (severity weights and duplicate detection radius) automatically.

  • Refactored backend/routers/issues.py to use dynamic duplicate search radius from AdaptiveWeights.
  • Fixed a bug in backend/spatial_utils.py where DBSCAN radius was calculated in degrees but passed to a metric expecting radians.
  • Added system test backend/tests/test_civic_intelligence_system.py covering the full daily cycle.
  • Added documentation CIVIC_INTELLIGENCE.md.
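The units fix in the second bullet comes down to a single conversion; a minimal sketch, with the constant and function name chosen for illustration rather than taken from backend/spatial_utils.py:

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

def eps_for_haversine(eps_meters: float) -> float:
    # sklearn's haversine metric measures angular distance in radians,
    # so eps must be converted as arc length / radius -- not to degrees.
    return eps_meters / EARTH_RADIUS_M

# 50 m corresponds to ~7.85e-6 rad. A degree-based eps would be larger
# by a factor of 180/pi (~57.3x), causing DBSCAN to over-merge clusters.
```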

PR created automatically by Jules for task 18289678482842708093 started by @RohanExploit


Summary by cubic

Implements a daily engine that analyzes civic issues, learns from admin upgrades, and auto-tunes severity weights and duplicate detection radius. Integrates the adaptive radius into issue creation, fixes DBSCAN units for haversine, and optimizes Render dependencies.

  • New Features

    • Daily cycle updates category multipliers and duplicate_search_radius based on trends and clustering.
    • create_issue uses the adaptive duplicate radius (default 50m).
    • Added a full system test for the daily cycle and refreshed CIVIC_INTELLIGENCE.md.
  • Dependencies

    • Added scikit-learn, pinned numpy<2.0.0, and removed indic-nlp-library for Render.

Written for commit af4ed8f. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Issue deduplication radius now adapts dynamically based on observed data patterns.
  • Bug Fixes

    • Fixed spatial clustering distance handling to improve duplicate detection accuracy.
  • Documentation

    • Rewrote Civic Intelligence Engine docs into a clearer, process-centered daily workflow with expanded data and transparency sections.
  • Tests

    • Added comprehensive end-to-end tests covering daily processing, deduplication, weight updates, and snapshot creation.
  • Chores

    • Added numerical and ML-related libraries to the project requirements.

Copilot AI review requested due to automatic review settings February 25, 2026 18:48
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.


netlify bot commented Feb 25, 2026

Deploy Preview for fixmybharat canceled.

  • 🔨 Latest commit: af4ed8f
  • 🔍 Latest deploy log: https://app.netlify.com/projects/fixmybharat/deploys/699f46700b9a7800095479f3

@github-actions

🙏 Thank you for your contribution, @RohanExploit!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.


coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Redesigns CIVIC_INTELLIGENCE.md to a UTC-based daily workflow, introduces AdaptiveWeights-driven dynamic deduplication radius used in issue creation, fixes DBSCAN eps unit conversion (meters→radians) in spatial utilities, adds end-to-end tests for daily cycle, and updates runtime dependencies (numpy, scikit-learn).

Changes

  • Documentation (CIVIC_INTELLIGENCE.md): Rewritten to a process-centric, UTC daily workflow; documents core components, trend detection, adaptive weight optimization, deduplication handling, modelWeights.json, and daily snapshot persistence.
  • Issue Router & Deduplication (backend/routers/issues.py): Replaces the hard-coded 50.0 m deduplication radius with adaptive_weights.get_duplicate_search_radius() for bounding-box queries and nearby-issues filtering.
  • Spatial Utilities (backend/spatial_utils.py): DBSCAN haversine eps corrected to radians via eps = eps_meters / R (R = 6371000.0); coordinates remain converted with np.radians before clustering.
  • Tests (backend/tests/test_civic_intelligence_system.py): New comprehensive integration test: in-memory DB, temp weight/snapshot dirs, loads MOCK_WEIGHTS, creates clustering/escalation scenarios, runs CivicIntelligenceEngine.run_daily_cycle(), asserts snapshots, weight updates, persistence, and increased duplicate radius.
  • Dependencies (backend/requirements-render.txt): Adds scikit-learn and numpy to requirements.
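The router change can be pictured with a small stand-alone sketch; AdaptiveWeights and find_duplicate here are simplified stand-ins for illustration, not the real backend classes:

```python
import math

class AdaptiveWeights:
    """Stand-in: the real class loads modelWeights.json and is tuned daily."""
    def __init__(self, radius_m: float = 50.0):
        self._radius_m = radius_m

    def get_duplicate_search_radius(self) -> float:
        return self._radius_m

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two (lat, lon) points in degrees.
    R = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def find_duplicate(lat, lon, open_issues, weights: AdaptiveWeights):
    # Dynamic radius replaces the previously hard-coded 50.0 m.
    radius = weights.get_duplicate_search_radius()
    in_range = [(haversine_m(lat, lon, i["lat"], i["lon"]), i)
                for i in open_issues
                if haversine_m(lat, lon, i["lat"], i["lon"]) <= radius]
    # Link to the closest existing issue, or signal a brand-new issue.
    return min(in_range, key=lambda t: t[0])[1] if in_range else None
```

Two reports ~22 m apart dedupe at the default 50 m radius but become separate issues if the engine shrinks the radius to 10 m.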

Sequence Diagram

sequenceDiagram
    participant Client as Issue Creator
    participant Router as Issue Router
    participant Weights as AdaptiveWeights
    participant Spatial as Spatial Utils
    participant DB as Database
    participant Engine as CivicIntelligenceEngine
    participant FS as File System

    Client->>Router: Create issue payload
    Router->>Weights: get_duplicate_search_radius()
    Weights-->>Router: dynamic radius (meters)
    Router->>Spatial: Compute bbox & query nearby within radius
    Spatial->>DB: Query open issues in bbox
    DB-->>Spatial: Nearby issues list
    alt Nearby issues within radius
        Spatial-->>Router: Duplicates detected (closest)
        Router->>DB: Link report to existing issue
    else No duplicates found
        Router->>DB: Create new issue
    end
    Note over Engine: Daily cycle (UTC) runs
    Engine->>DB: Fetch new issues, escalations
    Engine->>Weights: Analyze trends, update weights
    Weights->>FS: Persist modelWeights.json
    Engine->>FS: Write daily snapshot (civic index, metrics)
    FS-->>Engine: Snapshot & weights saved

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

size/l

Poem

🐰 I nibble weights at break of day,

radii stretch where metrics play.
Radians whisper, clusters bloom,
snapshots brighten burrowed room.
Hooray — intelligence hops this way!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 20.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: the title clearly and concisely describes the main change: implementing a daily refinement engine for the Civic Intelligence system. It directly reflects the primary objective of the PR.
  • Description check ✅ Passed: the pull request description covers key objectives, implementation details, and includes auto-generated summaries. All major sections from the template are addressed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch civic-intelligence-engine-refinement-18289678482842708093



@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 4 files


Copilot AI left a comment


Pull request overview

This PR implements a Daily Civic Intelligence Refinement Engine that automatically analyzes civic issues, detects trends, and optimizes system parameters on a daily schedule. The system learns from user behavior to improve duplicate detection accuracy and severity scoring.

Changes:

  • Added system test for the full daily civic intelligence cycle
  • Fixed DBSCAN clustering bug where radius was incorrectly converted from meters to degrees instead of radians
  • Integrated dynamic duplicate detection radius from adaptive weights system into issue creation

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

  • backend/tests/test_civic_intelligence_system.py: New comprehensive system test for the daily refinement cycle
  • backend/spatial_utils.py: Fixed DBSCAN radius conversion from degrees to radians
  • backend/routers/issues.py: Integrated dynamic duplicate search radius from adaptive weights
  • CIVIC_INTELLIGENCE.md: Updated documentation with clearer structure and emoji formatting


# Patch paths to use temp directory
with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
     patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
     patch("backend.civic_intelligence.SessionLocal", return_value=db_session):

Copilot AI Feb 25, 2026


The SessionLocal mock is incorrectly configured. SessionLocal is called as a factory function (SessionLocal()) in civic_intelligence.py line 45, so the mock needs to be callable and return the session. Import MagicMock from unittest.mock, then use: mock_session_local = MagicMock(return_value=db_session), patch("backend.civic_intelligence.SessionLocal", mock_session_local). Currently, the test passes db_session directly as return_value to patch, which will fail when SessionLocal() is called because db_session is a Session object, not a callable.

new_pothole_weight = weights_system.get_category_multipliers()["pothole"]
assert new_pothole_weight > 1.0, f"Pothole weight should increase from 1.0, got {new_pothole_weight}"

# Radius should increase because of clustering (> 5 clusters)

Copilot AI Feb 25, 2026


The comment incorrectly states "Radius should increase because of clustering (> 5 clusters)" but the test setup creates scattered issues specifically to avoid forming clusters (line 73-74 comments). The radius increase is actually triggered by the "high volume, no clusters" condition (cluster_count == 0 and len(issues_24h) > 50) from civic_intelligence.py line 134-136. The comment should be corrected to reflect the actual test scenario.

Suggested change
# Radius should increase because of clustering (> 5 clusters)
# Radius should increase due to high volume of issues with no clusters (cluster_count == 0 and len(issues_24h) > 50)

from datetime import datetime, timedelta, timezone
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from unittest.mock import patch

Copilot AI Feb 25, 2026


Missing MagicMock import needed to fix the SessionLocal mocking pattern on line 60. Add MagicMock to the import statement: from unittest.mock import patch, MagicMock

Suggested change
from unittest.mock import patch
from unittest.mock import patch, MagicMock

Comment on lines +170 to +173
# Convert eps from meters to radians
# Haversine metric expects inputs in radians and eps in radians
R = 6371000.0 # Earth's radius in meters
eps_radians = eps_meters / R

Copilot AI Feb 25, 2026


The DBSCAN radius conversion fix from degrees to radians lacks a direct unit test. While test_civic_intelligence_system.py implicitly tests clustering behavior, consider adding a unit test in test_spatial_utils.py that verifies cluster_issues_dbscan correctly clusters issues within the specified eps_meters distance. This would ensure the mathematical conversion is correct and prevent regression.
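One possible shape for that unit test; cluster_issues_dbscan is stubbed here with the same meters-to-radians conversion, since the real signature in backend/spatial_utils.py is not shown in this thread:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_issues_dbscan(coords_deg, eps_meters, min_samples=2):
    # Stub mirroring the fix: eps and coordinates are both in radians.
    eps_radians = eps_meters / 6371000.0
    return DBSCAN(eps=eps_radians, min_samples=min_samples,
                  metric="haversine").fit_predict(np.radians(coords_deg))

def test_clusters_within_eps_meters():
    coords = np.array([
        [12.9716, 77.5946],   # ~22 m from the next point
        [12.9718, 77.5946],
        [13.0716, 77.5946],   # ~11 km away: should be noise
    ])
    labels = cluster_issues_dbscan(coords, eps_meters=50.0)
    assert labels[0] == labels[1] != -1   # close pair forms one cluster
    assert labels[2] == -1                # distant point stays unclustered
```

Under the old degree-based conversion the effective eps is ~57x too large, so a test like this would have caught the far point being wrongly absorbed into the cluster.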


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/routers/issues.py (1)

97-125: ⚠️ Potential issue | 🟠 Major

Validate and clamp the adaptive dedup radius before using it.

At Line [98], the radius is consumed directly from adaptive weights. If that value is malformed (e.g., string/NaN/out-of-range), Line [102] or Line [125] can fail or misbehave, and the broad except then silently disables deduplication for that request.

🔧 Suggested hardening
@@
 import uuid
 import os
 import logging
 import hashlib
+import math
 from datetime import datetime, timezone
@@
-            # Get dynamic search radius from adaptive weights
-            search_radius = adaptive_weights.get_duplicate_search_radius()
+            # Get dynamic search radius from adaptive weights
+            raw_radius = adaptive_weights.get_duplicate_search_radius()
+            try:
+                search_radius = float(raw_radius)
+            except (TypeError, ValueError):
+                logger.warning("Invalid duplicate_search_radius=%r; using fallback 50.0m", raw_radius)
+                search_radius = 50.0
+
+            if not math.isfinite(search_radius):
+                logger.warning("Non-finite duplicate_search_radius=%r; using fallback 50.0m", raw_radius)
+                search_radius = 50.0
+
+            # Keep runtime behavior bounded even if config file is edited manually
+            search_radius = max(10.0, min(search_radius, 200.0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/issues.py` around lines 97 - 125, Validate and clamp the
value returned by adaptive_weights.get_duplicate_search_radius() before using
it: ensure it's a finite number (coerce/parse if needed), fall back to a safe
default when non-numeric, NaN, or missing, and clamp to a defined min/max (e.g.,
10–5000 meters) to avoid extreme or invalid radii; then use that sanitized
search_radius in get_bounding_box(...) and find_nearby_issues(...) and log a
warning when you substitute a fallback so the run_in_threadpool/db.query dedup
path doesn't silently disable deduplication.
🧹 Nitpick comments (1)
backend/tests/test_civic_intelligence_system.py (1)

98-98: Use _ for intentionally unused loop variable.

Line [98] uses i but never reads it; this is a straightforward lint cleanup.

♻️ Small cleanup
-        for i in range(4): # 4 upgrades > 3 threshold
+        for _ in range(4): # 4 upgrades > 3 threshold
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_civic_intelligence_system.py` at line 98, The loop in the
test uses an unused loop variable "i" (for i in range(4):) — replace it with the
conventional unused-name "_" (for _ in range(4):) to satisfy linting; update the
occurrence in backend/tests/test_civic_intelligence_system.py where the loop is
present (the for i in range(4): line) so the test behavior remains identical but
the unused variable warning is resolved.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39d5fbb and 0466270.

📒 Files selected for processing (4)
  • CIVIC_INTELLIGENCE.md
  • backend/routers/issues.py
  • backend/spatial_utils.py
  • backend/tests/test_civic_intelligence_system.py

Comment on lines +58 to +60
with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
     patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
     patch("backend.civic_intelligence.SessionLocal", return_value=db_session):


⚠️ Potential issue | 🟠 Major

Isolate the adaptive-weights singleton to avoid cross-test state leakage.

Line [58]-Line [60] patches DATA_FILE, but CivicIntelligenceEngine uses a module-level adaptive_weights object with cached state. That can make this test order-dependent if cached weights survive into later tests.

🧪 Suggested isolation patch
@@
 def test_daily_civic_intelligence_cycle(temp_dirs, db_session):
-    temp_dir, weights_file, snapshots_dir = temp_dirs
+    _, weights_file, snapshots_dir = temp_dirs
+    test_weights = AdaptiveWeights()
@@
-    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
-         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
-         patch("backend.civic_intelligence.SessionLocal", return_value=db_session):
+    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
+         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
+         patch("backend.civic_intelligence.SessionLocal", return_value=db_session), \
+         patch("backend.civic_intelligence.adaptive_weights", test_weights):
@@
-        weights_system = AdaptiveWeights()
-        weights_system._weights = None # Force reload
-        weights_system._load_weights()
+        test_weights._weights = None  # Force reload from patched DATA_FILE
+        test_weights._last_loaded = 0
+        test_weights._load_weights()
@@
-        assert weights_system.get_category_multipliers()["pothole"] == 1.0
-        assert weights_system.get_duplicate_search_radius() == 50.0
+        assert test_weights.get_category_multipliers()["pothole"] == 1.0
+        assert test_weights.get_duplicate_search_radius() == 50.0
@@
-        weights_system._last_loaded = 0 # Force reload
-        weights_system._load_weights()
+        test_weights._last_loaded = 0  # Force reload
+        test_weights._load_weights()
@@
-        new_pothole_weight = weights_system.get_category_multipliers()["pothole"]
+        new_pothole_weight = test_weights.get_category_multipliers()["pothole"]
@@
-        new_radius = weights_system.get_duplicate_search_radius()
+        new_radius = test_weights.get_duplicate_search_radius()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
     patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
     patch("backend.civic_intelligence.SessionLocal", return_value=db_session):

def test_daily_civic_intelligence_cycle(temp_dirs, db_session):
    _, weights_file, snapshots_dir = temp_dirs
    test_weights = AdaptiveWeights()
    with patch("backend.adaptive_weights.DATA_FILE", weights_file), \
         patch("backend.civic_intelligence.SNAPSHOT_DIR", snapshots_dir), \
         patch("backend.civic_intelligence.SessionLocal", return_value=db_session), \
         patch("backend.civic_intelligence.adaptive_weights", test_weights):
        test_weights._weights = None  # Force reload from patched DATA_FILE
        test_weights._last_loaded = 0
        test_weights._load_weights()

        assert test_weights.get_category_multipliers()["pothole"] == 1.0
        assert test_weights.get_duplicate_search_radius() == 50.0

        # Simulate weight update
        test_weights._last_loaded = 0  # Force reload
        test_weights._load_weights()

        new_pothole_weight = test_weights.get_category_multipliers()["pothole"]
        new_radius = test_weights.get_duplicate_search_radius()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_civic_intelligence_system.py` around lines 58 - 60, The
test is leaking module-level adaptive_weights state used by
CivicIntelligenceEngine; instead of only patching
backend.adaptive_weights.DATA_FILE, replace or reset the module-level
adaptive_weights object so cached weights can't persist across tests—patch
backend.civic_intelligence.adaptive_weights to a fresh AdaptiveWeights instance
(or call its reset/clear method) before constructing CivicIntelligenceEngine,
ensuring DATA_FILE is still set to weights_file and SessionLocal/snapshot
patches remain in place.

They contain:
* `trends`: Top keywords, category distribution, clusters.
* `civic_index`: The daily score and insights.
* `weight_changes`: Audit log of any automatic weight adjustments.


⚠️ Potential issue | 🟡 Minor

Snapshot field name is inconsistent with tested payload.

Line [57] documents weight_changes, but backend/tests/test_civic_intelligence_system.py (Line [138]-Line [139]) validates weight_updates. Please align the doc with the actual snapshot schema (or vice versa).

📝 Doc fix
-*   `weight_changes`: Audit log of any automatic weight adjustments.
+*   `weight_updates`: Audit log of any automatic weight adjustments.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* `weight_changes`: Audit log of any automatic weight adjustments.
* `weight_updates`: Audit log of any automatic weight adjustments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CIVIC_INTELLIGENCE.md` at line 57, The docs list the snapshot field as
`weight_changes` but the tests assert `weight_updates`; update the snapshot
schema so both match by renaming the field in CIVIC_INTELLIGENCE.md from
`weight_changes` to `weight_updates` (or alternatively change the test to expect
`weight_changes`), ensuring the field name in the README and the schema/test
(`weight_updates` vs `weight_changes`) are identical across the codebase.

@github-actions

🔍 Quality Reminder

Thanks for the updates! Please ensure:
- Your changes don't break existing functionality
- All tests still pass
- Code quality standards are maintained

*The maintainers will verify that the overall project flow remains intact.*


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0466270 and 2e70926.

📒 Files selected for processing (1)
  • backend/requirements-render.txt

Comment on lines 23 to 24
scikit-learn
numpy


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and read the relevant section
cat -n backend/requirements-render.txt

Repository: RohanExploit/VishwaGuru

Length of output: 557


🌐 Web query:

scikit-learn latest version 2025 PyPI

💡 Result:

On PyPI, the latest scikit-learn release in 2025 (and the latest stable version listed) is 1.8.0, published Dec 10, 2025. [1]


🌐 Web query:

numpy latest version 2025 PyPI

💡 Result:

On PyPI, the latest NumPy release during 2025 (i.e., the newest one with a 2025 release date) was NumPy 2.3.5, released November 16, 2025. [1]

(For context: PyPI shows later releases in 2026, e.g. 2.4.2 on January 31, 2026.) [1]

Sources: [1] PyPI “numpy” project page (release history).


🌐 Web query:

scikit-learn numpy transitive dependency

💡 Result:

  • In scikit-learn itself, NumPy is a direct (runtime) dependency, i.e., scikit-learn declares that it requires NumPy (along with SciPy, joblib, threadpoolctl). The scikit-learn docs list NumPy as a required “build, install” dependency (e.g., minimum versions are published in their dependency table). [1]

  • For your project, NumPy is often a transitive dependency if you only depend on scikit-learn. Example: if your project declares scikit-learn but not numpy, then numpy is an indirect requirement that arrives because scikit-learn needs it—so it’s transitive from your project’s perspective. pip explicitly resolves “dependencies of dependencies” during installation. [2]

  • Also note: numpy does not depend on scikit-learn (the dependency direction is scikit-learn → numpy, not the other way around). [1]

Practical checks / control:

  • See the dependency tree in an environment with pipdeptree (e.g., pip install pipdeptree && pipdeptree -p scikit-learn). [3]
  • If you need to pin or restrict a transitive dependency (like NumPy pulled in via scikit-learn/SciPy), pip recommends using a constraints file. [2]

Sources:
[1] scikit-learn docs (install + minimum dependency table)
[2] pip docs (dependency resolution; constraints for transitive deps)
[3] pipdeptree (PyPI description/usage)


🌐 Web query:

scikit-learn CVE security advisories 2025

💡 Result:

scikit-learn security advisories / CVEs in 2025

  • Upstream scikit-learn (the scikit-learn project itself): GitHub’s Security Advisories page for scikit-learn/scikit-learn shows no published security advisories, i.e., no GHSA entries to map to 2025 CVEs/advisories. [1]
  • Known scikit-learn CVEs around that time (but not 2025): Vulnerability trackers for the PyPI package list items such as CVE-2024-5206 (fixed in scikit-learn 1.5.0) and older issues like CVE-2020-28975, but do not show any CVE-2025-… entries for scikit-learn itself. [2]

What you may see in 2025 anyway (downstream/vendor advisories)

  • Linux distros may publish 2025-dated advisories for their python*-scikit-learn packages that reference older CVEs (example: an openSUSE 2025 advisory referencing CVE-2020-28975). [3]

Closely related 2025 advisory (scikit-learn ecosystem)

  • CVE-2025-54886 affects skops (used to share/ship scikit-learn models), enabling arbitrary code execution in certain loading paths; fixed in skops 0.13.0. This is not a scikit-learn CVE, but is relevant if you use skops with scikit-learn models. [4]

Sources: [1] GitHub scikit-learn security advisories page, [2] Safety vulnerability listing for scikit-learn, [3] openSUSE 2025 advisory, [4] CVE-2025-54886 details (Wiz).


🌐 Web query:

numpy CVE security advisories 2025

💡 Result:

NumPy CVE / security advisories in 2025

  • NumPy upstream (project-maintained advisories): As of Feb 25, 2026, the official numpy/numpy GitHub repository shows no published security advisories (so there were none in 2025). [1]

  • CVE entries in 2025 that some distro errata associate with “numpy” updates are typically not NumPy bugs, but ecosystem / bundle updates where the NumPy RPM/wheel gets rebuilt alongside other Python components:

    • CVE-2025-8194 is a CPython tarfile infinite-loop/deadlock issue (not NumPy). [2] Oracle Linux errata (Aug 29, 2025) lists a “python39” security update where numpy is among updated packages while referencing CVE-2025-8194. [3]
    • CVE-2025-47273 is a setuptools path traversal issue (not NumPy). [4] Some third-party vulnerability trackers/distro rollups list numpy among affected/updated packages for this CVE as part of broader module updates. [4]

If you tell me your environment (pip vs conda vs OS packages, and which NumPy version), I can point to the specific advisories/errata that apply to your installation.

Sources: [1] [2] [3] [4]


Pin versions for both new dependencies to ensure reproducible Render deployments.

Both scikit-learn and numpy are added unpinned. Because Render resolves the latest satisfying version on every deploy, a major or breaking release of either package will silently break the next deployment without any code change. This is the same risk as all the other unpinned entries in this file, but it's worth addressing here since these are newly introduced.

Additionally, numpy is a hard transitive dependency of scikit-learn; explicitly listing it without a pin adds noise but no constraint — only worth keeping if you need to enforce a specific numpy version.

📌 Suggested version pinning
-scikit-learn
-numpy
+scikit-learn==1.8.0
+numpy==2.4.2
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
scikit-learn
numpy
scikit-learn==1.8.0
numpy==2.4.2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/requirements-render.txt` around lines 23 - 24, The new requirements
entries add unpinned packages; pin scikit-learn (and numpy if you intend to
control it) to specific versions to ensure reproducible Render deployments:
update the requirements-render.txt entries for "scikit-learn" (and "numpy" if
kept) to explicit version pins (e.g., scikit-learn==X.Y.Z and numpy==A.B.C) or
remove the explicit numpy line if you want numpy to follow scikit-learn's
transitive constraint; modify the lines referencing scikit-learn and numpy
accordingly so they are fixed to chosen versions.

