
⚡ Bolt: Optimize blockchain verification to O(1) by storing previous hash#474

Open
RohanExploit wants to merge 4 commits into main from
bolt-optimize-blockchain-o1-verification-15333545155765582858

Conversation

Owner

@RohanExploit RohanExploit commented Feb 25, 2026

Identified and optimized a performance bottleneck in the blockchain integrity verification process. By storing the hash of the preceding record directly in the current issue (as a back-link), the system now performs verification with a single database lookup instead of two, achieving $O(1)$ database roundtrip efficiency for the link verification. Indices were also added to ensure fast lookups on audit trails. Full backward compatibility is maintained via a fallback mechanism for legacy records.
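The back-link scheme described above can be sketched in a few lines. The hash construction below (SHA-256 over the payload joined with the previous hash) is illustrative only, not the exact formula used in the codebase:

```python
import hashlib

def compute_hash(payload: str, previous_hash: str) -> str:
    """Hash the record's payload together with its back-link."""
    return hashlib.sha256(f"{payload}|{previous_hash}".encode()).hexdigest()

def verify_record(payload: str, stored_hash: str, previous_hash: str) -> bool:
    """O(1) check: recompute from the stored back-link, no second lookup needed."""
    return compute_hash(payload, previous_hash) == stored_hash

# Chain two records: the second stores the first's hash as its back-link.
h1 = compute_hash("pothole on MG Road", "")
h2 = compute_hash("broken streetlight", h1)
assert verify_record("broken streetlight", h2, h1)
assert not verify_record("tampered description", h2, h1)
```

Because the parent hash is stored on the record itself, verification needs only the one row already fetched, instead of a second query for the previous record.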


PR created automatically by Jules for task 15333545155765582858 started by @RohanExploit


Summary by cubic

Stores each issue’s previous hash to make blockchain verification O(1) with a single DB lookup. Updates the verification API, expands migrations with indexes and new tracking fields, keeps a fallback for legacy records, and stabilizes Render builds and CORS.

  • New Features

    • Added previous_integrity_hash to Issue and populate it on creation.
    • Verification API now includes previous_hash in the response.
    • Tests cover O(1) storage and verification, including previous_hash in responses.
  • Refactors

    • Migrations hardened: set String/VARCHAR(255), added indexes on integrity_hash/previous_integrity_hash, and added reference_id (unique), verified_at, assigned_at, resolved_at, assigned_to; standardized lengths for location/audio paths.
    • Deployment/runtime fixes: pinned bcrypt<4.0.0, split python-jose/cryptography, removed async_lru and indic-nlp-library, added a simple TTL cache in gemini_summary, and made CORS safer when FRONTEND_URL is missing in production.

Written for commit a079f17. Summary will update on new commits.

Summary by CodeRabbit

  • New Features
    • Records now store the previous integrity hash, enabling O(1) integrity verification and exposing previous_hash in verification responses.
  • Bug Fixes / Performance
    • Verification path optimized to avoid multi-step cross-record lookups, improving verification speed and scalability.
  • Documentation
    • Added documented entry describing the chaining-based O(1) integrity verification approach.

…hash

- Added `previous_integrity_hash` to `Issue` model to allow single-query verification.
- Added database indexes to `integrity_hash` and `previous_integrity_hash` columns.
- Refactored `verify_blockchain_integrity` to use the stored previous hash, reducing DB roundtrips.
- Updated `create_issue` to populate the new field during report creation.
- Enhanced `BlockchainVerificationResponse` schema for transparency.
- Updated blockchain tests to verify the O(1) storage and verification logic.
- Implemented database migration in `init_db.py` for indexes.
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings February 25, 2026 14:07
@netlify

netlify bot commented Feb 25, 2026

👷 Deploy Preview for fixmybharat is processing.

🔨 Latest commit: a079f17
🔍 Latest deploy log: https://app.netlify.com/projects/fixmybharat/deploys/699f0e40afef900008750b85

@github-actions

🙏 Thank you for your contribution, @RohanExploit!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.

@coderabbitai

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Adds O(1) blockchain-style chaining by storing previous_integrity_hash on Issue records, creates indexes for O(1) lookups, updates creation and verification logic and schemas, adds tests and documentation, and adjusts ancillary modules (CORS, caching, requirements).

Changes

  • Database schema & migrations (backend/models.py, backend/init_db.py): add previous_integrity_hash column (String(255), indexed); change integrity_hash to String(255) and index it; add CREATE INDEX IF NOT EXISTS for integrity_hash and previous_integrity_hash; adjust several VARCHAR lengths.
  • API, router & schema (backend/routers/issues.py, backend/schemas.py): persist previous_integrity_hash when creating issues; the verification endpoint reads the stored previous hash (with a legacy fallback) and returns previous_hash in BlockchainVerificationResponse.
  • Tests (tests/test_blockchain.py): add test_blockchain_o1_storage asserting previous_integrity_hash is populated and that the verification response includes previous_hash and is valid.
  • Docs / design note (.jules/bolt.md): add an entry describing "Chaining for O(1) Integrity Verification" dated 2024-05-30.
  • App startup & infra (backend/main.py, backend/requirements-render.txt): refine FRONTEND_URL/CORS handling (prod vs dev branching); adjust dependency pins and removals (remove async_lru, separate python-jose/cryptography, passlib/bcrypt changes).
  • MLA summary caching (backend/gemini_summary.py): replace async_lru with a simple in-module timestamped cache and fallback behavior on generation failure.
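The idempotent index migration summarized above can be sketched against an in-memory SQLite database. The table shape and index names here are assumptions for illustration; the real backend/init_db.py targets the project's actual schema:

```python
# Illustrative sketch only: real init_db.py targets the project's actual DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS issues ("
    "id INTEGER PRIMARY KEY, "
    "integrity_hash VARCHAR(255), "
    "previous_integrity_hash VARCHAR(255))"
)
# IF NOT EXISTS keeps the migration safe to re-run on existing databases.
conn.execute("CREATE INDEX IF NOT EXISTS ix_issues_integrity_hash "
             "ON issues (integrity_hash)")
conn.execute("CREATE INDEX IF NOT EXISTS ix_issues_previous_integrity_hash "
             "ON issues (previous_integrity_hash)")
conn.commit()

indexes = {row[1] for row in conn.execute("PRAGMA index_list('issues')")}
```

Re-running the same statements on an existing database is a no-op, which is what makes this style of migration safe for both fresh and upgraded deployments.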

Sequence Diagram

sequenceDiagram
    actor Client
    participant Router as Issue Router
    participant DB as Database

    Note over Client,DB: Create Issue (chaining)
    Client->>Router: POST /issues (payload)
    Router->>DB: SELECT latest Issue.integrity_hash
    DB-->>Router: latest_integrity_hash
    Router->>DB: INSERT Issue (integrity_hash, previous_integrity_hash=latest_integrity_hash)
    DB-->>Router: Insert OK
    Router-->>Client: 201 Created

    Note over Client,DB: O(1) Verification
    Client->>Router: GET /blockchain/verify/{issue_id}
    Router->>DB: SELECT Issue (integrity_hash, previous_integrity_hash) WHERE id=issue_id
    DB-->>Router: current_issue (with previous_integrity_hash)
    Router->>Router: Recompute hash using previous_integrity_hash
    Router->>Router: Compare recomputed vs stored integrity_hash
    Router-->>Client: BlockchainVerificationResponse { previous_hash, computed_hash, current_hash, is_valid }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

ECWoC26-L3, size/m

Poem

🐰
I hop through records, tail held high,
Each hash a breadcrumb in the sky.
Previous links make checks so light,
One lookup, one hop — everything's right! 🥕🔗

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: the PR title clearly and specifically describes the main optimization (adding previous-hash storage to achieve O(1) blockchain verification), which directly aligns with the core changes across all modified files.
  • Description check ✅ Passed: the PR description is comprehensive, covering objectives, changes, testing, and deployment considerations, though it lacks explicit checklist completion and formal template structure.




Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the blockchain integrity verification system by storing the previous issue's integrity hash directly in each issue record, achieving O(1) verification performance instead of O(N). The optimization maintains full backward compatibility through a fallback mechanism for legacy records that don't have the cached previous hash.

Changes:

  • Added previous_integrity_hash column to the Issue model with index for efficient lookups
  • Updated issue creation to store the previous issue's hash at the time of creation
  • Modified verification endpoint to use the stored hash with fallback to legacy query for old records
  • Added new API response field to expose the previous hash in verification results

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

  • backend/models.py: added previous_integrity_hash column with index to the Issue model
  • backend/init_db.py: added database migration for the new column and indexes on both hash columns
  • backend/routers/issues.py: updated issue creation to store the previous hash; optimized verification to O(1) with a legacy fallback
  • backend/schemas.py: added previous_hash field to BlockchainVerificationResponse
  • tests/test_blockchain.py: added a new test verifying O(1) storage and previous_hash in the response
  • .jules/bolt.md: documented the optimization learning (with minor formatting issues)


**Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.

## 2024-05-30 - Chaining for O(1) Integrity Verification
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
Copilot AI Feb 25, 2026

There's a formatting issue: "(1)$" should be "O(1)" to properly denote Big O notation. The dollar sign appears to be a typo or formatting error.

Suggested change
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single O(1) database lookup.

Copilot uses AI. Check for mistakes.
Comment on lines +199 to +200
integrity_hash=integrity_hash,
previous_integrity_hash=prev_hash
Copilot AI Feb 25, 2026

The blockchain optimization is only being applied to web-based issue creation in this endpoint. However, bot.py (line 74-79) and voice.py (line 260-276) also create Issues but do not compute or store integrity_hash or previous_integrity_hash. This creates an inconsistent blockchain where some issues are chained and others are not. Consider either: 1) Implementing blockchain hashing for all issue creation paths to maintain a complete chain, or 2) Documenting this as an intentional design decision if only web submissions need blockchain verification.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/routers/issues.py (1)

173-177: ⚠️ Potential issue | 🟠 Major

Chain head assignment is non-atomic, allowing concurrent forked links.

prev_hash is read and then reused for insert without serialization. Concurrent requests can bind to the same parent hash, creating multiple children that all verify as valid, which breaks strict linear-chain semantics.

Please serialize head selection + insert in one critical section (e.g., DB advisory lock or equivalent transactional chain-head lock).

Also applies to: 199-200

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/issues.py` around lines 173 - 177, The chain-head read
(prev_issue/prev_hash computed via
run_in_threadpool/db.query(Issue.integrity_hash)) is non-atomic with the
subsequent insert and must be serialized; wrap the head-selection and the insert
that uses prev_hash in a single DB-critical section (e.g., acquire a DB
advisory/transactional lock or use SELECT ... FOR UPDATE within a transaction)
so no two requests can read the same head and create concurrent children. Modify
the code around the prev_issue/prev_hash lookup and the later insert routine to
begin a transaction or obtain an advisory lock (using your DB's advisory lock
API), re-read the head inside that lock/transaction, compute prev_hash, perform
the insert, then release/commit; apply the same fix to the other occurrence that
reads prev_hash (the block referenced near lines 199-200).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.jules/bolt.md:
- Line 42: Replace the accidental complexity token "single (1)$ database lookup"
in the learning note with the correct notation, e.g. "single O(1) database
lookup" (or "constant-time (O(1)) database lookup"); locate the exact phrase
"single (1)$ database lookup" in the text and update it to "single O(1) database
lookup" to fix the typo.

In `@tests/test_blockchain.py`:
- Line 98: Replace the explicit boolean comparison in the test assertion —
change the line asserting data["is_valid"] == True to use a direct truth
assertion (i.e., assert data["is_valid"]) in tests/test_blockchain.py so the
test uses Python idiomatic truthiness for the data["is_valid"] value.

---

Outside diff comments:
In `@backend/routers/issues.py`:
- Around line 173-177: The chain-head read (prev_issue/prev_hash computed via
run_in_threadpool/db.query(Issue.integrity_hash)) is non-atomic with the
subsequent insert and must be serialized; wrap the head-selection and the insert
that uses prev_hash in a single DB-critical section (e.g., acquire a DB
advisory/transactional lock or use SELECT ... FOR UPDATE within a transaction)
so no two requests can read the same head and create concurrent children. Modify
the code around the prev_issue/prev_hash lookup and the later insert routine to
begin a transaction or obtain an advisory lock (using your DB's advisory lock
API), re-read the head inside that lock/transaction, compute prev_hash, perform
the insert, then release/commit; apply the same fix to the other occurrence that
reads prev_hash (the block referenced near lines 199-200).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39d5fbb and edce47c.

📒 Files selected for processing (6)
  • .jules/bolt.md
  • backend/init_db.py
  • backend/models.py
  • backend/routers/issues.py
  • backend/schemas.py
  • tests/test_blockchain.py

**Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.

## 2024-05-30 - Chaining for O(1) Integrity Verification
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.

⚠️ Potential issue | 🟡 Minor

Fix the complexity notation typo in the learning note.

single (1)$ database lookup looks accidental; use single O(1) database lookup (or equivalent wording).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.jules/bolt.md at line 42, Replace the accidental complexity token "single
(1)$ database lookup" in the learning note with the correct notation, e.g.
"single O(1) database lookup" (or "constant-time (O(1)) database lookup");
locate the exact phrase "single (1)$ database lookup" in the text and update it
to "single O(1) database lookup" to fix the typo.

response = client.get(f"/api/issues/{issue2_id}/blockchain-verify")
assert response.status_code == 200
data = response.json()
assert data["is_valid"] == True

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check if the file exists and read the specific line
if [ -f tests/test_blockchain.py ]; then
  echo "=== Line 98 in tests/test_blockchain.py ==="
  sed -n '98p' tests/test_blockchain.py
  echo ""
  echo "=== Context around line 98 (lines 95-101) ==="
  sed -n '95,101p' tests/test_blockchain.py
  echo ""
  echo "=== Search for all assert...==True patterns in the file ==="
  rg -n 'assert\s+.+==\s*True' tests/test_blockchain.py
else
  echo "File tests/test_blockchain.py not found"
fi

Repository: RohanExploit/VishwaGuru

Length of output: 650


Use direct truth assertion for is_valid.

Avoid explicit comparison to True in assertions; prefer assert data["is_valid"] to satisfy linting and follow Python idioms.

✅ Minimal fix
-    assert data["is_valid"] == True
+    assert data["is_valid"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert data["is_valid"] == True
assert data["is_valid"]
🧰 Tools
🪛 Ruff (0.15.2)

[error] 98-98: Avoid equality comparisons to `True`; use `if data["is_valid"]:` for truth checks

Replace with data["is_valid"]

(E712)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_blockchain.py` at line 98, Replace the explicit boolean comparison
in the test assertion — change the line asserting data["is_valid"] == True to
use a direct truth assertion (i.e., assert data["is_valid"]) in
tests/test_blockchain.py so the test uses Python idiomatic truthiness for the
data["is_valid"] value.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/routers/issues.py">

<violation number="1" location="backend/routers/issues.py:639">
P1: The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.</violation>
</file>

<file name=".jules/bolt.md">

<violation number="1" location=".jules/bolt.md:42">
P3: Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray `$` or use proper `O(1)` formatting).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

)
prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
else:
prev_hash = current_issue.previous_integrity_hash
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 25, 2026

P1: The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/issues.py, line 639:

<comment>The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.</comment>

<file context>
@@ -615,24 +616,27 @@ def get_user_issues(
+        )
+        prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
+    else:
+        prev_hash = current_issue.previous_integrity_hash
 
     # Recompute hash based on current data and previous hash
</file context>

**Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.

## 2024-05-30 - Chaining for O(1) Integrity Verification
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 25, 2026

P3: Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray $ or use proper O(1) formatting).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .jules/bolt.md, line 42:

<comment>Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray `$` or use proper `O(1)` formatting).</comment>

<file context>
@@ -37,3 +37,7 @@
 **Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.
+
+## 2024-05-30 - Chaining for O(1) Integrity Verification
+**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
+**Action:** Always store the hash of the preceding record in the current record if integrity chaining is required, allowing for immediate verification without secondary queries.
</file context>
Suggested change
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single O(1) database lookup.

- Pinned `bcrypt<4.0.0` in `backend/requirements-render.txt` to avoid build failures on Render.
- Split `python-jose` and `cryptography` for cleaner dependency management.
- Refined `backend/init_db.py` to use `VARCHAR(255)` for better compatibility.
- Updated `Issue` model to use `String(255)` for hash columns.
- Verified app startup and migration logic on fresh and existing databases.
- Optimized blockchain integrity verification from O(log N) to O(1) by storing `previous_integrity_hash` on each report.
- Added database indexes for `integrity_hash` and `previous_integrity_hash`.
- Fixed Render deployment by pinning `bcrypt<4.0.0` and removing `async_lru`.
- Hardened `init_db.py` migrations with `VARCHAR(255)` and column existence checks.
- Enhanced CORS safety in `main.py` for production environments.
- Added verification tests in `tests/test_blockchain.py`.
@github-actions

🔍 Quality Reminder

Thanks for the updates! Please ensure:
- Your changes don't break existing functionality
- All tests still pass
- Code quality standards are maintained

*The maintainers will verify that the overall project flow remains intact.*

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/gemini_summary.py">

<violation number="1" location="backend/gemini_summary.py:100">
P2: The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.</violation>
</file>


try:
return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
_summary_cache[cache_key] = (summary, current_time)
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 25, 2026
P2: The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/gemini_summary.py, line 100:

<comment>The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.</comment>

<file context>
@@ -84,7 +96,9 @@ async def _generate_mla_summary_with_gemini() -> str:
     try:
-        return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+        summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+        _summary_cache[cache_key] = (summary, current_time)
+        return summary
     except Exception as e:
</file context>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/gemini_summary.py`:
- Around line 48-50: The manual _summary_cache dict with SUMMARY_CACHE_TTL is
unbounded and can grow forever; replace it with a bounded cache (e.g., an LRU or
TTL-aware bounded structure) and evict old entries when the size exceeds a
MAX_CACHE_SIZE. Concretely: replace _summary_cache usage with a bounded cache
implementation (for example collections.OrderedDict-based LRU or
cachetools.TTLCache) keyed the same way, set a MAX_CACHE_SIZE constant (e.g.,
1000), ensure entries also respect SUMMARY_CACHE_TTL, and update any places that
read/write _summary_cache (symbols: _summary_cache and SUMMARY_CACHE_TTL and the
functions that access them around the existing uses) to use the new cache API so
old/rare keys are evicted automatically. Ensure thread/async safety if the
module is used concurrently.
- Line 71: The current cache_key is built by concatenating fields into a string
(cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}")
which can collide if values contain underscores; change cache_key to use an
immutable tuple (e.g., (district, assembly_constituency, mla_name,
issue_category)) wherever the key is created and looked up, and update any cache
get/set usages that reference cache_key so they use the tuple form instead of
the concatenated string.
- Around line 99-101: When GEMINI is disabled (genai is None), avoid calling
retry_with_exponential_backoff(_generate_mla_summary_with_gemini, ...) and
returning after delay; instead detect genai is None before the retry and
immediately return the fallback summary (and update _summary_cache[cache_key] if
existing code caches fallbacks) so no retry/delay occurs; specifically modify
the block around retry_with_exponential_backoff to short-circuit when genai is
None, calling the same fallback-path used elsewhere rather than invoking
_generate_mla_summary_with_gemini.

In `@backend/main.py`:
- Around line 143-147: Replace the startup ValueError for invalid FRONTEND_URL
with a critical log and a safe fallback: instead of raising when frontend_url
does not start with "http://" or "https://", call the application's logger to
log a critical message that includes the invalid frontend_url and then set
allowed_origins = [] (maintaining the same fail-closed behavior as missing
values); update the check around frontend_url and allowed_origins in main.py to
perform this logging+fallback rather than raising so a config typo won't crash
the service.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edce47c and a10dad5.

📒 Files selected for processing (5)
  • backend/gemini_summary.py
  • backend/init_db.py
  • backend/main.py
  • backend/models.py
  • backend/requirements-render.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/init_db.py
  • backend/models.py

Comment on lines +48 to +50
# Simple cache to avoid async-lru dependency
_summary_cache = {}
SUMMARY_CACHE_TTL = 86400 # 24 hours

⚠️ Potential issue | 🟠 Major

The manual cache is unbounded and can grow indefinitely.

The TTL check prevents stale reuse, but it never evicts old or rarely hit keys, so memory can grow without bound under diverse request inputs.

Suggested change
 _summary_cache = {}
 SUMMARY_CACHE_TTL = 86400  # 24 hours
+SUMMARY_CACHE_MAX_ENTRIES = 5000
@@
-    if cache_key in _summary_cache:
+    # Opportunistic cleanup
+    if len(_summary_cache) > SUMMARY_CACHE_MAX_ENTRIES:
+        expired = [k for k, (_, ts) in _summary_cache.items() if current_time - ts >= SUMMARY_CACHE_TTL]
+        for k in expired:
+            _summary_cache.pop(k, None)
+
+    if cache_key in _summary_cache:
         val, ts = _summary_cache[cache_key]
         if current_time - ts < SUMMARY_CACHE_TTL:
             return val
@@
-        _summary_cache[cache_key] = (summary, current_time)
+        _summary_cache[cache_key] = (summary, current_time)

Also applies to: 74-77, 100-100


Returns:
A short paragraph describing the MLA's role and responsibilities
"""
cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"

⚠️ Potential issue | 🟡 Minor

Use a tuple cache key to avoid accidental collisions.

Line 71 concatenates the values with `_`, which can collide when the fields themselves contain underscores. A tuple key is safer.

Suggested change
-    cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
+    cache_key = (district, assembly_constituency, mla_name, issue_category)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
cache_key = (district, assembly_constituency, mla_name, issue_category)
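The collision is easy to demonstrate; the field values below are made up for illustration:

```python
# Two distinct inputs collide once the fields are joined with "_":
district1, constituency1 = "a_b", "c"
district2, constituency2 = "a", "b_c"

string_key1 = f"{district1}_{constituency1}"
string_key2 = f"{district2}_{constituency2}"
assert string_key1 == string_key2 == "a_b_c"  # collision: one cache slot for two inputs

# Tuple keys keep the fields separate and are hashable, so they
# work directly as dict keys with no collision:
tuple_key1 = (district1, constituency1)
tuple_key2 = (district2, constituency2)
assert tuple_key1 != tuple_key2
```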

Comment on lines +99 to +101
summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
_summary_cache[cache_key] = (summary, current_time)
return summary

⚠️ Potential issue | 🟠 Major

Skip retry path when Gemini is disabled.

When GEMINI_API_KEY is absent (`genai = None`), line 99 still runs the retry loop and delays the response before falling back. Return the fallback immediately in that mode.

Suggested change
 async def generate_mla_summary(
@@
-    async def _generate_mla_summary_with_gemini() -> str:
+    if genai is None:
+        return _get_fallback_summary(mla_name, assembly_constituency, district)
+
+    async def _generate_mla_summary_with_gemini() -> str:
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 101-101: Consider moving this statement to an else block

(TRY300)

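A minimal sketch of the short-circuit, assuming the `genai` sentinel the comment describes; `_get_fallback_summary` and its wording are stand-ins, not the module's exact signatures:

```python
import asyncio

# `genai` stands in for the SDK handle; it is None when GEMINI_API_KEY is absent.
genai = None

def _get_fallback_summary(mla_name, constituency, district):
    # Hypothetical fallback text, standing in for the module's real one.
    return f"{mla_name} represents {constituency} in {district}."

async def generate_mla_summary(mla_name, constituency, district):
    # Short-circuit before any retry/backoff machinery when Gemini is
    # disabled, so callers get the fallback with no artificial delay.
    if genai is None:
        return _get_fallback_summary(mla_name, constituency, district)
    raise NotImplementedError("retry_with_exponential_backoff path elided")
```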

Comment on lines +143 to +147
if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
raise ValueError(
f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
)
allowed_origins = [frontend_url]

⚠️ Potential issue | 🟠 Major

Avoid hard-crashing the app on malformed FRONTEND_URL.

Line 144 raises at startup, so a config typo can take the whole service down. Prefer logging a critical message and falling back to `allowed_origins = []` (the same fail-closed behavior used for missing values).

Suggested change
-    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
-        raise ValueError(
-            f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
-        )
-    allowed_origins = [frontend_url]
+    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
+        logger.critical(
+            "FRONTEND_URL is invalid (%s). CORS will be disabled for safety.",
+            frontend_url,
+        )
+        allowed_origins = []
+    else:
+        allowed_origins = [frontend_url]
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 144-146: Avoid specifying long messages outside the exception class

(TRY003)

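The logging-plus-fallback pattern could look like the sketch below; `compute_allowed_origins` is a hypothetical helper, not a function in `main.py`:

```python
import logging
from typing import List, Optional

logger = logging.getLogger("app")

def compute_allowed_origins(frontend_url: Optional[str]) -> List[str]:
    """Fail closed: a missing or malformed FRONTEND_URL yields no CORS origins."""
    if not frontend_url:
        logger.critical("FRONTEND_URL is not set; CORS will be disabled.")
        return []
    if not frontend_url.startswith(("http://", "https://")):
        # Log loudly instead of raising, so a typo degrades CORS
        # rather than crashing the service at startup.
        logger.critical("FRONTEND_URL is invalid (%s); CORS will be disabled.", frontend_url)
        return []
    return [frontend_url]
```

The returned list would then feed the existing CORS middleware setup unchanged.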

- Added missing columns (`reference_id`, `verified_at`, `assigned_at`, `resolved_at`, `assigned_to`) to `migrate_db` in `init_db.py`.
- Ensured `reference_id` has a unique index during migration.
- Removed unused/problematic dependencies `async_lru` and `indic-nlp-library` from ALL requirements files.
- Replaced `async_lru` with a manual async-safe cache in `gemini_summary.py`.
- Fixed `bcrypt` pinning and split optional extras in `requirements-render.txt`.
- Enhanced migration robustness with explicit `VARCHAR(255)` lengths.
- Maintained O(1) blockchain verification optimization.
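The back-link idea behind the O(1) claim can be illustrated with a self-contained hash chain. The dict keys mirror the `integrity_hash`/`previous_integrity_hash` columns from the PR, but the hashing scheme itself is a simplified stand-in for the application's actual one:

```python
import hashlib

def record_hash(payload: str, previous_hash: str) -> str:
    # Each record's hash commits to its payload and its predecessor's hash.
    return hashlib.sha256(f"{previous_hash}:{payload}".encode()).hexdigest()

# Build a small chain; each record stores its own hash AND a copy of
# the predecessor's hash (the back-link).
chain = []
prev = "0" * 64  # genesis sentinel
for payload in ["issue-1", "issue-2", "issue-3"]:
    h = record_hash(payload, prev)
    chain.append({
        "payload": payload,
        "integrity_hash": h,
        "previous_integrity_hash": prev,
    })
    prev = h

def verify_link(record: dict) -> bool:
    # Single lookup: the predecessor's hash is already stored on this row,
    # so no second DB roundtrip is needed to fetch the previous record.
    expected = record_hash(record["payload"], record["previous_integrity_hash"])
    return expected == record["integrity_hash"]

assert all(verify_link(r) for r in chain)
```

Tampering with any record's payload breaks its own link check immediately, which is what makes the per-record verification a single-row operation.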
