⚡ Bolt: Optimize blockchain verification to O(1) by storing previous hash #474
RohanExploit wants to merge 4 commits into `main`
Conversation
- Added `previous_integrity_hash` to `Issue` model to allow single-query verification.
- Added database indexes to `integrity_hash` and `previous_integrity_hash` columns.
- Refactored `verify_blockchain_integrity` to use the stored previous hash, reducing DB roundtrips.
- Updated `create_issue` to populate the new field during report creation.
- Enhanced `BlockchainVerificationResponse` schema for transparency.
- Updated blockchain tests to verify the O(1) storage and verification logic.
- Implemented database migration in `init_db.py` for indexes.
🙏 Thank you for your contribution, @RohanExploit!
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken. |
📝 Walkthrough
Adds O(1) blockchain-style chaining by storing the previous issue's integrity hash on each record.
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant Router as Issue Router
    participant DB as Database
    Note over Client,DB: Create Issue (chaining)
    Client->>Router: POST /issues (payload)
    Router->>DB: SELECT latest Issue.integrity_hash
    DB-->>Router: latest_integrity_hash
    Router->>DB: INSERT Issue (integrity_hash, previous_integrity_hash=latest_integrity_hash)
    DB-->>Router: Insert OK
    Router-->>Client: 201 Created
    Note over Client,DB: O(1) Verification
    Client->>Router: GET /blockchain/verify/{issue_id}
    Router->>DB: SELECT Issue (integrity_hash, previous_integrity_hash) WHERE id=issue_id
    DB-->>Router: current_issue (with previous_integrity_hash)
    Router->>Router: Recompute hash using previous_integrity_hash
    Router->>Router: Compare recomputed vs stored integrity_hash
    Router-->>Client: BlockchainVerificationResponse { previous_hash, computed_hash, current_hash, is_valid }
```
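The O(1) verification path above can be sketched as a pure function. The field names (`title`, `description`) and the exact hash recipe here are illustrative assumptions, not the project's actual scheme:

```python
import hashlib

def compute_integrity_hash(title: str, description: str, previous_hash: str) -> str:
    # Hash the record's own fields together with the back-link to its predecessor.
    payload = f"{title}|{description}|{previous_hash}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def verify_issue(issue: dict) -> bool:
    # O(1): recompute from fields stored on this row alone;
    # no query for the previous record is needed.
    recomputed = compute_integrity_hash(
        issue["title"], issue["description"], issue["previous_integrity_hash"]
    )
    return recomputed == issue["integrity_hash"]
```

Because `previous_integrity_hash` lives on the row itself, the verifier never walks the chain; the single SELECT in the diagram is the only database roundtrip.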
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR optimizes the blockchain integrity verification system by storing the previous issue's integrity hash directly in each issue record, achieving O(1) verification performance instead of O(N). The optimization maintains full backward compatibility through a fallback mechanism for legacy records that don't have the cached previous hash.
Changes:
- Added `previous_integrity_hash` column to the Issue model with index for efficient lookups
- Updated issue creation to store the previous issue's hash at the time of creation
- Modified verification endpoint to use the stored hash with fallback to legacy query for old records
- Added new API response field to expose the previous hash in verification results
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| backend/models.py | Added previous_integrity_hash column with index to Issue model |
| backend/init_db.py | Added database migration for new column and indices on both hash columns |
| backend/routers/issues.py | Updated issue creation to store previous hash; optimized verification to O(1) with legacy fallback |
| backend/schemas.py | Added previous_hash field to BlockchainVerificationResponse |
| tests/test_blockchain.py | Added new test to verify O(1) storage and previous_hash in response |
| .jules/bolt.md | Documented the optimization learning with minor formatting issues |
> **Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.
>
> ## 2024-05-30 - Chaining for O(1) Integrity Verification
> **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
There's a formatting issue: "(1)$" should be "O(1)" to properly denote Big O notation. The dollar sign appears to be a typo or formatting error.
```diff
- **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
+ **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single O(1) database lookup.
```
    integrity_hash=integrity_hash,
    previous_integrity_hash=prev_hash
The blockchain optimization is only being applied to web-based issue creation in this endpoint. However, bot.py (line 74-79) and voice.py (line 260-276) also create Issues but do not compute or store integrity_hash or previous_integrity_hash. This creates an inconsistent blockchain where some issues are chained and others are not. Consider either: 1) Implementing blockchain hashing for all issue creation paths to maintain a complete chain, or 2) Documenting this as an intentional design decision if only web submissions need blockchain verification.
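The first option the reviewer suggests — a single shared helper that every creation path calls — might look like this. `chain_new_issue` and its field handling are hypothetical, not taken from the repository:

```python
import hashlib
from typing import Optional, Tuple

def chain_new_issue(serialized_fields: str, latest_hash: Optional[str]) -> Tuple[str, str]:
    # Returns (integrity_hash, previous_integrity_hash) for a new issue.
    # Calling this from issues.py, bot.py, and voice.py alike keeps the chain gap-free.
    prev = latest_hash or ""
    integrity = hashlib.sha256(f"{serialized_fields}|{prev}".encode("utf-8")).hexdigest()
    return integrity, prev
```

Centralizing the recipe also means a future change to the hashed fields only happens in one place.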
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/routers/issues.py (1)
Lines 173-177: ⚠️ Potential issue | 🟠 Major — Chain head assignment is non-atomic, allowing concurrent forked links.

`prev_hash` is read and then reused for the insert without serialization. Concurrent requests can bind to the same parent hash, creating multiple children that all verify as valid, which breaks strict linear-chain semantics. Please serialize head selection + insert in one critical section (e.g., a DB advisory lock or an equivalent transactional chain-head lock).

Also applies to: lines 199-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routers/issues.py` around lines 173 - 177, The chain-head read (prev_issue/prev_hash computed via run_in_threadpool/db.query(Issue.integrity_hash)) is non-atomic with the subsequent insert and must be serialized; wrap the head-selection and the insert that uses prev_hash in a single DB-critical section (e.g., acquire a DB advisory/transactional lock or use SELECT ... FOR UPDATE within a transaction) so no two requests can read the same head and create concurrent children. Modify the code around the prev_issue/prev_hash lookup and the later insert routine to begin a transaction or obtain an advisory lock (using your DB's advisory lock API), re-read the head inside that lock/transaction, compute prev_hash, perform the insert, then release/commit; apply the same fix to the other occurrence that reads prev_hash (the block referenced near lines 199-200).
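On SQLite, the serialization this prompt asks for can be sketched with an immediate (write-locking) transaction around both the head read and the insert; on Postgres the same shape would use an advisory lock or `SELECT ... FOR UPDATE`. This is a standalone illustration with an assumed schema and hash recipe, not the project's actual code:

```python
import hashlib
import sqlite3

def create_issue_serialized(conn: sqlite3.Connection, title: str) -> str:
    conn.isolation_level = None  # autocommit mode; we manage the transaction explicitly
    conn.execute("BEGIN IMMEDIATE")  # take the write lock BEFORE reading the chain head
    try:
        row = conn.execute(
            "SELECT integrity_hash FROM issues ORDER BY id DESC LIMIT 1"
        ).fetchone()
        prev = row[0] if row else ""
        integrity = hashlib.sha256(f"{title}|{prev}".encode("utf-8")).hexdigest()
        conn.execute(
            "INSERT INTO issues (title, integrity_hash, previous_integrity_hash) "
            "VALUES (?, ?, ?)",
            (title, integrity, prev),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return integrity
```

Because the write lock is held across read and insert, no two writers can observe the same head, so concurrent creates cannot fork the chain.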
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .jules/bolt.md
- backend/init_db.py
- backend/models.py
- backend/routers/issues.py
- backend/schemas.py
- tests/test_blockchain.py
> ## 2024-05-30 - Chaining for O(1) Integrity Verification
> **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
Fix the complexity notation typo in the learning note.
single (1)$ database lookup looks accidental; use single O(1) database lookup (or equivalent wording).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.jules/bolt.md at line 42, Replace the accidental complexity token "single
(1)$ database lookup" in the learning note with the correct notation, e.g.
"single O(1) database lookup" (or "constant-time (O(1)) database lookup");
locate the exact phrase "single (1)$ database lookup" in the text and update it
to "single O(1) database lookup" to fix the typo.
    response = client.get(f"/api/issues/{issue2_id}/blockchain-verify")
    assert response.status_code == 200
    data = response.json()
    assert data["is_valid"] == True
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# First, check if the file exists and read the specific line
if [ -f tests/test_blockchain.py ]; then
  echo "=== Line 98 in tests/test_blockchain.py ==="
  sed -n '98p' tests/test_blockchain.py
  echo ""
  echo "=== Context around line 98 (lines 95-101) ==="
  sed -n '95,101p' tests/test_blockchain.py
  echo ""
  echo "=== Search for all assert...==True patterns in the file ==="
  rg -n 'assert\s+.+==\s*True' tests/test_blockchain.py
else
  echo "File tests/test_blockchain.py not found"
fi
```

Repository: RohanExploit/VishwaGuru
Length of output: 650
Use direct truth assertion for is_valid.
Avoid explicit comparison to True in assertions; prefer assert data["is_valid"] to satisfy linting and follow Python idioms.
✅ Minimal fix

```diff
- assert data["is_valid"] == True
+ assert data["is_valid"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 Ruff (0.15.2)
[error] 98-98: Avoid equality comparisons to `True`; use `if data["is_valid"]:` for truth checks
Replace with data["is_valid"]
(E712)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_blockchain.py` at line 98, Replace the explicit boolean comparison
in the test assertion — change the line asserting data["is_valid"] == True to
use a direct truth assertion (i.e., assert data["is_valid"]) in
tests/test_blockchain.py so the test uses Python idiomatic truthiness for the
data["is_valid"] value.
2 issues found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/routers/issues.py">
<violation number="1" location="backend/routers/issues.py:639">
P1: The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.</violation>
</file>
<file name=".jules/bolt.md">
<violation number="1" location=".jules/bolt.md:42">
P3: Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray `$` or use proper `O(1)` formatting).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
    )
    prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
else:
    prev_hash = current_issue.previous_integrity_hash
P1: The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/issues.py, line 639:
<comment>The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.</comment>
<file context>
@@ -615,24 +616,27 @@ def get_user_issues(
+ )
+ prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
+ else:
+ prev_hash = current_issue.previous_integrity_hash
# Recompute hash based on current data and previous hash
</file context>
> ## 2024-05-30 - Chaining for O(1) Integrity Verification
> **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
P3: Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray $ or use proper O(1) formatting).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .jules/bolt.md, line 42:
<comment>Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray `$` or use proper `O(1)` formatting).</comment>
<file context>
@@ -37,3 +37,7 @@
**Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.
+
+## 2024-05-30 - Chaining for O(1) Integrity Verification
+**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
+**Action:** Always store the hash of the preceding record in the current record if integrity chaining is required, allowing for immediate verification without secondary queries.
</file context>
```diff
- **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
+ **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single O(1) database lookup.
```
- Pinned `bcrypt<4.0.0` in `backend/requirements-render.txt` to avoid build failures on Render.
- Split `python-jose` and `cryptography` for cleaner dependency management.
- Refined `backend/init_db.py` to use `VARCHAR(255)` for better compatibility.
- Updated `Issue` model to use `String(255)` for hash columns.
- Verified app startup and migration logic on fresh and existing databases.
- Optimized blockchain integrity verification from O(log N) to O(1) by storing `previous_integrity_hash` on each report.
- Added database indexes for `integrity_hash` and `previous_integrity_hash`.
- Fixed Render deployment by pinning `bcrypt<4.0.0` and removing `async_lru`.
- Hardened `init_db.py` migrations with `VARCHAR(255)` and column existence checks.
- Enhanced CORS safety in `main.py` for production environments.
- Added verification tests in `tests/test_blockchain.py`.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/gemini_summary.py">
<violation number="1" location="backend/gemini_summary.py:100">
P2: The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
  try:
-     return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+     summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+     _summary_cache[cache_key] = (summary, current_time)
```
P2: The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/gemini_summary.py, line 100:
<comment>The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.</comment>
<file context>
@@ -84,7 +96,9 @@ async def _generate_mla_summary_with_gemini() -> str:
try:
- return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+ summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+ _summary_cache[cache_key] = (summary, current_time)
+ return summary
except Exception as e:
</file context>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/gemini_summary.py`:
- Around line 48-50: The manual _summary_cache dict with SUMMARY_CACHE_TTL is
unbounded and can grow forever; replace it with a bounded cache (e.g., an LRU or
TTL-aware bounded structure) and evict old entries when the size exceeds a
MAX_CACHE_SIZE. Concretely: replace _summary_cache usage with a bounded cache
implementation (for example collections.OrderedDict-based LRU or
cachetools.TTLCache) keyed the same way, set a MAX_CACHE_SIZE constant (e.g.,
1000), ensure entries also respect SUMMARY_CACHE_TTL, and update any places that
read/write _summary_cache (symbols: _summary_cache and SUMMARY_CACHE_TTL and the
functions that access them around the existing uses) to use the new cache API so
old/rare keys are evicted automatically. Ensure thread/async safety if the
module is used concurrently.
- Line 71: The current cache_key is built by concatenating fields into a string
(cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}")
which can collide if values contain underscores; change cache_key to use an
immutable tuple (e.g., (district, assembly_constituency, mla_name,
issue_category)) wherever the key is created and looked up, and update any cache
get/set usages that reference cache_key so they use the tuple form instead of
the concatenated string.
- Around line 99-101: When GEMINI is disabled (genai is None), avoid calling
retry_with_exponential_backoff(_generate_mla_summary_with_gemini, ...) and
returning after delay; instead detect genai is None before the retry and
immediately return the fallback summary (and update _summary_cache[cache_key] if
existing code caches fallbacks) so no retry/delay occurs; specifically modify
the block around retry_with_exponential_backoff to short-circuit when genai is
None, calling the same fallback-path used elsewhere rather than invoking
_generate_mla_summary_with_gemini.
In `@backend/main.py`:
- Around line 143-147: Replace the startup ValueError for invalid FRONTEND_URL
with a critical log and a safe fallback: instead of raising when frontend_url
does not start with "http://" or "https://", call the application's logger to
log a critical message that includes the invalid frontend_url and then set
allowed_origins = [] (maintaining the same fail-closed behavior as missing
values); update the check around frontend_url and allowed_origins in main.py to
perform this logging+fallback rather than raising so a config typo won't crash
the service.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/gemini_summary.py
- backend/init_db.py
- backend/main.py
- backend/models.py
- backend/requirements-render.txt
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/init_db.py
- backend/models.py
    # Simple cache to avoid async-lru dependency
    _summary_cache = {}
    SUMMARY_CACHE_TTL = 86400  # 24 hours
Manual cache is unbounded and can grow indefinitely.
The TTL check prevents stale reuse but does not evict old/rarely-hit keys. Over time this can cause memory growth under diverse request inputs.
Suggested change

```diff
 _summary_cache = {}
 SUMMARY_CACHE_TTL = 86400  # 24 hours
+SUMMARY_CACHE_MAX_ENTRIES = 5000
@@
-    if cache_key in _summary_cache:
+    # Opportunistic cleanup
+    if len(_summary_cache) > SUMMARY_CACHE_MAX_ENTRIES:
+        expired = [k for k, (_, ts) in _summary_cache.items() if current_time - ts >= SUMMARY_CACHE_TTL]
+        for k in expired:
+            _summary_cache.pop(k, None)
+
+    if cache_key in _summary_cache:
         val, ts = _summary_cache[cache_key]
         if current_time - ts < SUMMARY_CACHE_TTL:
             return val
@@
-    _summary_cache[cache_key] = (summary, current_time)
+    _summary_cache[cache_key] = (summary, current_time)
```

Also applies to: 74-77, 100-100
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/gemini_summary.py` around lines 48 - 50, The manual _summary_cache
dict with SUMMARY_CACHE_TTL is unbounded and can grow forever; replace it with a
bounded cache (e.g., an LRU or TTL-aware bounded structure) and evict old
entries when the size exceeds a MAX_CACHE_SIZE. Concretely: replace
_summary_cache usage with a bounded cache implementation (for example
collections.OrderedDict-based LRU or cachetools.TTLCache) keyed the same way,
set a MAX_CACHE_SIZE constant (e.g., 1000), ensure entries also respect
SUMMARY_CACHE_TTL, and update any places that read/write _summary_cache
(symbols: _summary_cache and SUMMARY_CACHE_TTL and the functions that access
them around the existing uses) to use the new cache API so old/rare keys are
evicted automatically. Ensure thread/async safety if the module is used
concurrently.
    Returns:
        A short paragraph describing the MLA's role and responsibilities
    """
    cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
Use a tuple cache key to avoid accidental collisions.
Line [71] concatenates values with _, which can collide when fields themselves contain underscores. A tuple key is safer.
Suggested change

```diff
- cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
+ cache_key = (district, assembly_constituency, mla_name, issue_category)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/gemini_summary.py` at line 71, The current cache_key is built by
concatenating fields into a string (cache_key =
f"{district}_{assembly_constituency}_{mla_name}_{issue_category}") which can
collide if values contain underscores; change cache_key to use an immutable
tuple (e.g., (district, assembly_constituency, mla_name, issue_category))
wherever the key is created and looked up, and update any cache get/set usages
that reference cache_key so they use the tuple form instead of the concatenated
string.
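The collision is easy to reproduce: distinct field combinations can flatten to the same underscore-joined string, while tuple keys keep field boundaries intact. The district names below are made up for illustration:

```python
def string_key(*fields: str) -> str:
    # The pattern the review flags: fields flattened with "_" separators.
    return "_".join(fields)

a = string_key("North_East", "Ward 1")   # district itself contains an underscore
b = string_key("North", "East_Ward 1")   # different fields...
assert a == b                            # ...but the same flattened key: a collision

# Tuple keys are hashable and preserve the field boundaries.
assert ("North_East", "Ward 1") != ("North", "East_Ward 1")
```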
    summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
    _summary_cache[cache_key] = (summary, current_time)
    return summary
Skip retry path when Gemini is disabled.
When GEMINI_API_KEY is absent (genai = None), Line [99] still retries and delays response before fallback. Return fallback immediately in that mode.
Suggested change

```diff
 async def generate_mla_summary(
@@
-    async def _generate_mla_summary_with_gemini() -> str:
+    if genai is None:
+        return _get_fallback_summary(mla_name, assembly_constituency, district)
+
+    async def _generate_mla_summary_with_gemini() -> str:
```

🧰 Tools
🪛 Ruff (0.15.2)
[warning] 101-101: Consider moving this statement to an else block
(TRY300)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/gemini_summary.py` around lines 99 - 101, When GEMINI is disabled
(genai is None), avoid calling
retry_with_exponential_backoff(_generate_mla_summary_with_gemini, ...) and
returning after delay; instead detect genai is None before the retry and
immediately return the fallback summary (and update _summary_cache[cache_key] if
existing code caches fallbacks) so no retry/delay occurs; specifically modify
the block around retry_with_exponential_backoff to short-circuit when genai is
None, calling the same fallback-path used elsewhere rather than invoking
_generate_mla_summary_with_gemini.
    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
        raise ValueError(
            f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
        )
    allowed_origins = [frontend_url]
Avoid hard-crashing the app on malformed FRONTEND_URL.
Line [144] raises at startup, so a config typo can take the service down. Prefer logging critical and falling back to allowed_origins = [] (same fail-closed behavior used for missing values).
Suggested change

```diff
-    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
-        raise ValueError(
-            f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
-        )
-    allowed_origins = [frontend_url]
+    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
+        logger.critical(
+            "FRONTEND_URL is invalid (%s). CORS will be disabled for safety.",
+            frontend_url,
+        )
+        allowed_origins = []
+    else:
+        allowed_origins = [frontend_url]
```

🧰 Tools
🪛 Ruff (0.15.2)
[warning] 144-146: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/main.py` around lines 143 - 147, Replace the startup ValueError for
invalid FRONTEND_URL with a critical log and a safe fallback: instead of raising
when frontend_url does not start with "http://" or "https://", call the
application's logger to log a critical message that includes the invalid
frontend_url and then set allowed_origins = [] (maintaining the same fail-closed
behavior as missing values); update the check around frontend_url and
allowed_origins in main.py to perform this logging+fallback rather than raising
so a config typo won't crash the service.
- Added missing columns (`reference_id`, `verified_at`, `assigned_at`, `resolved_at`, `assigned_to`) to `migrate_db` in `init_db.py`.
- Ensured `reference_id` has a unique index during migration.
- Removed unused/problematic dependencies `async_lru` and `indic-nlp-library` from ALL requirements files.
- Replaced `async_lru` with a manual async-safe cache in `gemini_summary.py`.
- Fixed `bcrypt` pinning and split optional extras in `requirements-render.txt`.
- Enhanced migration robustness with explicit `VARCHAR(255)` lengths.
- Maintained O(1) blockchain verification optimization.
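The "column existence checks" mentioned for `init_db.py` usually follow an idempotent-migration pattern. Here is a SQLite sketch; the actual project may target another engine, where the `PRAGMA table_info` probe would become a catalog query, and `add_column_if_missing` is a hypothetical helper name:

```python
import sqlite3

def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, ddl_type: str) -> None:
    # Probe existing columns first so the migration is safe to re-run.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
        conn.commit()

# Usage sketch: running the migration twice must not error.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY)")
add_column_if_missing(conn, "issues", "previous_integrity_hash", "VARCHAR(255)")
add_column_if_missing(conn, "issues", "previous_integrity_hash", "VARCHAR(255)")  # no-op on rerun
```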
Identified and optimized a performance bottleneck in the blockchain integrity verification process. By storing the hash of the preceding record directly in the current issue (as a back-link), the system now performs verification with a single database lookup instead of two, achieving O(1) database roundtrip efficiency for the link verification. Indices were also added to ensure fast lookups on audit trails. Full backward compatibility is maintained via a fallback mechanism for legacy records.
PR created automatically by Jules for task 15333545155765582858 started by @RohanExploit
Summary by cubic
Stores each issue’s previous hash to make blockchain verification O(1) with a single DB lookup. Updates the verification API, expands migrations with indexes and new tracking fields, keeps a fallback for legacy records, and stabilizes Render builds and CORS.
Written for commit a079f17. Summary will update on new commits.