-
Notifications
You must be signed in to change notification settings - Fork 37
⚡ Bolt: Optimize blockchain verification to O(1) by storing previous hash #474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
edce47c
4f4f065
a10dad5
a079f17
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -37,3 +37,7 @@ | |||||
| ## 2026-02-08 - Return Type Consistency in Utilities | ||||||
| **Learning:** Inconsistent return types in shared utility functions (like `process_uploaded_image`) can cause runtime crashes across multiple modules, especially when some expect tuples and others expect single values. This can lead to deployment failures that are hard to debug without full integration logs. | ||||||
| **Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly. | ||||||
|
|
||||||
| ## 2024-05-30 - Chaining for O(1) Integrity Verification | ||||||
| **Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup. | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fix the complexity notation typo in the learning note.
🤖 Prompt for AI Agents
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. P3: Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray Prompt for AI agents
Suggested change
|
||||||
| **Action:** Always store the hash of the preceding record in the current record if integrity chaining is required, allowing for immediate verification without secondary queries. | ||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -7,7 +7,7 @@ | |||||
| import google.generativeai as genai | ||||||
| from typing import Dict, Optional, Callable, Any | ||||||
| import warnings | ||||||
| from async_lru import alru_cache | ||||||
| import time | ||||||
| import logging | ||||||
| import asyncio | ||||||
| from backend.ai_service import retry_with_exponential_backoff | ||||||
|
|
@@ -45,7 +45,10 @@ def _get_fallback_summary(mla_name: str, assembly_constituency: str, district: s | |||||
| ) | ||||||
|
|
||||||
|
|
||||||
| @alru_cache(maxsize=100) | ||||||
| # Simple cache to avoid async-lru dependency | ||||||
| _summary_cache = {} | ||||||
| SUMMARY_CACHE_TTL = 86400 # 24 hours | ||||||
|
Comment on lines
+48
to
+50
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Manual cache is unbounded and can grow indefinitely. The TTL check prevents stale reuse but does not evict old/rarely-hit keys. Over time this can cause memory growth under diverse request inputs. Suggested change _summary_cache = {}
SUMMARY_CACHE_TTL = 86400 # 24 hours
+SUMMARY_CACHE_MAX_ENTRIES = 5000
@@
- if cache_key in _summary_cache:
+ # Opportunistic cleanup
+ if len(_summary_cache) > SUMMARY_CACHE_MAX_ENTRIES:
+ expired = [k for k, (_, ts) in _summary_cache.items() if current_time - ts >= SUMMARY_CACHE_TTL]
+ for k in expired:
+ _summary_cache.pop(k, None)
+
+ if cache_key in _summary_cache:
val, ts = _summary_cache[cache_key]
if current_time - ts < SUMMARY_CACHE_TTL:
return val
@@
- _summary_cache[cache_key] = (summary, current_time)
+ _summary_cache[cache_key] = (summary, current_time)Also applies to: 74-77, 100-100 🤖 Prompt for AI Agents |
||||||
|
|
||||||
| async def generate_mla_summary( | ||||||
| district: str, | ||||||
| assembly_constituency: str, | ||||||
|
|
@@ -54,6 +57,7 @@ async def generate_mla_summary( | |||||
| ) -> str: | ||||||
| """ | ||||||
| Generate a human-readable summary about an MLA using Gemini with retry logic. | ||||||
| Optimized: Uses a simple manual cache to remove async-lru dependency. | ||||||
|
|
||||||
| Args: | ||||||
| district: District name | ||||||
|
|
@@ -64,6 +68,14 @@ async def generate_mla_summary( | |||||
| Returns: | ||||||
| A short paragraph describing the MLA's role and responsibilities | ||||||
| """ | ||||||
| cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}" | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use a tuple cache key to avoid accidental collisions. Line [71] concatenates values with Suggested change- cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
+ cache_key = (district, assembly_constituency, mla_name, issue_category)📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||
| current_time = time.time() | ||||||
|
|
||||||
| if cache_key in _summary_cache: | ||||||
| val, ts = _summary_cache[cache_key] | ||||||
| if current_time - ts < SUMMARY_CACHE_TTL: | ||||||
| return val | ||||||
|
|
||||||
| async def _generate_mla_summary_with_gemini() -> str: | ||||||
| """Inner function to generate MLA summary with Gemini""" | ||||||
| model = genai.GenerativeModel('gemini-1.5-flash') | ||||||
|
|
@@ -84,7 +96,9 @@ async def _generate_mla_summary_with_gemini() -> str: | |||||
| return response.text.strip() | ||||||
|
|
||||||
| try: | ||||||
| return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2) | ||||||
| summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2) | ||||||
| _summary_cache[cache_key] = (summary, current_time) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. P2: The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results. Prompt for AI agents |
||||||
| return summary | ||||||
|
Comment on lines
+99
to
+101
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Skip retry path when Gemini is disabled. When Suggested change async def generate_mla_summary(
@@
- async def _generate_mla_summary_with_gemini() -> str:
+ if genai is None:
+ return _get_fallback_summary(mla_name, assembly_constituency, district)
+
+ async def _generate_mla_summary_with_gemini() -> str:🧰 Tools🪛 Ruff (0.15.2)[warning] 101-101: Consider moving this statement to an (TRY300) 🤖 Prompt for AI Agents |
||||||
| except Exception as e: | ||||||
| logger.error(f"Gemini MLA summary generation failed after retries: {e}") | ||||||
| # Fallback to simple description | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -130,20 +130,21 @@ async def lifespan(app: FastAPI): | |
|
|
||
| if not frontend_url: | ||
| if is_production: | ||
| raise ValueError( | ||
| "FRONTEND_URL environment variable is required for security in production. " | ||
| "Set it to your frontend URL (e.g., https://your-app.netlify.app)." | ||
| logger.critical( | ||
| "FRONTEND_URL environment variable is MISSING in production. " | ||
| "CORS will be disabled for safety. Set it to your frontend URL." | ||
| ) | ||
| allowed_origins = [] | ||
| else: | ||
| logger.warning("FRONTEND_URL not set. Defaulting to http://localhost:5173 for development.") | ||
| frontend_url = "http://localhost:5173" | ||
|
|
||
| if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")): | ||
| raise ValueError( | ||
| f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}" | ||
| ) | ||
|
|
||
| allowed_origins = [frontend_url] | ||
| allowed_origins = [frontend_url] | ||
| else: | ||
| if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")): | ||
| raise ValueError( | ||
| f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}" | ||
| ) | ||
| allowed_origins = [frontend_url] | ||
|
Comment on lines
+143
to
+147
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid hard-crashing the app on malformed Line [144] raises at startup, so a config typo can take the service down. Prefer logging critical and falling back to Suggested change- if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
- raise ValueError(
- f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
- )
- allowed_origins = [frontend_url]
+ if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
+ logger.critical(
+ "FRONTEND_URL is invalid (%s). CORS will be disabled for safety.",
+ frontend_url,
+ )
+ allowed_origins = []
+ else:
+ allowed_origins = [frontend_url]🧰 Tools🪛 Ruff (0.15.2)[warning] 144-146: Avoid specifying long messages outside the exception class (TRY003) 🤖 Prompt for AI Agents |
||
|
|
||
| if not is_production: | ||
| dev_origins = [ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,5 +30,3 @@ SpeechRecognition | |
| pydub | ||
| googletrans==4.0.2 | ||
| langdetect | ||
| indic-nlp-library | ||
| async_lru | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -196,7 +196,8 @@ async def create_issue( | |
| longitude=longitude, | ||
| location=location, | ||
| action_plan=initial_action_plan, | ||
| integrity_hash=integrity_hash | ||
| integrity_hash=integrity_hash, | ||
| previous_integrity_hash=prev_hash | ||
|
Comment on lines
+199
to
+200
|
||
| ) | ||
|
|
||
| # Offload blocking DB operations to threadpool | ||
|
|
@@ -615,24 +616,27 @@ def get_user_issues( | |
| async def verify_blockchain_integrity(issue_id: int, db: Session = Depends(get_db)): | ||
| """ | ||
| Verify the cryptographic integrity of a report using the blockchain-style chaining. | ||
| Optimized: Uses column projection to fetch only needed data. | ||
| Optimized: Uses previous_integrity_hash for O(1) verification. | ||
| """ | ||
| # Fetch current issue data | ||
| # Fetch current issue data including previous hash for O(1) verification | ||
| current_issue = await run_in_threadpool( | ||
| lambda: db.query( | ||
| Issue.id, Issue.description, Issue.category, Issue.integrity_hash | ||
| Issue.id, Issue.description, Issue.category, Issue.integrity_hash, Issue.previous_integrity_hash | ||
| ).filter(Issue.id == issue_id).first() | ||
| ) | ||
|
|
||
| if not current_issue: | ||
| raise HTTPException(status_code=404, detail="Issue not found") | ||
|
|
||
| # Fetch previous issue's integrity hash to verify the chain | ||
| prev_issue_hash = await run_in_threadpool( | ||
| lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first() | ||
| ) | ||
|
|
||
| prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else "" | ||
| # Fallback for legacy records that don't have previous_integrity_hash stored | ||
| if current_issue.previous_integrity_hash is None: | ||
| # Fetch previous issue's integrity hash to verify the chain | ||
| prev_issue_hash = await run_in_threadpool( | ||
| lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first() | ||
| ) | ||
| prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else "" | ||
| else: | ||
| prev_hash = current_issue.previous_integrity_hash | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. P1: The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success. Prompt for AI agents |
||
|
|
||
| # Recompute hash based on current data and previous hash | ||
| # Chaining logic: hash(description|category|prev_hash) | ||
|
|
@@ -649,6 +653,7 @@ async def verify_blockchain_integrity(issue_id: int, db: Session = Depends(get_d | |
| return BlockchainVerificationResponse( | ||
| is_valid=is_valid, | ||
| current_hash=current_issue.integrity_hash, | ||
| previous_hash=prev_hash, | ||
| computed_hash=computed_hash, | ||
| message=message | ||
| ) | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -62,6 +62,42 @@ def test_blockchain_verification_success(client, db_session): | |||||
| assert data["is_valid"] == True | ||||||
| assert data["current_hash"] == hash2 | ||||||
|
|
||||||
| def test_blockchain_o1_storage(client, db_session): | ||||||
| # Create first issue | ||||||
| hash1_content = "First issue|Road|" | ||||||
| hash1 = hashlib.sha256(hash1_content.encode()).hexdigest() | ||||||
|
|
||||||
| issue1 = Issue( | ||||||
| description="First issue", | ||||||
| category="Road", | ||||||
| integrity_hash=hash1 | ||||||
| ) | ||||||
| db_session.add(issue1) | ||||||
| db_session.commit() | ||||||
|
|
||||||
| # Create second issue via API to test previous_integrity_hash population | ||||||
| response = client.post( | ||||||
| "/api/issues", | ||||||
| data={ | ||||||
| "description": "Second issue for O(1) test", | ||||||
| "category": "Road", | ||||||
| "language": "en" | ||||||
| } | ||||||
| ) | ||||||
| assert response.status_code == 201 | ||||||
| issue2_id = response.json()["id"] | ||||||
|
|
||||||
| # Verify previous_integrity_hash is stored in DB for issue2 | ||||||
| issue2 = db_session.query(Issue).filter(Issue.id == issue2_id).first() | ||||||
| assert issue2.previous_integrity_hash == hash1 | ||||||
|
|
||||||
| # Verify verification response includes previous_hash | ||||||
| response = client.get(f"/api/issues/{issue2_id}/blockchain-verify") | ||||||
| assert response.status_code == 200 | ||||||
| data = response.json() | ||||||
| assert data["is_valid"] == True | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# First, check if the file exists and read the specific line
if [ -f tests/test_blockchain.py ]; then
echo "=== Line 98 in tests/test_blockchain.py ==="
sed -n '98p' tests/test_blockchain.py
echo ""
echo "=== Context around line 98 (lines 95-101) ==="
sed -n '95,101p' tests/test_blockchain.py
echo ""
echo "=== Search for all assert...==True patterns in the file ==="
rg -n 'assert\s+.+==\s*True' tests/test_blockchain.py
else
echo "File tests/test_blockchain.py not found"
fiRepository: RohanExploit/VishwaGuru Length of output: 650 Use direct truth assertion for Avoid explicit comparison to ✅ Minimal fix- assert data["is_valid"] == True
+ assert data["is_valid"]📝 Committable suggestion
Suggested change
🧰 Tools🪛 Ruff (0.15.2)[error] 98-98: Avoid equality comparisons to Replace with (E712) 🤖 Prompt for AI Agents |
||||||
| assert data["previous_hash"] == hash1 | ||||||
|
|
||||||
| def test_blockchain_verification_failure(client, db_session): | ||||||
| # Create issue with tampered hash | ||||||
| issue = Issue( | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a formatting issue: "(1)$" should be "O(1)" to properly denote Big O notation. The dollar sign appears to be a typo or formatting error.