4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -37,3 +37,7 @@
## 2026-02-08 - Return Type Consistency in Utilities
**Learning:** Inconsistent return types in shared utility functions (like `process_uploaded_image`) can cause runtime crashes across multiple modules, especially when some expect tuples and others expect single values. This can lead to deployment failures that are hard to debug without full integration logs.
**Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.

## 2024-05-30 - Chaining for O(1) Integrity Verification
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
Copilot AI Feb 25, 2026
There's a formatting issue: "(1)$" should be "O(1)" to properly denote Big O notation. The dollar sign appears to be a typo or formatting error.

Suggested change
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single O(1) database lookup.

⚠️ Potential issue | 🟡 Minor

Fix the complexity notation typo in the learning note.

`single (1)$ database lookup` looks accidental; use `single O(1) database lookup` (or equivalent wording).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.jules/bolt.md at line 42, Replace the accidental complexity token "single
(1)$ database lookup" in the learning note with the correct notation, e.g.
"single O(1) database lookup" (or "constant-time (O(1)) database lookup");
locate the exact phrase "single (1)$ database lookup" in the text and update it
to "single O(1) database lookup" to fix the typo.

@cubic-dev-ai cubic-dev-ai bot Feb 25, 2026
P3: Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray $ or use proper O(1) formatting).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .jules/bolt.md, line 42:

<comment>Fix the malformed inline formatting in this sentence so the learning note renders correctly (remove the stray `$` or use proper `O(1)` formatting).</comment>

<file context>
@@ -37,3 +37,7 @@
 **Action:** Always maintain strict return type consistency for core utilities. Use type hints and verify all call sites when changing a function's signature. Ensure that performance-oriented optimizations (like returning multiple processed formats) are applied uniformly.
+
+## 2024-05-30 - Chaining for O(1) Integrity Verification
+**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
+**Action:** Always store the hash of the preceding record in the current record if integrity chaining is required, allowing for immediate verification without secondary queries.
</file context>
Suggested change
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single (1)$ database lookup.
**Learning:** Chained data structures (like blockchains) that require cross-record lookups for verification can suffer from O(N) or O(log N) latency as the dataset grows. Storing the "back-link" (the previous hash) directly on the current record transforms verification into a single O(1) database lookup.

**Action:** Always store the hash of the preceding record in the current record if integrity chaining is required, allowing for immediate verification without secondary queries.
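The back-link pattern described in this learning note can be sketched as follows. The `description|category|prev_hash` payload mirrors the chaining scheme discussed later in this PR; the record field names are illustrative, not the actual model:

```python
import hashlib

def seal(description: str, category: str, prev_hash: str) -> str:
    # Chain hash over the record's own fields plus the previous record's hash
    payload = f"{description}|{category}|{prev_hash}"
    return hashlib.sha256(payload.encode()).hexdigest()

# Record 1 starts the chain with an empty back-link
h1 = seal("First issue", "Road", "")

# Record 2 stores BOTH its own hash and the back-link to record 1
record2 = {"description": "Second issue", "category": "Water", "previous_integrity_hash": h1}
record2["integrity_hash"] = seal(record2["description"], record2["category"], h1)

# O(1) verification: recompute from fields stored on this record alone,
# with no query for the previous record
recomputed = seal(record2["description"], record2["category"], record2["previous_integrity_hash"])
assert recomputed == record2["integrity_hash"]
```

Because the back-link travels with the record, verification needs a single row fetch instead of a second "find the previous record" query.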
20 changes: 17 additions & 3 deletions backend/gemini_summary.py
@@ -7,7 +7,7 @@
import google.generativeai as genai
from typing import Dict, Optional, Callable, Any
import warnings
from async_lru import alru_cache
import time
import logging
import asyncio
from backend.ai_service import retry_with_exponential_backoff
@@ -45,7 +45,10 @@ def _get_fallback_summary(mla_name: str, assembly_constituency: str, district: s
)


@alru_cache(maxsize=100)
# Simple cache to avoid async-lru dependency
_summary_cache = {}
SUMMARY_CACHE_TTL = 86400 # 24 hours
Comment on lines +48 to +50
⚠️ Potential issue | 🟠 Major

Manual cache is unbounded and can grow indefinitely.

The TTL check prevents stale reuse but does not evict old/rarely-hit keys. Over time this can cause memory growth under diverse request inputs.

Suggested change
 _summary_cache = {}
 SUMMARY_CACHE_TTL = 86400  # 24 hours
+SUMMARY_CACHE_MAX_ENTRIES = 5000
@@
-    if cache_key in _summary_cache:
+    # Opportunistic cleanup
+    if len(_summary_cache) > SUMMARY_CACHE_MAX_ENTRIES:
+        expired = [k for k, (_, ts) in _summary_cache.items() if current_time - ts >= SUMMARY_CACHE_TTL]
+        for k in expired:
+            _summary_cache.pop(k, None)
+
+    if cache_key in _summary_cache:
         val, ts = _summary_cache[cache_key]
         if current_time - ts < SUMMARY_CACHE_TTL:
             return val
@@
-        _summary_cache[cache_key] = (summary, current_time)
+        _summary_cache[cache_key] = (summary, current_time)

Also applies to: 74-77, 100-100

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/gemini_summary.py` around lines 48 - 50, The manual _summary_cache
dict with SUMMARY_CACHE_TTL is unbounded and can grow forever; replace it with a
bounded cache (e.g., an LRU or TTL-aware bounded structure) and evict old
entries when the size exceeds a MAX_CACHE_SIZE. Concretely: replace
_summary_cache usage with a bounded cache implementation (for example
collections.OrderedDict-based LRU or cachetools.TTLCache) keyed the same way,
set a MAX_CACHE_SIZE constant (e.g., 1000), ensure entries also respect
SUMMARY_CACHE_TTL, and update any places that read/write _summary_cache
(symbols: _summary_cache and SUMMARY_CACHE_TTL and the functions that access
them around the existing uses) to use the new cache API so old/rare keys are
evicted automatically. Ensure thread/async safety if the module is used
concurrently.
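A bounded cache of the kind this review suggests can be built from the stdlib alone, with no cachetools dependency. This is a sketch, not the PR's code; the `now` parameter exists only to make expiry testable:

```python
import time
from collections import OrderedDict

class BoundedTTLCache:
    """LRU-ordered cache with a per-entry TTL and a hard size cap."""

    def __init__(self, maxsize=1000, ttl=86400.0):
        self.maxsize = maxsize
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        item = self._data.get(key)
        if item is None:
            return None
        value, stored_at = item
        if now - stored_at >= self.ttl:  # expired: evict and report a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)      # mark as recently used
        return value

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:  # evict least recently used
            self._data.popitem(last=False)

cache = BoundedTTLCache(maxsize=2, ttl=10.0)
cache.set(("a",), "A", now=0.0)
cache.set(("b",), "B", now=1.0)
cache.set(("c",), "C", now=2.0)             # size cap: ("a",) is evicted
assert cache.get(("a",), now=3.0) is None
assert cache.get(("b",), now=3.0) == "B"
assert cache.get(("b",), now=20.0) is None  # TTL: expired after 10s
```

Note this is not thread- or task-safe on its own; the review's point about concurrent access still applies and would need a lock around `get`/`set`.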


async def generate_mla_summary(
district: str,
assembly_constituency: str,
@@ -54,6 +57,7 @@ async def generate_mla_summary(
) -> str:
"""
Generate a human-readable summary about an MLA using Gemini with retry logic.
Optimized: Uses a simple manual cache to remove async-lru dependency.

Args:
district: District name
@@ -64,6 +68,14 @@
Returns:
A short paragraph describing the MLA's role and responsibilities
"""
cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
⚠️ Potential issue | 🟡 Minor

Use a tuple cache key to avoid accidental collisions.

Line [71] concatenates values with _, which can collide when fields themselves contain underscores. A tuple key is safer.

Suggested change
-    cache_key = f"{district}_{assembly_constituency}_{mla_name}_{issue_category}"
+    cache_key = (district, assembly_constituency, mla_name, issue_category)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/gemini_summary.py` at line 71, The current cache_key is built by
concatenating fields into a string (cache_key =
f"{district}_{assembly_constituency}_{mla_name}_{issue_category}") which can
collide if values contain underscores; change cache_key to use an immutable
tuple (e.g., (district, assembly_constituency, mla_name, issue_category))
wherever the key is created and looked up, and update any cache get/set usages
that reference cache_key so they use the tuple form instead of the concatenated
string.
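The collision the review warns about is easy to demonstrate with made-up values (the place names here are hypothetical):

```python
# Two distinct inputs collide under underscore-joined string keys...
a = ("North_District", "Ward", "Rao", "Roads")
b = ("North", "District_Ward", "Rao", "Roads")
assert "_".join(a) == "_".join(b)  # both become "North_District_Ward_Rao_Roads"

# ...but not under tuple keys, which preserve field boundaries
assert a != b
cache = {a: "summary A"}
assert b not in cache
```

Tuples are hashable, so they work as dict keys with no other changes to the cache code.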

current_time = time.time()

if cache_key in _summary_cache:
val, ts = _summary_cache[cache_key]
if current_time - ts < SUMMARY_CACHE_TTL:
return val

async def _generate_mla_summary_with_gemini() -> str:
"""Inner function to generate MLA summary with Gemini"""
model = genai.GenerativeModel('gemini-1.5-flash')
@@ -84,7 +96,9 @@ async def _generate_mla_summary_with_gemini() -> str:
return response.text.strip()

try:
return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
_summary_cache[cache_key] = (summary, current_time)
@cubic-dev-ai cubic-dev-ai bot Feb 25, 2026
P2: The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/gemini_summary.py, line 100:

<comment>The new manual cache is unbounded and never evicts entries, so the cache can grow indefinitely with unique MLA/issue combinations and leak memory over time. Add a max size and evict old entries (or reintroduce an LRU) when storing results.</comment>

<file context>
@@ -84,7 +96,9 @@ async def _generate_mla_summary_with_gemini() -> str:
     try:
-        return await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+        summary = await retry_with_exponential_backoff(_generate_mla_summary_with_gemini, max_retries=2)
+        _summary_cache[cache_key] = (summary, current_time)
+        return summary
     except Exception as e:
</file context>

return summary
Comment on lines +99 to +101
⚠️ Potential issue | 🟠 Major

Skip retry path when Gemini is disabled.

When GEMINI_API_KEY is absent (genai = None), Line [99] still retries and delays response before fallback. Return fallback immediately in that mode.

Suggested change
 async def generate_mla_summary(
@@
-    async def _generate_mla_summary_with_gemini() -> str:
+    if genai is None:
+        return _get_fallback_summary(mla_name, assembly_constituency, district)
+
+    async def _generate_mla_summary_with_gemini() -> str:
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 101-101: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/gemini_summary.py` around lines 99 - 101, When GEMINI is disabled
(genai is None), avoid calling
retry_with_exponential_backoff(_generate_mla_summary_with_gemini, ...) and
returning after delay; instead detect genai is None before the retry and
immediately return the fallback summary (and update _summary_cache[cache_key] if
existing code caches fallbacks) so no retry/delay occurs; specifically modify
the block around retry_with_exponential_backoff to short-circuit when genai is
None, calling the same fallback-path used elsewhere rather than invoking
_generate_mla_summary_with_gemini.
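The short-circuit this review asks for can be sketched in isolation. The function and variable names below are stand-ins for the real `genai` client and retry wrapper, not the PR's code:

```python
import asyncio

FALLBACK = "Summary unavailable; generated from local data."

async def generate_summary(client, make_call):
    # Fail fast when the API client was never configured: no retries, no delay
    if client is None:
        return FALLBACK
    return await make_call()

async def main():
    called = False

    async def make_call():
        nonlocal called
        called = True
        return "remote summary"

    # Disabled mode returns the fallback without touching the retry path
    assert await generate_summary(None, make_call) == FALLBACK
    assert called is False
    # Enabled mode goes through the (retried) remote call
    assert await generate_summary(object(), make_call) == "remote summary"
    assert called is True

asyncio.run(main())
```

The key property is that the `client is None` check happens before any backoff machinery runs, so a missing API key costs nothing at request time.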

except Exception as e:
logger.error(f"Gemini MLA summary generation failed after retries: {e}")
# Fallback to simple description
41 changes: 34 additions & 7 deletions backend/init_db.py
@@ -45,6 +45,11 @@ def index_exists(table, index_name):
with engine.begin() as conn:
# Issues Table Migrations
if inspector.has_table("issues"):
if not column_exists("issues", "reference_id"):
conn.execute(text("ALTER TABLE issues ADD COLUMN reference_id VARCHAR(255)"))
conn.execute(text("CREATE UNIQUE INDEX IF NOT EXISTS ix_issues_reference_id ON issues (reference_id)"))
logger.info("Added reference_id column to issues")

if not column_exists("issues", "upvotes"):
conn.execute(text("ALTER TABLE issues ADD COLUMN upvotes INTEGER DEFAULT 0"))
logger.info("Added upvotes column to issues")
@@ -58,19 +63,35 @@ def index_exists(table, index_name):
logger.info("Added longitude column to issues")

if not column_exists("issues", "location"):
conn.execute(text("ALTER TABLE issues ADD COLUMN location VARCHAR"))
conn.execute(text("ALTER TABLE issues ADD COLUMN location VARCHAR(255)"))
logger.info("Added location column to issues")

if not column_exists("issues", "action_plan"):
conn.execute(text("ALTER TABLE issues ADD COLUMN action_plan TEXT"))
logger.info("Added action_plan column to issues")

if not column_exists("issues", "verified_at"):
conn.execute(text("ALTER TABLE issues ADD COLUMN verified_at TIMESTAMP"))
logger.info("Added verified_at column to issues")

if not column_exists("issues", "assigned_at"):
conn.execute(text("ALTER TABLE issues ADD COLUMN assigned_at TIMESTAMP"))
logger.info("Added assigned_at column to issues")

if not column_exists("issues", "resolved_at"):
conn.execute(text("ALTER TABLE issues ADD COLUMN resolved_at TIMESTAMP"))
logger.info("Added resolved_at column to issues")

if not column_exists("issues", "assigned_to"):
conn.execute(text("ALTER TABLE issues ADD COLUMN assigned_to VARCHAR(255)"))
logger.info("Added assigned_to column to issues")

if not column_exists("issues", "integrity_hash"):
conn.execute(text("ALTER TABLE issues ADD COLUMN integrity_hash VARCHAR"))
conn.execute(text("ALTER TABLE issues ADD COLUMN integrity_hash VARCHAR(255)"))
logger.info("Added integrity_hash column to issues")

if not column_exists("issues", "previous_integrity_hash"):
conn.execute(text("ALTER TABLE issues ADD COLUMN previous_integrity_hash VARCHAR"))
conn.execute(text("ALTER TABLE issues ADD COLUMN previous_integrity_hash VARCHAR(255)"))
logger.info("Added previous_integrity_hash column to issues")

# Indexes (using IF NOT EXISTS syntax where supported or check first)
@@ -95,13 +116,19 @@ def index_exists(table, index_name):
if not index_exists("issues", "ix_issues_user_email"):
conn.execute(text("CREATE INDEX IF NOT EXISTS ix_issues_user_email ON issues (user_email)"))

if not index_exists("issues", "ix_issues_integrity_hash"):
conn.execute(text("CREATE INDEX IF NOT EXISTS ix_issues_integrity_hash ON issues (integrity_hash)"))

if not index_exists("issues", "ix_issues_previous_integrity_hash"):
conn.execute(text("CREATE INDEX IF NOT EXISTS ix_issues_previous_integrity_hash ON issues (previous_integrity_hash)"))

# Voice and Language Support Columns (Issue #291)
if not column_exists("issues", "submission_type"):
conn.execute(text("ALTER TABLE issues ADD COLUMN submission_type VARCHAR DEFAULT 'text'"))
conn.execute(text("ALTER TABLE issues ADD COLUMN submission_type VARCHAR(50) DEFAULT 'text'"))
logger.info("Added submission_type column to issues")

if not column_exists("issues", "original_language"):
conn.execute(text("ALTER TABLE issues ADD COLUMN original_language VARCHAR"))
conn.execute(text("ALTER TABLE issues ADD COLUMN original_language VARCHAR(10)"))
logger.info("Added original_language column to issues")

if not column_exists("issues", "original_text"):
@@ -117,7 +144,7 @@ def index_exists(table, index_name):
logger.info("Added manual_correction_applied column to issues")

if not column_exists("issues", "audio_file_path"):
conn.execute(text("ALTER TABLE issues ADD COLUMN audio_file_path VARCHAR"))
conn.execute(text("ALTER TABLE issues ADD COLUMN audio_file_path VARCHAR(255)"))
logger.info("Added audio_file_path column to issues")

# Grievances Table Migrations
@@ -131,7 +158,7 @@ def index_exists(table, index_name):
logger.info("Added longitude column to grievances")

if not column_exists("grievances", "address"):
conn.execute(text("ALTER TABLE grievances ADD COLUMN address VARCHAR"))
conn.execute(text("ALTER TABLE grievances ADD COLUMN address VARCHAR(255)"))
logger.info("Added address column to grievances")

if not column_exists("grievances", "issue_id"):
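The guarded-ALTER pattern used throughout init_db.py can be shown end to end with stdlib sqlite3 (the real code uses SQLAlchemy's inspector, but the idempotency idea is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY)")

def column_exists(conn, table, column):
    # PRAGMA table_info reflects ALTERs made earlier in the same run
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))

def migrate(conn):
    # Idempotent migration: guard every ALTER so reruns are no-ops
    if not column_exists(conn, "issues", "reference_id"):
        conn.execute("ALTER TABLE issues ADD COLUMN reference_id VARCHAR(255)")
        conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS ix_issues_reference_id ON issues (reference_id)"
        )

migrate(conn)
migrate(conn)  # second run is safe: the guard skips the ALTER
assert column_exists(conn, "issues", "reference_id")
```

Running the migration on every startup is then harmless, which is exactly what lets this file serve as a lightweight substitute for a migration tool.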
21 changes: 11 additions & 10 deletions backend/main.py
@@ -130,20 +130,21 @@ async def lifespan(app: FastAPI):

if not frontend_url:
if is_production:
raise ValueError(
"FRONTEND_URL environment variable is required for security in production. "
"Set it to your frontend URL (e.g., https://your-app.netlify.app)."
logger.critical(
"FRONTEND_URL environment variable is MISSING in production. "
"CORS will be disabled for safety. Set it to your frontend URL."
)
allowed_origins = []
else:
logger.warning("FRONTEND_URL not set. Defaulting to http://localhost:5173 for development.")
frontend_url = "http://localhost:5173"

if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
raise ValueError(
f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
)

allowed_origins = [frontend_url]
allowed_origins = [frontend_url]
else:
if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
raise ValueError(
f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
)
allowed_origins = [frontend_url]
Comment on lines +143 to +147
⚠️ Potential issue | 🟠 Major

Avoid hard-crashing the app on malformed FRONTEND_URL.

Line [144] raises at startup, so a config typo can take the service down. Prefer logging critical and falling back to allowed_origins = [] (same fail-closed behavior used for missing values).

Suggested change
-    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
-        raise ValueError(
-            f"FRONTEND_URL must be a valid HTTP/HTTPS URL. Got: {frontend_url}"
-        )
-    allowed_origins = [frontend_url]
+    if not (frontend_url.startswith("http://") or frontend_url.startswith("https://")):
+        logger.critical(
+            "FRONTEND_URL is invalid (%s). CORS will be disabled for safety.",
+            frontend_url,
+        )
+        allowed_origins = []
+    else:
+        allowed_origins = [frontend_url]
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 144-146: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 143 - 147, Replace the startup ValueError for
invalid FRONTEND_URL with a critical log and a safe fallback: instead of raising
when frontend_url does not start with "http://" or "https://", call the
application's logger to log a critical message that includes the invalid
frontend_url and then set allowed_origins = [] (maintaining the same fail-closed
behavior as missing values); update the check around frontend_url and
allowed_origins in main.py to perform this logging+fallback rather than raising
so a config typo won't crash the service.
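The fail-closed behavior the review describes can be isolated into one helper. This is a sketch of the logic, not the PR's exact code:

```python
import logging

logger = logging.getLogger("startup")

def resolve_allowed_origins(frontend_url, is_production):
    """Fail closed: a missing or malformed FRONTEND_URL disables CORS
    in production instead of crashing the service at startup."""
    if not frontend_url:
        if is_production:
            logger.critical("FRONTEND_URL missing; CORS disabled.")
            return []
        logger.warning("FRONTEND_URL not set; defaulting to localhost for dev.")
        frontend_url = "http://localhost:5173"
    if not frontend_url.startswith(("http://", "https://")):
        logger.critical("FRONTEND_URL invalid (%s); CORS disabled.", frontend_url)
        return []
    return [frontend_url]
```

An empty `allow_origins` list means browsers simply get no CORS grant, which degrades gracefully compared to a `ValueError` that takes the whole API down.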


if not is_production:
dev_origins = [
3 changes: 2 additions & 1 deletion backend/models.py
@@ -163,7 +163,8 @@ class Issue(Base):
longitude = Column(Float, nullable=True, index=True)
location = Column(String, nullable=True)
action_plan = Column(JSONEncodedDict, nullable=True)
integrity_hash = Column(String, nullable=True) # Blockchain integrity seal
integrity_hash = Column(String(255), nullable=True, index=True) # Blockchain integrity seal
previous_integrity_hash = Column(String(255), nullable=True, index=True)

# Voice and Language Support (Issue #291)
submission_type = Column(String, default="text") # 'text', 'voice'
8 changes: 4 additions & 4 deletions backend/requirements-render.txt
@@ -12,11 +12,11 @@ Pillow
firebase-functions
firebase-admin
a2wsgi
python-jose[cryptography]
passlib[bcrypt]
async_lru
python-jose
cryptography
passlib
bcrypt<4.0.0
SpeechRecognition
pydub
googletrans==4.0.2
langdetect
indic-nlp-library
2 changes: 0 additions & 2 deletions backend/requirements.txt
@@ -30,5 +30,3 @@ SpeechRecognition
pydub
googletrans==4.0.2
langdetect
indic-nlp-library
async_lru
25 changes: 15 additions & 10 deletions backend/routers/issues.py
@@ -196,7 +196,8 @@ async def create_issue(
longitude=longitude,
location=location,
action_plan=initial_action_plan,
integrity_hash=integrity_hash
integrity_hash=integrity_hash,
previous_integrity_hash=prev_hash
Comment on lines +199 to +200
Copilot AI Feb 25, 2026
The blockchain optimization is only being applied to web-based issue creation in this endpoint. However, bot.py (line 74-79) and voice.py (line 260-276) also create Issues but do not compute or store integrity_hash or previous_integrity_hash. This creates an inconsistent blockchain where some issues are chained and others are not. Consider either: 1) Implementing blockchain hashing for all issue creation paths to maintain a complete chain, or 2) Documenting this as an intentional design decision if only web submissions need blockchain verification.

)

# Offload blocking DB operations to threadpool
@@ -615,24 +616,27 @@ def get_user_issues(
async def verify_blockchain_integrity(issue_id: int, db: Session = Depends(get_db)):
"""
Verify the cryptographic integrity of a report using the blockchain-style chaining.
Optimized: Uses column projection to fetch only needed data.
Optimized: Uses previous_integrity_hash for O(1) verification.
"""
# Fetch current issue data
# Fetch current issue data including previous hash for O(1) verification
current_issue = await run_in_threadpool(
lambda: db.query(
Issue.id, Issue.description, Issue.category, Issue.integrity_hash
Issue.id, Issue.description, Issue.category, Issue.integrity_hash, Issue.previous_integrity_hash
).filter(Issue.id == issue_id).first()
)

if not current_issue:
raise HTTPException(status_code=404, detail="Issue not found")

# Fetch previous issue's integrity hash to verify the chain
prev_issue_hash = await run_in_threadpool(
lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first()
)

prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
# Fallback for legacy records that don't have previous_integrity_hash stored
if current_issue.previous_integrity_hash is None:
# Fetch previous issue's integrity hash to verify the chain
prev_issue_hash = await run_in_threadpool(
lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first()
)
prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
else:
prev_hash = current_issue.previous_integrity_hash
@cubic-dev-ai cubic-dev-ai bot Feb 25, 2026

P1: The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/issues.py, line 639:

<comment>The blockchain verification no longer validates that the stored previous_integrity_hash matches the actual previous issue’s integrity_hash, so tampering with the prior issue can go undetected when verifying this issue. Consider validating the backlink against the real previous issue before reporting success.</comment>

<file context>
@@ -615,24 +616,27 @@ def get_user_issues(
+        )
+        prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
+    else:
+        prev_hash = current_issue.previous_integrity_hash
 
     # Recompute hash based on current data and previous hash
</file context>
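The stronger check this P1 comment asks for — validating the stored back-link against the actual previous record — can be sketched like this (plain dicts stand in for the ORM model):

```python
import hashlib

def seal(description, category, prev_hash):
    return hashlib.sha256(f"{description}|{category}|{prev_hash}".encode()).hexdigest()

def verify(issue, previous_issue):
    """Check this record's hash AND that its stored back-link matches the
    real previous record, so tampering upstream is still detected."""
    actual_prev = previous_issue["integrity_hash"] if previous_issue else ""
    backlink_ok = issue["previous_integrity_hash"] == actual_prev
    recomputed = seal(issue["description"], issue["category"],
                      issue["previous_integrity_hash"])
    return backlink_ok and recomputed == issue["integrity_hash"]

# Build a two-record chain
h1 = seal("First issue", "Road", "")
issue1 = {"description": "First issue", "category": "Road",
          "integrity_hash": h1, "previous_integrity_hash": ""}
h2 = seal("Second issue", "Road", h1)
issue2 = {"description": "Second issue", "category": "Road",
          "integrity_hash": h2, "previous_integrity_hash": h1}

assert verify(issue2, issue1)
# Tamper with the *previous* record: the back-link check catches it
issue1["integrity_hash"] = "deadbeef"
assert not verify(issue2, issue1)
```

This keeps the O(1) recomputation on the current record, and adds only one extra lookup when full chain validation (not just self-consistency) is required.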


# Recompute hash based on current data and previous hash
# Chaining logic: hash(description|category|prev_hash)
@@ -649,6 +653,7 @@ async def verify_blockchain_integrity(issue_id: int, db: Session = Depends(get_d
return BlockchainVerificationResponse(
is_valid=is_valid,
current_hash=current_issue.integrity_hash,
previous_hash=prev_hash,
computed_hash=computed_hash,
message=message
)
1 change: 1 addition & 0 deletions backend/schemas.py
@@ -276,6 +276,7 @@ class ClosureStatusResponse(BaseModel):
class BlockchainVerificationResponse(BaseModel):
is_valid: bool = Field(..., description="Whether the issue integrity is intact")
current_hash: Optional[str] = Field(None, description="Current integrity hash stored in DB")
previous_hash: Optional[str] = Field(None, description="Previous issue's integrity hash used for chaining")
computed_hash: str = Field(..., description="Hash computed from current issue data and previous issue's hash")
message: str = Field(..., description="Verification result message")

36 changes: 36 additions & 0 deletions tests/test_blockchain.py
@@ -62,6 +62,42 @@ def test_blockchain_verification_success(client, db_session):
assert data["is_valid"] == True
assert data["current_hash"] == hash2

def test_blockchain_o1_storage(client, db_session):
# Create first issue
hash1_content = "First issue|Road|"
hash1 = hashlib.sha256(hash1_content.encode()).hexdigest()

issue1 = Issue(
description="First issue",
category="Road",
integrity_hash=hash1
)
db_session.add(issue1)
db_session.commit()

# Create second issue via API to test previous_integrity_hash population
response = client.post(
"/api/issues",
data={
"description": "Second issue for O(1) test",
"category": "Road",
"language": "en"
}
)
assert response.status_code == 201
issue2_id = response.json()["id"]

# Verify previous_integrity_hash is stored in DB for issue2
issue2 = db_session.query(Issue).filter(Issue.id == issue2_id).first()
assert issue2.previous_integrity_hash == hash1

# Verify verification response includes previous_hash
response = client.get(f"/api/issues/{issue2_id}/blockchain-verify")
assert response.status_code == 200
data = response.json()
assert data["is_valid"] == True
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check if the file exists and read the specific line
if [ -f tests/test_blockchain.py ]; then
  echo "=== Line 98 in tests/test_blockchain.py ==="
  sed -n '98p' tests/test_blockchain.py
  echo ""
  echo "=== Context around line 98 (lines 95-101) ==="
  sed -n '95,101p' tests/test_blockchain.py
  echo ""
  echo "=== Search for all assert...==True patterns in the file ==="
  rg -n 'assert\s+.+==\s*True' tests/test_blockchain.py
else
  echo "File tests/test_blockchain.py not found"
fi

Repository: RohanExploit/VishwaGuru

Length of output: 650


Use direct truth assertion for is_valid.

Avoid explicit comparison to True in assertions; prefer assert data["is_valid"] to satisfy linting and follow Python idioms.

✅ Minimal fix
-    assert data["is_valid"] == True
+    assert data["is_valid"]
🧰 Tools
🪛 Ruff (0.15.2)

[error] 98-98: Avoid equality comparisons to True; use data["is_valid"]: for truth checks

Replace with data["is_valid"]

(E712)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_blockchain.py` at line 98, Replace the explicit boolean comparison
in the test assertion — change the line asserting data["is_valid"] == True to
use a direct truth assertion (i.e., assert data["is_valid"]) in
tests/test_blockchain.py so the test uses Python idiomatic truthiness for the
data["is_valid"] value.

assert data["previous_hash"] == hash1

def test_blockchain_verification_failure(client, db_session):
# Create issue with tampered hash
issue = Issue(