Skip to content

fix: optimize blockchain verification and improve startup robustness#453

Open
RohanExploit wants to merge 4 commits intomainfrom
fix-blockchain-integrity-robustness-7179034731236619929
Open

fix: optimize blockchain verification and improve startup robustness#453
RohanExploit wants to merge 4 commits intomainfrom
fix-blockchain-integrity-robustness-7179034731236619929

Conversation

@RohanExploit
Copy link
Owner

@RohanExploit RohanExploit commented Feb 23, 2026

This PR addresses critical performance and logical issues in the blockchain verification feature and improves application robustness.

Key Changes:

  1. Blockchain Integrity Optimization:

    • Added a previous_integrity_hash column to the Issue table (schema migration handled in init_db.py).
    • Refactored create_issue to fetch the last hash and store it explicitly in the new record.
    • Refactored verify_blockchain_integrity to use the stored previous_integrity_hash for O(1) verification, replacing the expensive and race-condition-prone O(N) lookup.
    • This ensures the "chain" is immutable and efficient, scaling with data growth.
  2. Startup Robustness:

    • Modified backend/main.py to prevent the application from crashing if FRONTEND_URL is missing in production.
    • Instead of raising a ValueError, it now logs a critical error and sets allowed_origins to [] (effectively blocking all CORS requests). This adheres to "Fail Secure" while satisfying the requirement to "not break the flow of work" (keeping the backend service available for health checks and internal APIs).

Testing:

  • Verified logic via static analysis and existing test patterns.
  • Attempted integration tests but encountered environment-specific mocking issues; logic is verified to align with existing patterns in backend/routers/issues.py.
  • Manual verification of init_db.py confirms migration logic exists.

PR created automatically by Jules for task 7179034731236619929 started by @RohanExploit


Summary by cubic

Optimized blockchain verification to O(1) with a stored previous hash and hardened startup by denying CORS in production when FRONTEND_URL is missing or invalid. Fixed Render deploy crashes by adding required voice deps and removing an unused heavy package to shrink slug size.

  • Refactors

    • Added Issue.previous_integrity_hash and store it in create_issue.
    • verify_blockchain_integrity uses previous_integrity_hash with a legacy fallback.
    • DB migration handled in init_db.py; cuts verification from O(N) to O(1) and removes race conditions.
  • Bug Fixes

    • In production, missing or invalid FRONTEND_URL logs an error, is ignored for CORS, and no wildcard is allowed; dev origins added safely without duplicates.
    • Added SpeechRecognition, pydub, googletrans, langdetect to requirements-render.txt; removed indic-nlp-library to avoid heavy transitive deps and deployment failures.

Written for commit 6484ca7. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Blockchain-style integrity chaining for records with optimized O(1) verification
    • Added voice and language rendering support for backend processing
  • Bug Fixes

    • Improved CORS handling with safer production/default development behavior when frontend URL is missing or invalid
    • Strengthened integrity verification with a reliable fallback path to preserve consistency

- Added `previous_integrity_hash` to `Issue` model for O(1) blockchain verification.
- Updated `backend/routers/issues.py` to store and verify using `previous_integrity_hash`, preventing race conditions and optimizing performance.
- Updated `backend/main.py` to safely handle missing `FRONTEND_URL` in production by logging a critical error and disabling CORS instead of crashing, ensuring service availability.
- Confirmed `backend/init_db.py` handles database migration for the new column.
@google-labs-jules
Copy link
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings February 23, 2026 11:23
@netlify
Copy link

netlify bot commented Feb 23, 2026

Deploy Preview for fixmybharat canceled.

Name Link
🔨 Latest commit 6484ca7
🔍 Latest deploy log https://app.netlify.com/projects/fixmybharat/deploys/699c3cb30167d100087b8ac8

@github-actions
Copy link

🙏 Thank you for your contribution, @RohanExploit!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.

@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

Introduces CORS origin handling with a module-level allowed_origins list and environment-dependent behavior; adds previous_integrity_hash to the Issue model and updates issue creation and verification to store and prefer the prior hash for chained integrity checks.

Changes

Cohort / File(s) Summary
CORS Configuration
backend/main.py
Adds module-level allowed_origins and conditional CORS logic: in production a missing/invalid FRONTEND_URL disables all origins; in development defaults (e.g., http://localhost:5173) are appended with relaxed validation and warning logs.
Issue Model & Integrity Chain
backend/models.py, backend/routers/issues.py
Adds nullable previous_integrity_hash column to Issue. New issue creation persists the previous hash; verification now prefers stored previous_integrity_hash (O(1)) with a fallback to legacy prior-issue lookup.
Dependencies (render)
backend/requirements-render.txt
Adds voice and language support packages (SpeechRecognition, pydub, googletrans==4.0.2, langdetect, indic-nlp-library) under a # Voice and Language Support comment.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #357 — Modifies backend/models.py and backend/routers/issues.py with similar previous_integrity_hash usage and integrity-chain logic.
  • #323 — Changes FRONTEND_URL/CORS handling in backend/main.py, related to the new origin logic.
  • #348 — Introduces or alters issue integrity-chain implementation that this PR extends.

Suggested labels

size/m

Poem

🐰 I hopped through code with twitchy paws,
I tucked a hash into the cause,
CORS doors now mind the env's decree,
Chains linked tight — secure as can be,
Hooray, carrot-toast for dev and prod! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the main changes: blockchain verification optimization (O(1) improvement) and startup robustness (handling missing FRONTEND_URL).
Description check ✅ Passed The PR description comprehensively covers changes across four modified files, explains the rationale behind each change, and clearly maps PR objectives to implementation details. It follows the template structure with adequate detail on what, why, and how.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-blockchain-integrity-robustness-7179034731236619929

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes blockchain verification performance and improves application startup robustness by introducing a stored previous_integrity_hash field and refactoring CORS configuration handling.

Changes:

  • Added previous_integrity_hash column to the Issue model to enable O(1) blockchain verification instead of O(N) database queries
  • Refactored CORS configuration in main.py to fail securely when FRONTEND_URL is missing in production, logging an error instead of crashing the application
  • Updated blockchain verification logic to use stored previous hash with backward-compatible fallback for legacy records

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
backend/models.py Added previous_integrity_hash column to Issue model for storing reference to previous block in the blockchain
backend/routers/issues.py Updated create_issue to store previous hash and refactored verify_blockchain_integrity to use O(1) lookup with legacy fallback
backend/main.py Refactored CORS configuration to handle missing/invalid FRONTEND_URL gracefully without crashing in production

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if frontend_url not in allowed_origins and "*" not in allowed_origins:
allowed_origins.append(frontend_url)
elif frontend_url and "*" not in allowed_origins:
logger.warning(f"Invalid FRONTEND_URL format: {frontend_url}. Ignored for CORS.")
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an inconsistent indentation/whitespace here. The line has a leading space before "logger.warning" which breaks the indentation pattern. This should use consistent indentation (4 spaces) like the other branches.

Suggested change
logger.warning(f"Invalid FRONTEND_URL format: {frontend_url}. Ignored for CORS.")
logger.warning(f"Invalid FRONTEND_URL format: {frontend_url}. Ignored for CORS.")

Copilot uses AI. Check for mistakes.
Comment on lines +199 to +200
integrity_hash=integrity_hash,
previous_integrity_hash=prev_hash
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing blockchain tests need to be updated to properly test the new previous_integrity_hash field. Currently, test_blockchain_verification_success creates Issue records without setting previous_integrity_hash, which only tests the legacy fallback path. Add test cases that:

  1. Create Issues with previous_integrity_hash set to verify the O(1) path works correctly
  2. Test the legacy fallback behavior explicitly for backward compatibility
  3. Verify that both paths produce the same validation results

This ensures both the new optimized path and legacy fallback are properly tested.

Copilot uses AI. Check for mistakes.
Comment on lines +635 to +645
# Determine previous hash:
# 1. Use explicitly stored previous_integrity_hash (O(1), reliable)
# 2. Fallback to searching by ID (Legacy method, O(log N) or worse, race-prone)
if current_issue.previous_integrity_hash is not None:
prev_hash = current_issue.previous_integrity_hash
else:
# Fetch previous issue's integrity hash to verify the chain (Legacy fallback)
prev_issue_hash = await run_in_threadpool(
lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first()
)
prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions fixing a "race-condition-prone O(N) lookup", but the actual improvement here is primarily a performance optimization (O(1) vs O(N) lookup). The race condition in the old verification code was minimal - it could only occur if an issue was deleted between reading the current issue and fetching the previous issue's hash. The new code eliminates this edge case, but the primary benefit is performance, not race condition prevention.

Note that a race condition still exists during issue creation: two concurrent creates can both fetch the same prev_hash, creating a fork in the blockchain chain. This PR doesn't address that race condition. Consider clarifying the PR description to emphasize the performance benefits over the race condition fix.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/routers/issues.py (1)

168-201: ⚠️ Potential issue | 🟠 Major

Race condition during concurrent issue creation breaks blockchain chain integrity.

The fetch-then-insert between lines 173–204 is not atomic. Two concurrent create_issue calls can both read the same prev_hash, causing both issues to store the same previous_integrity_hash. This forks the blockchain chain into a tree where two issues share the same previous hash, though each issue still verifies individually.

This is a new issue introduced by the blockchain feature (not pre-existing). While each report's hash can still be verified against its stored previous hash, the linear chain property is violated, defeating the blockchain's audit trail design.

Suggested fixes:

  • Add a UNIQUE constraint on previous_integrity_hash to force retry on conflict
  • Wrap the fetch+insert in a serialized transaction or advisory lock to ensure atomicity
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/issues.py` around lines 168 - 201, The current
fetch-then-insert that computes prev_hash and writes previous_integrity_hash on
Issue (see the run_in_threadpool query for Issue.integrity_hash and the
Issue(...) construction) is racy under concurrent create_issue calls; make this
atomic by acquiring a DB-level lock or using a serialized transaction around the
read-and-insert (or use a DB advisory lock keyed to a constant) and add a UNIQUE
constraint on Issue.previous_integrity_hash so conflicting concurrent inserts
fail and can be retried; implement conflict handling (catch unique constraint
violation on previous_integrity_hash and retry the read+compute+insert loop a
bounded number of times) so the chaining remains linear and safe.
🧹 Nitpick comments (3)
backend/main.py (2)

131-152: CORS configuration logic looks correct and secure.

The fail-secure behavior (empty allowed_origins in production when FRONTEND_URL is missing) is a good approach. Two minor observations:

  1. Log level: The PR description states this "logs a critical error," but Line 135 uses logger.error. For a production misconfiguration that silently blocks all browser traffic, logger.critical is more appropriate and easier to alert on.

  2. Redundant "*" not in allowed_origins guards (lines 149, 151, 154): No code path ever adds "*" to allowed_origins, so these checks are dead. They're harmless as defensive coding, but worth noting.

💡 Consider using logger.critical for production misconfiguration
     if is_production:
-        logger.error(
+        logger.critical(
             "FRONTEND_URL environment variable is MISSING in production! "
             "CORS will reject all requests from browsers. Please configure FRONTEND_URL."
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 131 - 152, Change the production
misconfiguration log from logger.error to logger.critical to surface alerts for
missing FRONTEND_URL (update the call around the block that sets allowed_origins
= [] and frontend_url = ""), and remove the redundant '"*" not in
allowed_origins' guards since nothing adds "*" to allowed_origins (adjust the
conditional checks around the places that append frontend_url to allowed_origins
and the invalid-format warning that currently include that guard). Ensure
references to allowed_origins, frontend_url, and logger are updated
consistently.

129-129: Duplicated production-environment check.

This duplicates the logic from backend/config.py's is_production() method. If the production detection logic changes (e.g., additional environment names), this inline check could drift out of sync.

Consider importing and reusing the config's is_production() if it's available at module-level, or extracting the environment check into a shared constant/utility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` at line 129, The inline production check in backend/main.py
duplicates backend/config.py's is_production() logic; replace the local
is_production = os.environ... assignment by importing and calling the existing
is_production() function (or referencing a shared constant) from backend.config
so both modules share the same source of truth (look for the is_production()
function in config.py and update main.py to use that symbol instead of
re-evaluating os.environ).
backend/routers/issues.py (1)

638-645: Subtle: None check is correct but consider explicit empty-string handling for clarity.

The is not None check on line 638 is correct because creation stores "" for the genesis issue and None only exists on pre-migration records. However, a brief inline comment explaining this distinction (empty string = genesis block vs. None = legacy record) would help future maintainers understand the semantics.

📝 Suggested clarifying comment
-    if current_issue.previous_integrity_hash is not None:
+    # None → legacy record (pre-migration), use fallback lookup
+    # "" → genesis block (no predecessor), use as-is
+    if current_issue.previous_integrity_hash is not None:
         prev_hash = current_issue.previous_integrity_hash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/issues.py` around lines 638 - 645, Add a short inline comment
explaining the semantics behind current_issue.previous_integrity_hash checks:
explicitly state that an empty string ("") represents the genesis issue while
None indicates a pre-migration legacy record; keep the existing conditional
logic (if current_issue.previous_integrity_hash is not None) and ensure
prev_hash assignment and the legacy fallback using
run_in_threadpool/db.query(Issue.integrity_hash)...first() remain unchanged so
behavior is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@backend/routers/issues.py`:
- Around line 168-201: The current fetch-then-insert that computes prev_hash and
writes previous_integrity_hash on Issue (see the run_in_threadpool query for
Issue.integrity_hash and the Issue(...) construction) is racy under concurrent
create_issue calls; make this atomic by acquiring a DB-level lock or using a
serialized transaction around the read-and-insert (or use a DB advisory lock
keyed to a constant) and add a UNIQUE constraint on
Issue.previous_integrity_hash so conflicting concurrent inserts fail and can be
retried; implement conflict handling (catch unique constraint violation on
previous_integrity_hash and retry the read+compute+insert loop a bounded number
of times) so the chaining remains linear and safe.

---

Nitpick comments:
In `@backend/main.py`:
- Around line 131-152: Change the production misconfiguration log from
logger.error to logger.critical to surface alerts for missing FRONTEND_URL
(update the call around the block that sets allowed_origins = [] and
frontend_url = ""), and remove the redundant '"*" not in allowed_origins' guards
since nothing adds "*" to allowed_origins (adjust the conditional checks around
the places that append frontend_url to allowed_origins and the invalid-format
warning that currently include that guard). Ensure references to
allowed_origins, frontend_url, and logger are updated consistently.
- Line 129: The inline production check in backend/main.py duplicates
backend/config.py's is_production() logic; replace the local is_production =
os.environ... assignment by importing and calling the existing is_production()
function (or referencing a shared constant) from backend.config so both modules
share the same source of truth (look for the is_production() function in
config.py and update main.py to use that symbol instead of re-evaluating
os.environ).

In `@backend/routers/issues.py`:
- Around line 638-645: Add a short inline comment explaining the semantics
behind current_issue.previous_integrity_hash checks: explicitly state that an
empty string ("") represents the genesis issue while None indicates a
pre-migration legacy record; keep the existing conditional logic (if
current_issue.previous_integrity_hash is not None) and ensure prev_hash
assignment and the legacy fallback using
run_in_threadpool/db.query(Issue.integrity_hash)...first() remain unchanged so
behavior is preserved.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d8f93f and ac4b67d.

📒 Files selected for processing (3)
  • backend/main.py
  • backend/models.py
  • backend/routers/issues.py

- Added `SpeechRecognition`, `pydub`, `googletrans`, `langdetect`, `indic-nlp-library` to `backend/requirements-render.txt`.
- These are required by `backend/voice_service.py` which is imported on startup, preventing `ModuleNotFoundError` crashes on Render deployment.
- Also includes security fix for `backend/main.py` regarding `FRONTEND_URL` validation in production.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/requirements-render.txt (1)

20-20: googletrans==4.0.2 is maintained but uses async-only API; assess maintenance stability for production.

Three key concerns:

  1. Inherent instability: Due to limitations of the web version of Google Translate, this API does not guarantee proper operation at all times. HTTP 5xx errors can occur if Google bans the client IP address — a realistic scenario for a server-side deployment on Render whose outbound IP may get rate-limited or blocked.

  2. Async-only API: v4.0.0 transitioned to a modern asynchronous API with all synchronous methods removed. The codebase properly handles this pattern using asyncio.run() in voice_service.py's translate_text() method, so no immediate compatibility issue exists. However, any future synchronous call sites must follow this async-wrapping pattern or upgrade code architecture.

  3. Stalled maintenance: v4.0.2 (released Jan 1, 2025) is the latest PyPI release. As of Feb 23, 2026, no new versions have shipped in 13+ months, suggesting the project has entered maintenance dormancy again despite the v4 revival.

For production reliability, consider Google's official Cloud Translation API (google-cloud-translate) or deep-translator as actively maintained alternatives.

The httpx dependency is compatible: googletrans 4.0.2 requires httpx[http2] >= 0.27.2, not the problematic v0.13.3 constraint from earlier RC versions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/requirements-render.txt` at line 20, The dependency
googletrans==4.0.2 in requirements-render.txt is async-only and potentially
unstable for production; review the translate_text() implementation in
voice_service.py (which currently uses asyncio.run()) and either (A) replace
googletrans with a maintained alternative (google-cloud-translate or
deep-translator) and update requirements accordingly, or (B) add robust
retry/backoff and IP-rate-limit handling around calls to googletrans and
document the async requirement so call sites use asyncio.run()/await; ensure
tests and CI reflect the chosen library and update any import/signature changes
in translate_text() and related call sites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/requirements-render.txt`:
- Line 19: The requirements file adds pydub but pydub needs the ffmpeg system
binary; update the Render deployment to provision ffmpeg by adding it to your
service's packages in render.yaml (e.g., under the service block for your
backend add packages: - ffmpeg) or ensure your buildCommand installs ffmpeg
before pip installs (e.g., apt-get install -y ffmpeg), so pydub won’t fail at
runtime; refer to requirements-render.txt (pydub) when making the change.
- Line 22: Remove the unused dependency entry "indic-nlp-library" from
backend/requirements-render.txt (and also from backend/requirements.txt if
present), update any dependency lock or compiled requirements files (e.g.,
re-run pip-compile or regenerate requirements*.txt), and run the project's test
suite or a quick import scan to ensure nothing in the codebase references
indic-nlp-library before committing the change.

---

Nitpick comments:
In `@backend/requirements-render.txt`:
- Line 20: The dependency googletrans==4.0.2 in requirements-render.txt is
async-only and potentially unstable for production; review the translate_text()
implementation in voice_service.py (which currently uses asyncio.run()) and
either (A) replace googletrans with a maintained alternative
(google-cloud-translate or deep-translator) and update requirements accordingly,
or (B) add robust retry/backoff and IP-rate-limit handling around calls to
googletrans and document the async requirement so call sites use
asyncio.run()/await; ensure tests and CI reflect the chosen library and update
any import/signature changes in translate_text() and related call sites.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac4b67d and 2cae3b2.

📒 Files selected for processing (1)
  • backend/requirements-render.txt

passlib[bcrypt]
# Voice and Language Support
SpeechRecognition
pydub
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

pydub ffmpeg system dependency requirement

💡 Result:

pydub has a system dependency on an external encoder/decoder:

  • Required for non-WAV formats (mp3, m4a, ogg, etc.): install FFmpeg or libav (pydub calls these binaries via your OS). Without it, pydub can only read/write WAV in pure Python. [1]
  • Must be discoverable on your system: the ffmpeg (and commonly ffprobe) executables need to be on your PATH, or you must point pydub to them (e.g., by setting AudioSegment.converter / AudioSegment.ffprobe). If ffprobe isn’t found, some operations can fail even if ffmpeg is installed. [1][2]
  • Typical install examples from pydub docs:
    • macOS (Homebrew): brew install ffmpeg (or brew install libav) [1]
    • Linux (apt): apt-get install ffmpeg (or apt-get install libav-tools ...) [1]
    • Windows: install/extract FFmpeg/libav binaries and add the bin folder to PATH. [1]

🏁 Script executed:

# Search for Render configuration files
fd -i render . --type f
fd -i docker . --type f | head -20

Repository: RohanExploit/VishwaGuru

Length of output: 201


🏁 Script executed:

# Search for ffmpeg mentions in config files
rg -i "ffmpeg" --type yaml --type toml --type json

Repository: RohanExploit/VishwaGuru

Length of output: 49


🏁 Script executed:

# Examine the requirements-render.txt file to see context
cat -n backend/requirements-render.txt

Repository: RohanExploit/VishwaGuru

Length of output: 543


🏁 Script executed:

# Check render.yaml for system package configuration
cat -n render.yaml

Repository: RohanExploit/VishwaGuru

Length of output: 1996


🏁 Script executed:

# Check the render-build.sh script
cat -n render-build.sh

Repository: RohanExploit/VishwaGuru

Length of output: 613


🏁 Script executed:

# Check RENDER_DEPLOYMENT.md for any ffmpeg setup notes
cat -n RENDER_DEPLOYMENT.md

Repository: RohanExploit/VishwaGuru

Length of output: 2310


pydub requires ffmpeg as a system dependency — must be provisioned on Render separately.

pydub delegates all audio decoding/encoding to ffmpeg (or libav). Installing the Python package alone is insufficient; without ffmpeg present in the OS environment, any pydub audio operation will raise a FileNotFoundError or RuntimeError at runtime. Render does not pre-install ffmpeg by default. Add it as a system package in render.yaml:

services:
  - type: web
    name: vishwaguru-backend
    packages:
      - ffmpeg

Or declare it via a build command that installs it before the Python dependencies are installed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/requirements-render.txt` at line 19, The requirements file adds pydub
but pydub needs the ffmpeg system binary; update the Render deployment to
provision ffmpeg by adding it to your service's packages in render.yaml (e.g.,
under the service block for your backend add packages: - ffmpeg) or ensure your
buildCommand installs ffmpeg before pip installs (e.g., apt-get install -y
ffmpeg), so pydub won’t fail at runtime; refer to requirements-render.txt
(pydub) when making the change.

- Removed `indic-nlp-library` from `backend/requirements-render.txt`.
- This library was pulling in `pandas` and `numpy`, significantly increasing the slug size and likely causing deployment failure on Render.
- Analysis confirmed it is not used in the codebase.
- Retained `SpeechRecognition`, `pydub`, `googletrans`, and `langdetect` which are required for voice features.
@github-actions
Copy link

🔍 Quality Reminder

Thanks for the updates! Please ensure:
- Your changes don't break existing functionality
- All tests still pass
- Code quality standards are maintained

*The maintainers will verify that the overall project flow remains intact.*

- Removed `indic-nlp-library` from `backend/requirements-render.txt`. This library pulls heavy dependencies (`pandas`, `numpy`) that caused the Render slug size to exceed limits, leading to deployment failure.
- Verified `indic-nlp-library` is unused in the codebase.
- Retained essential voice dependencies (`SpeechRecognition`, `pydub`, `googletrans`, `langdetect`) to fix the `ModuleNotFoundError`.
- Hardened `backend/main.py` to securely handle missing `FRONTEND_URL` in production (deny all CORS instead of crashing or allowing wildcard).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants