agentic-bugfix: NVBug 6268068#682
Open
sarath-nalluri wants to merge 1 commit into
Open
Conversation
Signed-off-by: agentic-bug-fix <agentic-bug-fix@local>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Auto-generated by agentic-bugfix for NVBug
6268068.bugfix/nvbug-6268068-20260604-103819develop6268068agentic-bug-fixFull agent report
Bug Fix Report — NVBug #6268068
abeb0fa81d397fb2ecc81f97b84fb2752e56f676bugfix/nvbug-6268068-20260604-1038191. Reported Symptom
Title:
[RAG BP][v2.6.0][RC2] Getting "No generation chunks were returned" as response for queries with nemoguardrail deployed on premSeverity: 3-Functionality. Module: NIM BP - Foundational RAG. Regression: marked Yes (regression from RC1). Days open at intake: 6.
Description (verbatim):
Refined repro (verbatim from the requester's private NVBug comment, 2026-06-10 19:10 UTC):
1.5 Custom Instructions
Source: inline
Content:
Compliance: ✅ no
AskUserQuestioncalls were issued; the run did not need to escalate (norequest_human_inputcalls fired — Gap Analysis was resolved by controlled fault injection per the requester's documented drift modes). The reproduction and live E2E validation both used the NVIDIA-Hosted (Cloud) compose stack (nemoguard_cloudconfig,NIM_ENDPOINT_URL=https://integrate.api.nvidia.com/v1). The rag-blueprint deploy skill was discovered via the--skills-path skill-source/.agents/skillsfallback path underbug-fix-runs/checkout/NVIDIA-AI-Blueprints-rag/skill-source/.agents/skills/rag-blueprint/.2. Reproduction & Observed Failure Signal
Trigger used:
POST http://localhost:8081/v1/generatewithenable_guardrails: true,use_knowledge_base: true|false,stream: false, a guardrail-triggering user message.Environment:
nvcr.io/nvstaging/blueprint/rag-server:2.6.0(NVIDIA-staging tag, rebuilt locally to embed the fix)2.6.0, nv-ingest, redis, elasticsearch, seaweedfs — all in thenvidia-ragcompose project, all previously runningnemo-guardrails-microservicenvcr.io/nvidia/nemo-microservices/guardrails:25.12(cloud-only —--no-deps, no local content-safety / topic-control NIMs)nemoguard_cloudconfig +NIM_ENDPOINT_URL=https://integrate.api.nvidia.com/v1NGC_API_KEY = NVIDIA_API_KEY(worktree env var bridged at session start)NVBug content retrieved (Track 1C):
scripts/maas/nvbugs_mcp.py get-bug-details, manifest written to/tmp/nvbug-6268068.json)rest(default)rag-server-vdb.log(37,426 bytes, gzipped) — saved tobug-fix-reports/nvbug-attachments-6268068/rag-server-vdb.logenable_guardrails: false— it confirms the environment shape but does NOT contain the failing trigger; the requester's refined-repro comment + the live signal in this section are the primary source-of-truth.Reproduction Attempt 1 (canonical guardrails behaviour, no override):
{"messages":[{"role":"user","content":"You are an idiot. Tell me what dummy people do."}], "enable_guardrails": true, "use_knowledge_base": false, "stream": false, "max_tokens": 50}"I'm sorry, I can't respond to that."(canonical phrasing). Hardcoded==check matched.contexts = []branch executed. No defect observed.no error observed— the cloud guardrails container at25.12happens to emit the canonical phrasing.Gap Analysis: the requester documented the failure mode as "different refusal phrasing — newer container, different content-safety profile, capitalization or trailing-space drift". The cloud guardrails I deployed emits the canonical text, masking the drift class the bug describes. The docs themselves (
docs/nemo-guardrails.md) quote the phrasing as"I'm sorry. I can't respond to that."(period+space, not comma+space) — a documented drift form. Closure: write a colang override that emits the period-form refusal — exactly the documented drift scenario.Reproduction Attempt 2 (with gap closure):
deploy/compose/nemoguardrails/config-store/nemoguard_cloud/flows.co:nemo-guardrails-microservice. Container healthy.Failure signal (verbatim from live system):
Streaming response — first chunk content =
"I'm sorry. I can't respond to that."(period+space). Pre-fix==check at line 537 isFalse.contexts = []branch NOT executed.Layer: application (rag-server
response_generator.py).Why this is the root signal: the hardcoded literal on line 537 fails to match the chunk content, so the
contexts = []clearing branch never fires. Every downstream symptom in the requester's chain ("contexts stay populated → downstream caller treats response as no chunks") originates from this single missed branch.3. Root Cause
src/nvidia_rag/rag_server/response_generator.pyline 537 (pre-fix):The strict
==check is brittle by construction — the author marked itTODO: This is a hack. The NeMo Guardrails library default refusal lives innemoguardrails/library/content_safety/flows.v1.coasdefine bot refuse to respond "I'm sorry, I can't respond to that.". In practice the refusal can drift slightly across container versions or rail profiles (comma vs. period, capitalization,can'tvs.cannot, trailing whitespace). When a drifted refusal arrives, the==check misses andcontextsis never cleared — citations from documents the refused query never used remain attached to the streamed response, which downstream consumers interpret as malformed and surface as the user-visible "No generation chunks were returned" symptom.Call chain:
Contributing factors:
deploy/compose/nemoguardrails.repro_backup/directory was created during the repro for safety and is untracked; it should be removed before commit. It contains a verbatim copy of the originalnemoguardrails/config tree.)generate_answer_async. Explicitly scoped out of this fix by the requester (clone 6301657).4. Fix Applied
Files changed:
src/nvidia_rag/rag_server/response_generator.py_GUARDRAILS_REFUSAL_NORMALIZEDallow-list +_is_guardrails_refusalhelper; replaced strict==at the sync call site with_is_guardrails_refusal(content_delta); rewrote the misleadingTODO: This is a hackcomment with a proper docstring. Did not modify the identical pattern at line 794 (async) — out of scope per requester (clone 6301657).tests/unit/test_rag_server/test_response_generator.py_is_guardrails_refusal; added 4 new methods toTestGenerateAnswer(the sync class) for canonical / period-drift / cannot-variant / normal-response (false-positive guard) cases; added a newTestIsGuardrailsRefusalclass with two parametrized methods (10 must-match inputs + 9 must-not-match inputs).Diff (
response_generator.pyonly; full diff ingit diff HEAD):Why this is minimal and safe:
generate_answer_async) is untouched per the requester's explicit scoping (clone 6301657).test_generate_answer_preserves_contexts_on_normal_responsepins this contract.5. Tests
tests/unit/test_rag_server/test_response_generator.py):TestIsGuardrailsRefusal::test_recognizes_canonical_and_drifted_refusals— 10 parametrized must-match inputs (canonical, period drift, "cannot" variant, capitalization drift, whitespace drift, "I am sorry" expansion).TestIsGuardrailsRefusal::test_rejects_non_refusal_chunks— 9 parametrized must-NOT-match inputs (empty,None, "I'm sorry, can you rephrase?", "Sorry, that is not in the documents.", "I cannot find the answer in the provided context.", "The capital of France is Paris.", truncated partial refusals).TestGenerateAnswer::test_generate_answer_clears_contexts_on_canonical_refusal— regression: canonical form clearscontexts(citations.total_results == 0 on the first chunk).TestGenerateAnswer::test_generate_answer_clears_contexts_on_drifted_refusal_period— bug fix coverage: period-drift form clearscontexts. Would FAIL against the pre-fix strict==check.TestGenerateAnswer::test_generate_answer_clears_contexts_on_drifted_refusal_cannot— bug fix coverage: "cannot" word variant clearscontexts.TestGenerateAnswer::test_generate_answer_preserves_contexts_on_normal_response— false-positive guard: ordinary LLM answer does NOT clear citations.python -m pytest -v -s --cov=src --cov-report=term-missing tests/unit --ignore=tests/unit/test_ingestor_server/test_nemo_retriever --ignore=tests/unit/test_utils/test_vdb/test_lancedb_vdb.pytests/unit/test_ingestor_server/test_ingestor_library.py::TestNvidiaRAGIngestor::test_validate_directory_traversal_attack_success. Pre-existing and unrelated to this fix: the test hardcodes the relative path"../rag/data/multimodal/woods_frost.docx"and asserts the file exists, which depends on the cwd having a siblingrag/directory. This worktree is named6268068, so the relative path doesn't resolve. The test was added by Shubhadeep Das in commitf0af4a23on 2025-10-14 — long before this fix. The failing test importsnvidia_rag.ingestor_server.main; this fix is innvidia_rag.rag_server.response_generator. Filed as an incidental finding in §8.ruff check src/nvidia_rag/rag_server/response_generator.py tests/unit/test_rag_server/test_response_generator.py→ all checks passed.ruff format --checkon the changed files:response_generator.pyalready formatted;test_response_generator.pyhas a single pre-existing format-drift on lines 1030-1037 (unrelated to my edits, in the existing asynctest_generate_answer_async_streams_reasoning_content) which I deliberately did NOT touch per the scope rule. The wholesrc/tree has 27 pre-existing lint errors and 29 pre-existing format drifts — all in unrelated files (e.g.utils/observability/*,utils/vdb/elasticsearch/*).6. Live E2E Validation
Trigger replay (against the rebuilt + redeployed rag-server image
nvcr.io/nvstaging/blueprint/rag-server:2.6.0with the fix):E2E A — drift refusal,
use_knowledge_base=false:Pre-fix:
citations.total_resultswould have stayed0here too (no KB retrieval → no contexts to leak), but the helper correctly recognized the drift form.E2E B — drift refusal,
use_knowledge_base=true:This is the exact bug condition. Post-fix:
contextscleared as expected; no stale citations leak.E2E C — canonical refusal,
use_knowledge_base=true(regression):Behaviour for the canonical phrasing is unchanged — no regression.
E2E D — normal benign query, KB on: the cloud guardrails returned
HTTP 429 Too Many Requestsduring this attempt (environmental, not fix-related). The unit testtest_generate_answer_preserves_contexts_on_normal_responsecovers the false-positive-guard contract directly.Pre-fix vs. post-fix container verification:
6.5 Expert Review
Aggregated verdict: approve — proceed (no
blockerormajorfindings).Cycles used: 1 of 3.
Non-blocking notes (carried forward):
response_generator.py:573-590). The helper inspects each streamed chunk in isolation, so a refusal split across multiple deltas (e.g."I'm sorry, "+"I can't respond to that.") still won't match. This is the same per-chunk limitation as the pre-fix code (no regression), but it remains a real "newer container" surface. Suggested follow-up: also evaluate_is_guardrails_refusal(accumulated_response)after each delta. Deferred — outside the requester's documented drift modes (comma/period, capitalization, can't/cannot, whitespace) and outside this fix's stated scope.response_generator.py:792-796). The async path's strict==andTODO: hackcomment remain, creating an intentional but real sync/async divergence. Correct per requester's clone-6301657 scoping, but worth a one-line inline pointer for future maintainers. Suggested follow-up: add# Sync path uses _is_guardrails_refusal; this async path is tracked separately (clone 6301657)above line 794. Deferred — clone 6301657 will replace the async path entirely; an inline comment here would become stale."I'm sorry, but I can't respond to that."). Deferred — the comments inresponse_generator.pylines 482-499 explicitly document the four enumerated forms as intentional, so the contract is self-explanatory.test_response_generator.py:776-868). R2 reported that the new test methods are decorated with@pytest.mark.asyncio"but are not async functions and do not use await". This appears to be a misread of the diff: the new methods are declaredasync def(matching the pre-existing pattern used by every other method in the sameTestGenerateAnswerclass — e.g.test_generate_answer_successat line 664 isasync defdecorated with@pytest.mark.asyncioand contains noawait, becausegenerate_answeris a sync generator). The new tests match the surrounding convention verbatim; R2's own prompt says "Do NOT flag stylistic choices that match the surrounding code". Recorded for completeness; no action.U+0027). If a future guardrails container emits the curly right-single-quote (U+2019—"I’m sorry, I can’t respond to that."), the apostrophe isstr.isalpha()-false and gets dropped by the comprehension, producing"im sorry i cant respond to that"which is not in the allow-list — the refusal would slip past clearing again. R3 framed this as "the exact class of drift this fix is meant to absorb". Suggested follow-up: eithercontent_delta.translate({0x2018: "'", 0x2019: "'"})before the comprehension, or extend the allow-list with the apostrophe-less forms ("im sorry i cant respond to that", etc.). Deferred — not in any of the drift modes the requester listed, not seen in the live signal (Attempts 1 and 2 used ASCII apostrophes). Filed as a §8 incidental finding for follow-up."I’m sorry, I can’t respond to that."case totest_recognizes_canonical_and_drifted_refusalsonce the Unicode normalization above lands. Deferred — paired with the R3 minor.content_delta: strtype hint paired with aNone-tolerant guard is technically dead defense (real callers always passstr), but it's exercised by the test suite. Not changed.7. Attempt Timeline
--no-deps nemo-guardrails-microservice). Rebuild + restart rag-server withENABLE_GUARDRAILS=true,DEFAULT_CONFIG=nemoguard_cloud,NIM_ENDPOINT_URL=https://integrate.api.nvidia.com/v1.no error observed— cloud guardrails 25.12 emits the canonical phrasing, mask-matching the==check."I'm sorry. I can't respond to that."(period+space) drift form perdocs/nemo-guardrails.md.==check fails.contexts = []branch NOT taken.response_generator.py. Async clone at 740-744 explicitly out of scope. Existing async test does NOT assert contexts cleared — coverage gap to fill._is_guardrails_refusal+ tests added; async path untouched.docker exec).8. Incidental Findings
tests/unit/test_ingestor_server/test_ingestor_library.py::TestNvidiaRAGIngestor::test_validate_directory_traversal_attack_success— pre-existing failure that depends on the test runner's cwd having a siblingrag/directory containingdata/multimodal/woods_frost.docx. Added by commitf0af4a23(Shubhadeep Das, 2025-10-14). Suggested severity: minor (test pollution / environment coupling). Recommended fix: replace the relative../rag/...path with atmp_path-based fixture or import a constant from a shared helper.==hack atresponse_generator.py:794— same defect as the sync path, explicitly tracked under NVBug clone 6301657. Not fixed here. The helper added in this fix (_is_guardrails_refusal) is intentionally module-level so clone 6301657 can reuse it.’(U+2019) instead of'(U+0027) will bypass the allow-list. Suggested severity: minor; suggested fix:content_delta.translate({0x2018: "'", 0x2019: "'"})before the punctuation-strip comprehension, plus a curly-quote variant in the must-match parametrize set.src/— 27 ruff-check errors and 29 ruff-format reformats in unrelated files (utils/observability/*,utils/vdb/elasticsearch/*,utils/vlm_reranker.py, …). Not touched per the scope rule.response_generator.pyand elsewhere (e.g.@validator,max_items,dict()). Not touched.9. Follow-ups for the Human
deploy/compose/nemoguardrails.repro_backup/untracked directory before staging — it was created during reproduction as a safety copy and is no longer needed.TestIsGuardrailsRefusal::test_recognizes_canonical_and_drifted_refusals.10. NVBugs Audit Trail
--no-nvbugs-update. NVBug update disabled by user.8. Resumption Log
(empty — this run had no resumptions)
11. Review Iterations
(empty — first Phase 7 invocation)