- Implemented a pre-deduplication step in the MultiStageReviewOrchestrator to eliminate near-duplicate issues before sending to the LLM.
- Added a new static method `_deduplicate_previous_issues` to handle deduplication based on location fingerprint and semantic similarity.
- Updated logging to reflect the number of duplicates removed during the reconciliation process.

fix: Allow optional AI base URL in LLM factory
- Modified the LLMFactory to accept an optional `ai_base_url` parameter when creating an LLM instance.

feat: Introduce dependency-aware batching for changed files
- Added a new function `build_dependency_aware_batches` to create batches of changed files while considering their relationships.
- Implemented logic to estimate token costs based on file content and diff sizes for efficient batching.

chore: Update QA documentation prompts for clarity and structure
- Revised multiple QA documentation prompts to improve clarity and enforce strict formatting rules.
- Added new sections for task context and acceptance criteria in the documentation prompts.
- Enhanced instructions for generating QA documentation to ensure user-facing language is used.

new: Create Task Context Builder utility
- Introduced a new utility `task_context_builder.py` to extract structured context from Jira/task-management data.
- Implemented functions to parse acceptance criteria and build a rich context block for LLM prompts.
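The fingerprint-based part of such a pre-deduplication pass could look roughly like the sketch below. The function name mirrors `_deduplicate_previous_issues`, but the field names are illustrative, and the semantic-similarity tier mentioned above is omitted.

```python
from typing import Dict, List


def deduplicate_previous_issues(issues: List[Dict]) -> List[Dict]:
    """Drop issues whose location fingerprint was already seen (sketch)."""
    seen = set()
    unique = []
    for issue in issues:
        # Location fingerprint: file path + line + category. A real pass
        # would additionally compare titles/descriptions semantically.
        fp = f"{issue['file_path']}:{issue['line']}:{issue.get('category', 'UNKNOWN')}"
        if fp not in seen:
            seen.add(fp)
            unique.append(issue)
    return unique
```

A set-based pass keeps the dedup O(n) over the issue list, which matters when the previous-issue list is re-sent on every reconciliation round.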
Caution: Review failed — the pull request is closed.

ℹ️ Recent review info — ⚙️ Run configuration: Configuration used: defaults · Review profile: CHILL · Plan: Pro · Run ID:

📒 Files selected for processing (1)
📝 Walkthrough

Added OpenAI‑compatible endpoint support (aiBaseUrl) across the frontend, Java, and Python stacks; introduced SSRF-safe URL validation; refactored QA auto-documentation into a multi-stage orchestrator with a server-side QaDocState and enrichment/delta-diff flows; added shared VCS diff utilities and branch-scoped issue deduplication.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant WebServer as Web Server
    participant QaCmd as QaDocCommandProcessor
    participant VCS as VCS API
    participant Enr as Enrichment Service
    participant QaState as QaDocState Repo
    participant Orchestrator as QaDocOrchestrator
    participant LLM as LLM Factory/Client
    participant Jira as Jira API
    User->>WebServer: PR webhook -> trigger QA doc
    WebServer->>QaCmd: build and validate request
    QaCmd->>VCS: fetch PR diff & changed files
    QaCmd->>Enr: fetch enrichment for files
    QaCmd->>QaState: load existing QaDocState(project, task)
    QaCmd->>VCS: fetch delta-diff (lastCommit->current)
    QaCmd->>Orchestrator: run(QaDocGenerationContext with diff, delta, enrichment, creds)
    Orchestrator->>LLM: Stage1 batch calls (per-file)
    Orchestrator->>LLM: Stage2 cross-impact
    Orchestrator->>LLM: Stage3 aggregation/delta
    Orchestrator-->>QaCmd: documentation result
    QaCmd->>Jira: post/update comment
    QaCmd->>QaState: upsert state(commit, analysisId, prNumber)
    QaCmd-->>WebServer: respond success
```
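The three LLM stages in the diagram above can be sketched as a simple pipeline; the class and method names here are hypothetical, not the orchestrator's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ThreeStageSketch:
    """Hypothetical control flow for the Stage1/Stage2/Stage3 calls above."""
    llm_call: Callable[[str, str], str]  # (stage_name, prompt) -> response

    def run(self, per_file_prompts: List[str], cross_impact_prompt: str) -> str:
        # Stage 1: batched per-file calls
        stage1 = [self.llm_call("stage1", p) for p in per_file_prompts]
        # Stage 2: cross-impact analysis over the stage-1 outputs
        stage2 = self.llm_call("stage2", cross_impact_prompt + "\n".join(stage1))
        # Stage 3: aggregation / delta against previous documentation
        return self.llm_call("stage3", stage2)
```

Keeping each stage behind a single `llm_call` callable makes the flow easy to unit-test with a fake client.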
```mermaid
sequenceDiagram
    participant Client
    participant API as LlmModelController
    participant Validator as SsrfSafeUrlValidator
    participant DNS as DNS Resolver
    participant DB as Database
    Client->>API: POST /api/llm-models/fetch-custom { baseUrl, apiKey? }
    API->>Validator: validate(baseUrl)
    Validator->>DNS: resolve(hostname)
    DNS-->>Validator: IPs
    Validator->>Validator: check IPs not private/reserved
    alt valid
        Validator-->>API: ok
        API->>DB: (no persistence here) call external baseUrl/v1/models and return models
    else invalid
        Validator-->>API: throws IllegalArgumentException
        API-->>Client: 400 Bad Request (validator message)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 14
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java (1)
149-158: ⚠️ Potential issue | 🟡 Minor — Update the location cache in the duplicate-save fallback path.
If `saveAndFlush` fails on a duplicate, only `branchContentFingerprints` is updated. In the same run, a title-variant issue at the same location can still pass Tier 2.5 and be reprocessed.

💡 Proposed patch
```diff
 } catch (DataIntegrityViolationException e) {
     // Safety net: concurrent insert or edge-case fingerprint collision
     log.warn("Duplicate content_fingerprint for branch {} file {} — skipping (fp={})",
             branch.getId(), filePath,
             bi.getContentFingerprint() != null
                     ? bi.getContentFingerprint().substring(0, 12) + "..."
                     : "null");
     skipped++;
     if (bi.getContentFingerprint() != null) {
         branchContentFingerprints.add(bi.getContentFingerprint());
     }
+    if (bi.getLineHash() != null) {
+        String locFp = bi.getFilePath() + ":" + bi.getLineHash() + ":" +
+                (bi.getIssueCategory() != null ? bi.getIssueCategory().name() : "UNKNOWN");
+        branchLocationFingerprints.add(locFp);
+    }
     continue;
 }
```

Also applies to: 166-170
java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java (1)
50-63: ⚠️ Potential issue | 🟠 Major — Persist the normalized base URL, not the raw request string.
`validateBaseUrl()` trims and strips the URL at Lines 121-125, but Lines 61-63 and 93-94 still save `request.baseUrl` unchanged. A value like `" https://host/ "` will pass validation and then be persisted with the extra whitespace/slash, which breaks the normalization guarantee and can later fail outbound calls.

🔧 Proposed fix
```diff
-        validateBaseUrl(request.providerKey, request.baseUrl);
+        String normalizedBaseUrl = normalizeAndValidateBaseUrl(
+                request.providerKey, request.baseUrl);
         AIConnection newAiConnection = new AIConnection();
         String apiKeyEncrypted = tokenEncryptionService.encrypt(request.apiKey);
         newAiConnection.setWorkspace(ws);
         newAiConnection.setName(request.name);
         newAiConnection.setProviderKey(request.providerKey);
         newAiConnection.setAiModel(request.aiModel);
         newAiConnection.setApiKeyEncrypted(apiKeyEncrypted);
-        newAiConnection.setBaseUrl(
-                request.providerKey == AIProviderKey.OPENAI_COMPATIBLE ? request.baseUrl : null
-        );
+        newAiConnection.setBaseUrl(normalizedBaseUrl);
@@
         AIProviderKey effectiveProvider = request.providerKey != null
                 ? request.providerKey : connection.getProviderKey();
         if (effectiveProvider == AIProviderKey.OPENAI_COMPATIBLE) {
             String effectiveBaseUrl = request.baseUrl != null
                     ? request.baseUrl : connection.getBaseUrl();
-            validateBaseUrl(effectiveProvider, effectiveBaseUrl);
-            if (request.baseUrl != null) {
-                connection.setBaseUrl(request.baseUrl);
-            }
+            connection.setBaseUrl(
+                    normalizeAndValidateBaseUrl(effectiveProvider, effectiveBaseUrl)
+            );
         } else {
             connection.setBaseUrl(null);
         }
@@
-    private void validateBaseUrl(AIProviderKey providerKey, String baseUrl) {
-        if (providerKey == AIProviderKey.OPENAI_COMPATIBLE) {
-            if (baseUrl == null || baseUrl.isBlank()) {
-                throw new IllegalArgumentException(
-                        "Base URL is required for OpenAI Compatible provider");
-            }
-            // Strip trailing slash for consistency
-            String normalized = baseUrl.strip();
-            if (normalized.endsWith("/")) {
-                normalized = normalized.substring(0, normalized.length() - 1);
-            }
-            SsrfSafeUrlValidator.validate(normalized);
-        }
+    private String normalizeAndValidateBaseUrl(AIProviderKey providerKey, String baseUrl) {
+        if (providerKey != AIProviderKey.OPENAI_COMPATIBLE) {
+            return null;
+        }
+        if (baseUrl == null || baseUrl.isBlank()) {
+            throw new IllegalArgumentException(
+                    "Base URL is required for OpenAI Compatible provider");
+        }
+        String normalized = baseUrl.strip().replaceAll("/+$", "");
+        SsrfSafeUrlValidator.validate(normalized);
+        return normalized;
     }
```

Also applies to: 87-127
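The trim-and-strip normalization the fix relies on is small enough to show end to end; this is a language-agnostic sketch of the rule (the real helper is the Java method above), with a hypothetical function name.

```python
def normalize_base_url(raw: str) -> str:
    """Trim whitespace and trailing slashes so the persisted value
    matches exactly what the validator saw."""
    normalized = raw.strip().rstrip("/")
    if not normalized:
        raise ValueError("Base URL is required for OpenAI Compatible provider")
    return normalized
```

Persisting the return value (rather than the raw request string) is what closes the `" https://host/ "` gap described in the finding.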
🧹 Nitpick comments (16)
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java (1)
66-71: Extract a shared location-fingerprint builder. The same fingerprint string assembly appears in multiple places; a single helper would reduce drift risk and keep dedup semantics consistent.
Also applies to: 130-132, 167-169
python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py (1)
71-71: Rename the unused loop variable. Ruff will keep flagging `family` here.

♻️ Minimal cleanup
```diff
-        for family, _type, _proto, _canonname, sockaddr in infos:
+        for _family, _type, _proto, _canonname, sockaddr in infos:
```

java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java (1)
27-31: Clarify the small-delta rule in the shared API.
`MIN_DELTA_DIFF_SIZE` and `shouldEscalateToFull()` currently imply that sub-threshold deltas stay incremental, but the GitHub/GitLab/Bitbucket services in this change still force that case back to `FULL`. Align the shared helper/docs with the policy the callers actually implement before this API gets reused elsewhere.

Also applies to: 98-105
java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java (1)
73-86: Consider testing `baseUrl` mapping in `fromAiConnection` tests.
The `fromAiConnection()` tests don't verify the new `baseUrl` field mapping. While the record constructor tests cover `baseUrl`, the factory method conversion path remains untested.

💡 Suggested enhancement
```diff
 @Test
 @DisplayName("should convert AIConnection with all fields")
 void shouldConvertWithAllFields() {
     AIConnection connection = new AIConnection();
     setField(connection, "id", 1L);
     connection.setName("Production AI");
     setField(connection, "providerKey", AIProviderKey.ANTHROPIC);
     setField(connection, "aiModel", "claude-3-opus");
+    setField(connection, "baseUrl", "https://custom.api.com");

     AIConnectionDTO dto = AIConnectionDTO.fromAiConnection(connection);

     assertThat(dto.id()).isEqualTo(1L);
     assertThat(dto.name()).isEqualTo("Production AI");
     assertThat(dto.providerKey()).isEqualTo(AIProviderKey.ANTHROPIC);
     assertThat(dto.aiModel()).isEqualTo("claude-3-opus");
+    assertThat(dto.baseUrl()).isEqualTo("https://custom.api.com");
 }
```

java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql (1)
14-15: Comment claims HTTPS requirement but constraint doesn't enforce it.
The column comment states "Must be HTTPS" but the `ai_connection_base_url_check` constraint only validates that `base_url` is non-null and non-empty for `OPENAI_COMPATIBLE`. If HTTPS is a security requirement, consider enforcing it at the database level.

🔐 Optional: Enforce HTTPS in constraint
```diff
 ALTER TABLE ai_connection
     ADD CONSTRAINT ai_connection_base_url_check CHECK (
-        (provider_key = 'OPENAI_COMPATIBLE' AND base_url IS NOT NULL AND base_url <> '')
+        (provider_key = 'OPENAI_COMPATIBLE' AND base_url IS NOT NULL AND base_url <> '' AND base_url LIKE 'https://%')
         OR (provider_key <> 'OPENAI_COMPATIBLE' AND base_url IS NULL)
     );
```

Alternatively, if HTTPS validation is handled at the application layer, update the comment to remove the "Must be HTTPS" statement to avoid confusion.
Also applies to: 25-30
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java (1)
19-21: Consider adding `@Transactional` to the `@Modifying` query.
The `deleteByProjectId` method uses `@Modifying` but lacks `@Transactional`. While this may work if the caller provides a transaction, it's a best practice to explicitly annotate modifying repository methods to ensure they always run within a transaction context.

🔧 Suggested fix
```diff
+import org.springframework.transaction.annotation.Transactional;
+
 @Repository
 public interface QaDocStateRepository extends JpaRepository<QaDocState, Long> {

     Optional<QaDocState> findByProjectIdAndTaskId(Long projectId, String taskId);

     boolean existsByProjectIdAndTaskId(Long projectId, String taskId);

+    @Transactional
     @Modifying
     @Query("DELETE FROM QaDocState q WHERE q.project.id = :projectId")
     void deleteByProjectId(@Param("projectId") Long projectId);
 }
```

java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java (1)
548-560: This test doesn't actually cover 403.
The display name says 401/403, but Line 552 only injects a 401 failure. A regression in the 403 branch would still pass here, so either parameterize the test over both statuses or rename it to 401-only.

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java (1)
33-34: `changedFilePaths` is still mutable despite the "immutable context" contract.
The builder passes the caller’s `List` reference straight into the record, so later mutations can change the built context from the outside. A defensive `List.copyOf(...)` here would make the immutability claim true for this field.

Also applies to: 96-115
python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py (1)
22-31: Fullwidth colons are intentional for CJK locale support — consider adding a comment.
The static analysis tool flags `:` (U+FF1A, fullwidth colon) as ambiguous. This appears intentional for matching Jira descriptions that may use fullwidth punctuation (common in CJK locales). Adding a brief comment would prevent future maintainers from "fixing" this.

📝 Suggested clarifying comment
```diff
 # ── Acceptance-criteria extraction patterns ─────────────────────────
 # Common header patterns found in Jira descriptions (case-insensitive):
 #   - "Acceptance Criteria"
 #   - "AC:" / "ACs:"
 #   - "Definition of Done"
+# Note: Patterns include both ASCII colon (:) and fullwidth colon (:) for CJK locale support.
 _AC_HEADER_PATTERNS = [
```

java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java (1)
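A pattern covering both colon variants can be illustrated with a single regex character class; this is a sketch of the idea, not the module's actual `_AC_HEADER_PATTERNS` contents.

```python
import re

# Match AC headers with either the ASCII ":" or the fullwidth ":" (U+FF1A),
# so Jira descriptions written in CJK locales are recognized too.
_AC_HEADER = re.compile(
    r"^\s*(acceptance criteria|acs?|definition of done)\s*[::]?\s*$",
    re.IGNORECASE,
)


def is_ac_header(line: str) -> bool:
    """True if the line looks like an acceptance-criteria section header."""
    return bool(_AC_HEADER.match(line))
```

Putting both code points in one character class (`[::]`) keeps the intent visible in a single place, which is exactly where the suggested comment belongs.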
75-91: Consider using JPA lifecycle callbacks for timestamps.
The timestamps are initialized both at field declaration and in the constructor. While this works, JPA lifecycle callbacks (`@PrePersist`, `@PreUpdate`) provide a cleaner, centralized approach.

♻️ Using JPA lifecycle callbacks
```diff
-    @Column(name = "created_at", nullable = false, updatable = false)
-    private OffsetDateTime createdAt = OffsetDateTime.now();
+    @Column(name = "created_at", nullable = false, updatable = false)
+    private OffsetDateTime createdAt;

-    @Column(name = "updated_at", nullable = false)
-    private OffsetDateTime updatedAt = OffsetDateTime.now();
+    @Column(name = "updated_at", nullable = false)
+    private OffsetDateTime updatedAt;

     // ...

     public QaDocState(Project project, String taskId) {
         this.project = project;
         this.taskId = taskId;
-        this.createdAt = OffsetDateTime.now();
-        this.updatedAt = OffsetDateTime.now();
     }

+    @PrePersist
+    protected void onCreate() {
+        createdAt = OffsetDateTime.now();
+        updatedAt = OffsetDateTime.now();
+    }
+
+    @PreUpdate
+    protected void onUpdate() {
+        updatedAt = OffsetDateTime.now();
+    }
```

python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py (1)
46-73: Consider a request object to reduce parameter count.
The `generate()` method has 20+ parameters. While this is a thin coordinator passing data to the orchestrator, a `QaDocGenerationRequest` dataclass would improve maintainability and make the API easier to evolve.

This is not blocking — the current implementation works and matches the router's request model.
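The parameter-object refactor could look like the sketch below; the field list is trimmed for brevity and the `generate` body is a placeholder, not the service's real logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class QaDocGenerationRequest:
    """Parameter object bundling the many arguments of generate()."""
    project_id: int
    pr_number: int
    diff: str
    ai_provider: str
    ai_model: str
    ai_base_url: Optional[str] = None
    delta_diff: Optional[str] = None
    previous_documentation: Optional[str] = None
    changed_file_paths: List[str] = field(default_factory=list)


def generate(request: QaDocGenerationRequest) -> str:
    # A real service would forward the request to the orchestrator here.
    return f"docs for PR #{request.pr_number} ({len(request.changed_file_paths)} files)"
```

New optional fields can then be added to the dataclass without touching every call site's positional argument list.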
java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql (2)
16-16: VARCHAR(4096) for PR numbers is pragmatic but has limits.
The comma-separated string can hold ~400-800 PR numbers. For most tasks this is sufficient, but very long-lived tasks with many PRs could hit this limit.
Consider documenting this limit or using the `TEXT` type, with the JPA logic handling truncation/rotation if it becomes a concern. This is not blocking for initial release.
28-29: Index `idx_qa_doc_state_task` is redundant with the unique constraint.
PostgreSQL automatically creates a unique B-tree index to enforce `uq_qa_doc_state_project_task`. The explicit index on the same columns `(project_id, task_id)` is redundant and consumes additional storage/write overhead.

🗑️ Remove redundant index
```diff
 CREATE INDEX IF NOT EXISTS idx_qa_doc_state_project
     ON qa_doc_state(project_id);
-CREATE INDEX IF NOT EXISTS idx_qa_doc_state_task
-    ON qa_doc_state(project_id, task_id);
```

python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py (1)
69-72: OAuth credentials should use SecretStr for automatic redaction.
Using Pydantic's `SecretStr` type for `oauth_key`, `oauth_secret`, and `bearer_token` would prevent accidental logging of these sensitive values.

🔒 Use SecretStr for credentials
```diff
 from fastapi import APIRouter, Request
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, SecretStr
 from typing import Optional, Dict, Any, List, Literal

 # ...

     # OAuth credentials (for Python-side RAG/VCS access)
-    oauth_key: Optional[str] = None
-    oauth_secret: Optional[str] = None
-    bearer_token: Optional[str] = None
+    oauth_key: Optional[SecretStr] = None
+    oauth_secret: Optional[SecretStr] = None
+    bearer_token: Optional[SecretStr] = None
```

Then when passing to the service, use `.get_secret_value()` to unwrap.
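The redaction behavior is easy to demonstrate in isolation; `OAuthCredentials` below is a stand-in model, not the router's actual request class.

```python
from typing import Optional

from pydantic import BaseModel, SecretStr


class OAuthCredentials(BaseModel):
    oauth_key: Optional[SecretStr] = None
    bearer_token: Optional[SecretStr] = None


creds = OAuthCredentials(oauth_key="abc123", bearer_token="tok")
# repr()/str() show masked placeholders instead of the raw values,
# so an accidental log line does not leak the secret.
redacted = repr(creds)
# Services unwrap explicitly and deliberately:
raw_key = creds.oauth_key.get_secret_value()
```

The explicit `.get_secret_value()` call makes every point where a secret leaves the model visible in code review.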
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java (1)
157-162: Sentinel detection could produce false positives.
The check `documentation.contains("could not be generated")` might reject legitimate documentation that discusses generation failures or error handling. Consider a more specific sentinel string that's unlikely to appear in normal documentation, e.g., `[CODECROW_GENERATION_FAILED]`.

🔧 Use a unique sentinel marker
Define a constant that both Python and Java share:
```java
private static final String GENERATION_FAILED_SENTINEL = "[CODECROW_GENERATION_FAILED]";
```

Then in the guard:
```diff
-        if (documentation == null || documentation.isBlank()
-                || documentation.contains("could not be generated")) {
+        if (documentation == null || documentation.isBlank()
+                || documentation.contains(GENERATION_FAILED_SENTINEL)) {
```

Ensure the Python orchestrator uses the same sentinel when generation fails.
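The Python side of the shared-sentinel contract might look like this sketch; the function name is illustrative, and only the sentinel value itself is part of the suggested contract.

```python
from typing import Optional

# Must stay byte-identical to the Java GENERATION_FAILED_SENTINEL constant.
GENERATION_FAILED_SENTINEL = "[CODECROW_GENERATION_FAILED]"


def is_valid_documentation(documentation: Optional[str]) -> bool:
    """Reject blank output or the shared failure sentinel, instead of the
    fuzzy 'could not be generated' substring match."""
    if documentation is None or not documentation.strip():
        return False
    return GENERATION_FAILED_SENTINEL not in documentation
```

Note that prose merely discussing failures ("why generation could not be generated...") now passes, which is exactly the false positive the finding describes.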
python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py (1)
79-95: The advertised diff fallback is not implemented.

`index_pr_files()` accepts `diff` and documents a diff-hunk fallback, but `files` is populated only from `enrichment_lookup`. On partial enrichment runs this silently indexes nothing instead of using the fallback path the method promises.

Either build a real diff-based fallback here, or remove the `diff` parameter/docstring claim so callers do not assume those files were indexed.

Also applies to: 99-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py` around lines 79 - 95, The method index_pr_files currently only builds files from enrichment_lookup and ignores the diff parameter, so implement the promised diff-hunk fallback: when enrichment_data or enrichment_lookup lacks entries for changed_file_paths, parse the provided diff into per-file hunks and populate the local files collection (used later in index_pr_files) with lightweight file objects derived from those hunks (e.g., path, hunk text, and language/metadata as needed) so those files are indexed; ensure you use the existing changed_file_paths and diff parameters to drive this fallback and keep existing behavior when enrichment_data is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientRecordsTest.java`:
- Line 23: Tests in AiCommandClientRecordsTest updated constructor args to
include the new aiBaseUrl field but never assert it; add explicit assertions
calling request.aiBaseUrl() in each affected test (the tests that construct the
record and name the variable request) to verify the expected value (e.g., null
or "..." as appropriate for each case), alongside existing assertions so
regressions for the new aiBaseUrl property are caught; locate the record
instantiations in methods within AiCommandClientRecordsTest and add a matching
assertEquals(expected, request.aiBaseUrl()) for each scenario.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`:
- Around line 98-107: getDocumentedPrNumbersSet currently calls Long.parseLong
on every token and will throw NumberFormatException for any
corrupted/non-numeric tokens in documentedPrNumbers; update
getDocumentedPrNumbersSet to defensively parse tokens by trimming and attempting
to parse each token inside a try/catch (or by pre-validating with a numeric
regex), skip tokens that fail to parse instead of letting the exception
propagate, collect the successfully parsed Longs into an unmodifiable Set, and
if a logger (e.g., LOGGER) is available emit a warning with the invalid token(s)
to aid debugging; ensure the method still returns Collections.emptySet() when
documentedPrNumbers is blank or no valid numbers are found.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java`:
- Around line 322-336: The delta diff fetch in QaDocCommandProcessor (using
VcsDiffUtils.fetchDeltaDiff which delegates to ops.getCommitRangeDiff) must be
guarded so failures don't abort the whole command; wrap the fetch call (the
block that computes deltaDiff using opsService/httpClient and
VcsDiffUtils.fetchDeltaDiff) in a try/catch that on any exception sets deltaDiff
= null, logs a warning including the exception and context (repoSlug, commit
hashes), and allows processing to continue so the generator falls back to the
full diff; ensure no exception from getCommitRangeDiff or fetchDeltaDiff bubbles
out.
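The guard described above can be sketched as follows; `fetchDeltaDiffSafely` and the generic `Supplier` stand in for the real `VcsDiffUtils.fetchDeltaDiff` call chain, which is not shown in this review:

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

public class DeltaDiffGuard {
    private static final Logger LOGGER = Logger.getLogger(DeltaDiffGuard.class.getName());

    /**
     * Attempts the delta-diff fetch; on any failure it logs a warning with
     * context and returns null so the caller falls back to the full diff
     * instead of aborting the whole command.
     */
    public static String fetchDeltaDiffSafely(Supplier<String> fetch,
                                              String repoSlug,
                                              String baseCommit,
                                              String headCommit) {
        try {
            return fetch.get();
        } catch (Exception e) {
            LOGGER.warning("Delta diff fetch failed for " + repoSlug
                    + " (" + baseCommit + ".." + headCommit
                    + "), falling back to full diff: " + e);
            return null;
        }
    }
}
```

A null return here signals "use the full diff"; no exception from the underlying fetch can bubble out.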
- Around line 293-298: The QaDocState currently stores only lastCommitHash per
(projectId, taskId) which breaks multi-PR runs; change QaDocState to replace
lastCommitHash with a Map<Long,String> prCommitHashes keyed by PR number, update
recordGeneration(String commitHash, Long analysisId, Long prNumber) to store
prCommitHashes.put(prNumber, commitHash), and when loading state via
qaDocStateRepository.findByProjectIdAndTaskId(...) use
prCommitHashes.get(prNumber) to determine the delta base and to evaluate
isSamePrRerun for the current PR instead of the single lastCommitHash; also
adjust any helper methods (e.g., isPrDocumented) to consult the new map.
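A minimal sketch of the per-PR commit map suggested above; the class and method names mirror the review's proposal but the real entity has more fields:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: per-PR commit tracking replacing the single lastCommitHash field. */
public class QaDocStateSketch {
    // Keyed by PR number so concurrent PRs on the same task don't clobber each other.
    private final Map<Long, String> prCommitHashes = new HashMap<>();

    public void recordGeneration(String commitHash, Long analysisId, Long prNumber) {
        prCommitHashes.put(prNumber, commitHash);
    }

    /** Delta base for this PR, or null if the PR was never documented. */
    public String deltaBaseFor(Long prNumber) {
        return prCommitHashes.get(prNumber);
    }

    /** True when the current head commit matches what was already documented for this PR. */
    public boolean isSamePrRerun(Long prNumber, String headCommit) {
        return headCommit != null && headCommit.equals(prCommitHashes.get(prNumber));
    }
}
```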
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaAutoDocListener.java`:
- Around line 355-374: The state is being upserted before the Jira comment is
guaranteed posted, which can mark a PR as documented even if posting fails; move
the call to upsertQaDocState so it executes only after tmClient.postComment or
tmClient.updateComment succeeds (i.e., after the try block that posts/updates
the comment), or implement a two-phase commit where you mark state only on
successful comment response and on failure do not set the documented flag so
isSamePrRerun sees it as not completed; update references around
upsertQaDocState, tmClient.postComment, tmClient.updateComment and isSamePrRerun
accordingly.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java`:
- Around line 15-60: QaDocGenerationContext is a Java record that currently
exposes credentials via the auto-generated toString(); override the record's
toString() to redact sensitive fields (at least oauthSecret and bearerToken, and
consider oauthKey) so logs/traces don't contain plaintext secrets—implement a
custom toString() in QaDocGenerationContext that returns the same contextual
info but replaces those fields with masked values like "<redacted>" or partially
masked strings.
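Because records auto-generate a `toString()` that prints every component, the redaction has to be an explicit override. A minimal sketch, with field names trimmed to the sensitive subset:

```java
/** Sketch of a credential-bearing record with a redacting toString override. */
public record RedactedContextSketch(String projectKey, String oauthKey,
                                    String oauthSecret, String bearerToken) {
    @Override
    public String toString() {
        // Keep non-sensitive context, mask everything credential-shaped.
        return "RedactedContextSketch[projectKey=" + projectKey
                + ", oauthKey=<redacted>, oauthSecret=<redacted>, bearerToken=<redacted>]";
    }
}
```

The accessors still return the real values for code that needs them; only logging and traces see the mask.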
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java`:
- Around line 243-252: The payload includes oauth_key, oauth_secret, and
bearer_token but QaDocGenerationService does not forward them to the service
layer: either extend qa_service.generate(...) (or the intermediate method that
calls it) to accept and propagate these credentials from QaDocumentationRequest
(fields oauthKey/oauthSecret/bearerToken) down to the Python RAG/VCS access
code, ensuring secure handling and unit tests; or remove those keys from the
Java payload and the QaDocumentationRequest model if they are unused. Also
confirm ssrf_safe_transport.py’s HTTPS enforcement (the ALLOW_PRIVATE_ENDPOINTS
toggle around line 52) is compatible with your deployment and document or adjust
configuration as needed.
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/controller/LlmModelController.java`:
- Around line 122-134: The controller currently instantiates a raw RestTemplate
inside LlmModelController (new RestTemplate()) and uses it to call modelsUrl
with restTemplate.exchange, which bypasses the configured RestTemplate bean with
timeouts; change LlmModelController to accept a RestTemplate (or
RestTemplateBuilder) via constructor injection (or `@Autowired`) and use that
injected instance when calling restTemplate.exchange(modelsUrl, HttpMethod.GET,
entity, Map.class) so the call honors the timeouts from RestTemplateConfig (or
build a timed RestTemplate from the injected RestTemplateBuilder before using
it).
In `@python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py`:
- Around line 36-80: validate_endpoint_url currently only runs at client
creation and does not prevent DNS rebinding at request time; fix by pinning
resolved IPs into the httpx transport used by ChatOpenAI so that the client will
only connect to the validated addresses: in the code paths that create
httpx.Client / httpx.AsyncClient, resolve the hostname using the same logic as
validate_endpoint_url (reusing validate_endpoint_url and _is_safe_ip), store the
chosen sockaddr IP(s) and construct a custom httpx transport or provide a custom
resolver that always returns those IP(s), and ensure any connect hooks
(connect_failed/request) reject attempts to connect to any IP not equal to the
pinned addresses; also apply the same change to the async client creation path
so both sync/async clients are protected against DNS rebinding.
In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py`:
- Around line 202-213: The Stage 3 aggregation choice must use the same
delta-quality gate as earlier: replace the current logic that picks delta_diff
whenever delta_diff is non-empty with the same has_real_delta check used for
analysis_diff (use has_real_delta && '@@' check outcome). Concretely, where
Stage 3 currently selects the prompt/input (references to delta_diff, diff, and
the Stage 3 prompt/aggregation block), change it to use analysis_diff or to pick
delta_diff only when has_real_delta is true (otherwise fall back to diff) so
Stage 3 won't run a delta-only aggregation for header-only deltas; keep the
existing has_real_delta variable and reuse it.
- Around line 633-648: In _execute_stage_3_delta remove the unreachable dead
code at the end that calls self.llm.ainvoke and returns
self._extract_text(response); both the try and except paths already return and
the local prompt variable referenced there is undefined in that scope (causing
F821). Delete the response = await self.llm.ainvoke([...]) and the subsequent
return self._extract_text(response) lines so the function ends after the
existing try/except/_attempt logic and no undefined symbols remain.
In
`@python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py`:
- Around line 482-504: The current semantic dedupe in the block processing
tier1_result (using by_file, kept, title, existing_title and
difflib.SequenceMatcher) removes issues with similar titles even when they point
to different locations; update the dedupe logic so that before marking is_dup
true you verify location evidence: require matching or overlapping location
fields (e.g., compare issue.get("startLine")/issue.get("endLine") or
issue.get("line")/issue.get("span")/issue.get("filePath") if present) and only
treat as duplicates when both title similarity >= 0.75 AND the location is
identical or overlaps (or when location is missing for both); otherwise keep
both issues so distinct findings in the same file are not dropped.
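The location-aware duplicate check described above can be sketched as follows; the field names (`title`, `filePath`, `startLine`, `endLine`) follow the ones the comment mentions:

```python
import difflib


def is_duplicate(issue: dict, existing: dict, threshold: float = 0.75) -> bool:
    """Treat two issues as duplicates only when titles are similar AND the
    location evidence agrees: same file with overlapping line spans, or
    location missing on both sides."""
    title_sim = difflib.SequenceMatcher(
        None,
        (issue.get("title") or "").lower(),
        (existing.get("title") or "").lower(),
    ).ratio()
    if title_sim < threshold:
        return False
    if issue.get("filePath") != existing.get("filePath"):
        return False
    a_start = issue.get("startLine")
    a_end = issue.get("endLine", a_start)
    b_start = existing.get("startLine")
    b_end = existing.get("endLine", b_start)
    if a_start is None and b_start is None:
        return True  # no location on either side: fall back to title similarity
    if a_start is None or b_start is None:
        return False
    return a_start <= b_end and b_start <= a_end  # spans overlap
```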
- Around line 455-467: The code uses the typing symbol Set (seen_locations:
Set[str] = set()) but Set is not imported; update the module's typing import
list to include Set (alongside Dict, Any, List, Optional, Callable) so
references like seen_locations and any other Set-typed annotations resolve
correctly; ensure you only modify the import line(s) at the top of
orchestrator.py to add Set.
In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`:
- Around line 134-145: The current AC-removal uses a plain substring search
which misses headers matched by _AC_HEADER_REGEX; update the logic in the block
that computes desc_to_show (the code that checks description, ac_text and
currently uses description.lower().find(...)) to run _AC_HEADER_REGEX on
description (preserving case/flags used by the extractor), and if a match is
found use match.start() to slice out the AC section (or match.end() to drop a
leading header), otherwise fall back to the existing substring checks; ensure
you reference description, ac_text, desc_to_show and _AC_HEADER_REGEX so the
removal reuses the same regex positions the extractor uses and prevents
duplicate AC content.
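A minimal sketch of the regex-based removal; the pattern here is a hypothetical stand-in for the module's `_AC_HEADER_REGEX`, which is not shown in this review:

```python
import re

# Stand-in for the extractor's _AC_HEADER_REGEX; the fullwidth colon (U+FF1A)
# is included to match CJK-locale Jira descriptions.
_AC_HEADER_REGEX = re.compile(
    r"^\s*(acceptance criteria|ac)\s*[:：]", re.IGNORECASE | re.MULTILINE
)


def strip_acceptance_criteria(description: str, ac_text: str) -> str:
    """Remove the AC section using the same regex the extractor used, so
    header variants a plain substring search misses are still removed."""
    if not description or not ac_text:
        return description
    match = _AC_HEADER_REGEX.search(description)
    if match:
        # Slice from the header start; everything after it is the AC block.
        return description[: match.start()].rstrip()
    # Fallback: the original plain substring removal.
    idx = description.lower().find(ac_text.lower())
    if idx >= 0:
        return (description[:idx] + description[idx + len(ac_text):]).strip()
    return description
```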
---
Outside diff comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java`:
- Around line 149-158: When catching DataIntegrityViolationException in
BranchIssueMappingService (the saveAndFlush duplicate-save fallback), also
update the in-memory location cache used to track saved issue locations in
addition to adding to branchContentFingerprints; locate the catch block handling
DataIntegrityViolationException around saveAndFlush and ensure the same cache
update (the set/map that prevents reprocessing of a file/line location — e.g.,
the branchLocationCache/processedLocations structure used elsewhere in the
method) is updated there; apply the identical fix to the second duplicate-save
catch path (the other DataIntegrityViolationException block around lines
166-170).
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java`:
- Around line 50-63: validateBaseUrl currently normalizes the URL but the code
still persists request.baseUrl raw; change the call to capture and use the
normalized value (e.g. String normalizedBaseUrl =
validateBaseUrl(request.providerKey, request.baseUrl)) and then call
newAiConnection.setBaseUrl(request.providerKey ==
AIProviderKey.OPENAI_COMPATIBLE ? normalizedBaseUrl : null); also replace other
assignments in the same method/region (the other setBaseUrl usage referenced in
the comment) to persist the normalizedBaseUrl instead of request.baseUrl so
stored values match the validator's output.
---
Nitpick comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java`:
- Around line 66-71: Extract a single helper method (e.g.,
buildLocationFingerprint or buildLocationFingerprintForIssue) in
BranchIssueMappingService that accepts the inputs used to assemble the
fingerprint (preferably the BranchIssue instance or its filePath, lineHash, and
issueCategory) and returns the same string currently built at the three sites
(the filePath + ":" + lineHash + ":" + (issueCategory != null ?
issueCategory.name() : "UNKNOWN")); replace the inline assembly at the locations
referencing branchLocationFingerprints (the blocks around the current
fingerprint creation at lines referenced in the review) with calls to this new
helper and reuse it everywhere to ensure consistent dedup semantics.
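The extracted helper is small; a sketch of the shape, reproducing the fingerprint string the review quotes:

```java
public final class LocationFingerprints {
    private LocationFingerprints() {}

    /** Single source of truth for the dedup key used at all three call sites. */
    public static String buildLocationFingerprint(String filePath, String lineHash,
                                                  Enum<?> issueCategory) {
        return filePath + ":" + lineHash + ":"
                + (issueCategory != null ? issueCategory.name() : "UNKNOWN");
    }
}
```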
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java`:
- Around line 27-31: The constant MIN_DELTA_DIFF_SIZE and the helper method
shouldEscalateToFull() claim sub-threshold deltas always remain incremental, but
some callers (GitHub/GitLab/Bitbucket services) force FULL escalation; update
the shared API to reflect that policy by changing the Javadoc/comments on
MIN_DELTA_DIFF_SIZE and shouldEscalateToFull() in VcsDiffUtils to state that
values below the threshold are treated as "trivially small" and qualify for
incremental mode by default, but callers may override and escalate to FULL (and
include a note naming the services that do this), and consider adding a TODO or
parameter in shouldEscalateToFull() if you want caller-controlled escalation in
future so consumers know the helper does not enforce caller policies.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`:
- Around line 75-91: The createdAt and updatedAt timestamps in QaDocState are
set both at field declaration and again in the constructor; replace this ad-hoc
initialization with JPA lifecycle callbacks by removing the default
OffsetDateTime.now() assignments and the constructor timestamp sets, and add
`@PrePersist` and `@PreUpdate` methods (e.g., prePersist() and preUpdate()) in the
QaDocState class that set createdAt and updatedAt appropriately (createdAt and
updatedAt on persist, updatedAt on update) so JPA manages timestamps centrally.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java`:
- Around line 19-21: The `@Modifying` repository method deleteByProjectId in
QaDocStateRepository should be annotated with `@Transactional` to guarantee it
executes inside a transaction; update the QaDocStateRepository interface by
adding the javax.transaction (or Spring's
org.springframework.transaction.annotation) `@Transactional` annotation on the
deleteByProjectId method (or class) so the DELETE query always runs within a
transaction context.
In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql`:
- Around line 14-15: The comment on ai_connection.base_url says "Must be HTTPS"
but the ai_connection_base_url_check constraint doesn't enforce HTTPS; either
update the constraint to require HTTPS for OPENAI_COMPATIBLE (modify
ai_connection_base_url_check to include a condition like provider !=
'OPENAI_COMPATIBLE' OR base_url ILIKE 'https://%') or remove the "Must be HTTPS"
phrase from the COMMENT ON COLUMN ai_connection.base_url so the DB doc matches
enforcement; locate ai_connection_base_url_check and the COMMENT ON COLUMN
ai_connection.base_url in the migration and apply one of these two changes so
the comment and constraint are consistent.
In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql`:
- Line 16: The column documented_pr_numbers is defined as VARCHAR(4096), which
limits stored comma-separated PR IDs and may overflow for very long-lived tasks;
update the migration to use TEXT (or add a schema comment documenting the
~400–800 PR limit) and adjust the JPA/entity handling (where the field mapped to
documented_pr_numbers) to perform truncation/rotation or validation when
persisting so large PR lists are handled safely; alternatively add a migration
comment and unit tests that assert behavior when the PR list approaches/exceeds
the VARCHAR limit.
- Around line 28-29: The explicit CREATE INDEX for idx_qa_doc_state_task should
be removed because the unique constraint uq_qa_doc_state_project_task on table
qa_doc_state already creates a B-tree index on (project_id, task_id); delete the
statement "CREATE INDEX IF NOT EXISTS idx_qa_doc_state_task ON
qa_doc_state(project_id, task_id);" from the migration, or if you intended a
different index, replace it with the correct column set rather than duplicating
uq_qa_doc_state_project_task.
In
`@java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java`:
- Around line 73-86: The AIConnectionDTO.fromAiConnection conversion tests are
missing verification of the new baseUrl mapping; update the
shouldConvertWithAllFields test to set the baseUrl on the source AIConnection
(e.g., connection.setBaseUrl("https://api.example")) and assert that
AIConnectionDTO.fromAiConnection(connection).baseUrl() equals the same value,
ensuring AIConnection, AIConnectionDTO.fromAiConnection and
AIConnectionDTO.baseUrl() are exercised.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java`:
- Around line 33-34: QaDocGenerationContext currently accepts the caller's
mutable List for the changedFilePaths field (and other list fields built around
lines 96-115), violating the "immutable context" contract; update the record
construction (or the QaDocGenerationContext.Builder) to defensively copy
incoming Lists using List.copyOf(...) before assigning them to the record fields
so the built QaDocGenerationContext (and fields like changedFilePaths and the
other list members referenced in the builder) cannot be mutated by callers after
construction.
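The defensive copy fits naturally in a record's compact constructor; a sketch with a single list field standing in for `changedFilePaths` and the other list members:

```java
import java.util.List;

/** Sketch: defensively copy caller lists so the context stays immutable. */
public record ChangedFilesContext(List<String> changedFilePaths) {
    public ChangedFilesContext {
        // List.copyOf rejects null and returns an unmodifiable snapshot,
        // so later mutation of the caller's list cannot leak in.
        changedFilePaths = List.copyOf(changedFilePaths);
    }
}
```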
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java`:
- Around line 157-162: Replace the brittle substring check with a dedicated
sentinel: add a private static final String GENERATION_FAILED_SENTINEL =
"[CODECROW_GENERATION_FAILED]" to QaDocGenerationService, and change the guard
to reject documentation when documentation == null || documentation.isBlank() ||
documentation.trim().equals(GENERATION_FAILED_SENTINEL) (use trim+equals to
match exact sentinel); also ensure the Python orchestrator emits the exact same
"[CODECROW_GENERATION_FAILED]" sentinel when generation fails so both sides
agree.
In
`@java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java`:
- Around line 548-560: The test should actually cover both 401 and 403 cases:
change shouldReturnErrorWhenJiraAuthFails to be parameterized over the status
codes (e.g., use `@ParameterizedTest` with `@ValueSource` ints {401,403}) or split
into two tests (e.g., shouldReturnErrorWhenJira401AuthFails and
shouldReturnErrorWhenJira403AuthFails); then in the body replace the hardcoded
TaskManagementException( "Authentication failed", 401, "" ) with
TaskManagementException("Authentication failed", status, "") using the
parameter, keep the same assertions on processor.process(payload, project,
eventConsumer, Map.of()), and verifyNoInteractions(qaDocGenerationService) so
both 401 and 403 paths are validated.
In `@python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py`:
- Around line 69-72: The three credential fields oauth_key, oauth_secret, and
bearer_token are currently plain strings and should be typed as
pydantic.SecretStr to enable automatic redaction; change their annotations to
SecretStr (import SecretStr from pydantic), update all call sites that pass
these values to services to safely unwrap with .get_secret_value() (and handle
None if Optional), and adjust any logging to avoid printing the raw secret
objects. Ensure imports and Optional typing remain correct and that services
receive the unwrapped string via .get_secret_value() only when not None.
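A minimal sketch of the SecretStr typing and boundary unwrapping described above, assuming pydantic is available (the model name is hypothetical):

```python
from typing import Optional

from pydantic import BaseModel, SecretStr


class QaDocCredentials(BaseModel):
    """Credential fields typed as SecretStr so str()/repr() output is redacted."""
    oauth_key: Optional[SecretStr] = None
    oauth_secret: Optional[SecretStr] = None
    bearer_token: Optional[SecretStr] = None


def unwrap(secret: Optional[SecretStr]) -> Optional[str]:
    """Unwrap only at the service boundary, and only when present."""
    return secret.get_secret_value() if secret is not None else None


creds = QaDocCredentials(oauth_key="k", oauth_secret="s3cret", bearer_token=None)
# The model's string form shows masked values, never the raw secret.
assert "s3cret" not in str(creds)
assert unwrap(creds.oauth_secret) == "s3cret"
assert unwrap(creds.bearer_token) is None
```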
In `@python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py`:
- Line 71: The for-loop unpacks infos into variables including an unused
variable named "family" which Ruff flags; rename it to a conventional unused
name (e.g., "_family" or "_" ) in the loop header that iterates over "infos"
(for _family, _type, _proto, _canonname, sockaddr in infos) so the linter stops
reporting an unused variable while leaving the rest of the unpacking and logic
(variables _type, _proto, _canonname, sockaddr) unchanged.
In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py`:
- Around line 79-95: The method index_pr_files currently only builds files from
enrichment_lookup and ignores the diff parameter, so implement the promised
diff-hunk fallback: when enrichment_data or enrichment_lookup lacks entries for
changed_file_paths, parse the provided diff into per-file hunks and populate the
local files collection (used later in index_pr_files) with lightweight file
objects derived from those hunks (e.g., path, hunk text, and language/metadata
as needed) so those files are indexed; ensure you use the existing
changed_file_paths and diff parameters to drive this fallback and keep existing
behavior when enrichment_data is present.
In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py`:
- Around line 46-73: The generate method on QaDocService has an excessive
parameter list; introduce a QaDocGenerationRequest dataclass (or pydantic model)
to encapsulate all inputs (project_id, project_name, pr_number, issues_found,
files_analyzed, pr_metadata, template_mode, custom_template, task_context,
ai_provider, ai_model, ai_api_key, ai_base_url, diff, delta_diff,
previous_documentation, is_same_pr_rerun, enrichment_data, changed_file_paths,
workspace_slug, repo_slug, source_branch, target_branch, vcs_provider,
output_language) and change the signature of QaDocService.generate(...) to
accept a single request: QaDocGenerationRequest (plus any rare optional args
like cancellation/context if needed); update all call sites (router and
orchestrator invocation inside generate) to construct and pass the new
dataclass, preserving field names and types so behavior remains unchanged.
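A sketch of the request object with a representative subset of the fields listed above (the full model would carry all of them):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QaDocGenerationRequest:
    """Bundles the sprawling generate(...) parameter list into one object."""
    project_id: int
    project_name: str
    pr_number: int
    ai_provider: str
    ai_model: str
    ai_api_key: str
    ai_base_url: Optional[str] = None
    diff: Optional[str] = None
    delta_diff: Optional[str] = None
    is_same_pr_rerun: bool = False
    changed_file_paths: list[str] = field(default_factory=list)


def generate(request: QaDocGenerationRequest) -> str:
    # Call sites construct the request once and pass it through unchanged.
    return f"docs for PR #{request.pr_number} in {request.project_name}"
```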
In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`:
- Around line 22-31: Add a short inline comment above _AC_HEADER_PATTERNS
(and/or next to the fullwidth colon characters in the patterns) explaining that
the fullwidth colon ':' (U+FF1A) is intentionally included to support
CJK/Japanese/Chinese Jira descriptions and should not be removed; mention this
is to satisfy locale-specific punctuation in headers like "Acceptance Criteria".
Reference _AC_HEADER_PATTERNS and _AC_HEADER_REGEX so reviewers see why the
fullwidth character is present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5e4fd477-5310-49ef-b358-1b00a17f610a
📒 Files selected for processing (56)
- README.md
- frontend
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/aiclient/AiAnalysisClient.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClient.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiAnalysisRequest.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiAnalysisRequestImpl.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java
- java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java
- java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientRecordsTest.java
- java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientTest.java
- java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingServiceTest.java
- java-ecosystem/libs/ast-parser/src/main/resources/META-INF/MANIFEST.MF
- java-ecosystem/libs/core/src/main/java/module-info.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTO.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/dto/taskmanagement/QaAutoDocConfigRequest.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/ai/AIConnection.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/ai/AIProviderKey.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/QaAutoDocConfig.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisIssueRepository.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java
- java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/util/SsrfSafeUrlValidator.java
- java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql
- java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql
- java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java
- java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/model/ai/AIProviderKeyTest.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/service/BitbucketAiClientService.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/AskCommandProcessor.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/ReviewCommandProcessor.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/SummarizeCommandProcessor.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/service/GitHubAiClientService.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/service/GitLabAiClientService.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaAutoDocListener.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java
- java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java
- java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java
- java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationServiceTest.java
- java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/controller/LlmModelController.java
- java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/dto/request/CreateAIConnectionRequest.java
- java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/dto/request/UpdateAiConnectionRequest.java
- java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java
- java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/taskmanagement/service/TaskManagementService.java
- python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py
- python-ecosystem/inference-orchestrator/src/llm/llm_factory.py
- python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py
- python-ecosystem/inference-orchestrator/src/model/dtos.py
- python-ecosystem/inference-orchestrator/src/service/command/command_service.py
- python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py
- python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py
- python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py
- python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py
- python-ecosystem/inference-orchestrator/src/service/review/review_service.py
- python-ecosystem/inference-orchestrator/src/utils/dependency_graph.py
- python-ecosystem/inference-orchestrator/src/utils/prompts/constants_qa_doc.py
- python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py
      AiCommandClient.SummarizeRequest request = new AiCommandClient.SummarizeRequest(
              1L, "workspace", "repo-slug", "project-workspace", "namespace",
    -         "openai", "gpt-4", "api-key", 42L, "feature", "main", "abc123",
    +         "openai", "gpt-4", "api-key", null, 42L, "feature", "main", "abc123",
Add explicit assertions for the new aiBaseUrl record field.
These tests updated the constructor arguments but never verify request.aiBaseUrl(), so regressions on this new field can slip through.
Suggested assertion additions
@@
assertThat(request.aiProvider()).isEqualTo("openai");
assertThat(request.aiModel()).isEqualTo("gpt-4");
assertThat(request.aiApiKey()).isEqualTo("api-key");
+ assertThat(request.aiBaseUrl()).isNull();
assertThat(request.pullRequestId()).isEqualTo(42L);
@@
assertThat(request.projectId()).isEqualTo(1L);
assertThat(request.aiProvider()).isEqualTo("anthropic");
assertThat(request.aiModel()).isEqualTo("claude-3");
+ assertThat(request.aiBaseUrl()).isNull();
assertThat(request.question()).isEqualTo("What is this code doing?");
@@
assertThat(request.projectId()).isEqualTo(1L);
+ assertThat(request.aiBaseUrl()).isNull();
    assertThat(request.pullRequestId()).isEqualTo(42L);

Also applies to: 57-57, 77-77, 149-149
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientRecordsTest.java`
at line 23, Tests in AiCommandClientRecordsTest updated constructor args to
include the new aiBaseUrl field but never assert it; add explicit assertions
calling request.aiBaseUrl() in each affected test (the tests that construct the
record and name the variable request) to verify the expected value (e.g., null
or "..." as appropriate for each case), alongside existing assertions so
regressions for the new aiBaseUrl property are caught; locate the record
instantiations in methods within AiCommandClientRecordsTest and add a matching
assertEquals(expected, request.aiBaseUrl()) for each scenario.
public Set<Long> getDocumentedPrNumbersSet() {
    if (documentedPrNumbers == null || documentedPrNumbers.isBlank()) {
        return Collections.emptySet();
    }
    return Arrays.stream(documentedPrNumbers.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .map(Long::parseLong)
            .collect(Collectors.toUnmodifiableSet());
}
Uncaught NumberFormatException if PR numbers data is corrupted.
If documentedPrNumbers contains non-numeric values (e.g., from manual DB edits or bugs), Long.parseLong() will throw NumberFormatException, potentially breaking the QA doc flow.
🛡️ Proposed defensive parsing
public Set<Long> getDocumentedPrNumbersSet() {
if (documentedPrNumbers == null || documentedPrNumbers.isBlank()) {
return Collections.emptySet();
}
return Arrays.stream(documentedPrNumbers.split(","))
.map(String::trim)
.filter(s -> !s.isEmpty())
- .map(Long::parseLong)
+ .flatMap(s -> {
+ try {
+ return java.util.stream.Stream.of(Long.parseLong(s));
+ } catch (NumberFormatException e) {
+ // Skip corrupted entries rather than failing
+ return java.util.stream.Stream.empty();
+ }
+ })
.collect(Collectors.toUnmodifiableSet());
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`
around lines 98 - 107, getDocumentedPrNumbersSet currently calls Long.parseLong
on every token and will throw NumberFormatException for any
corrupted/non-numeric tokens in documentedPrNumbers; update
getDocumentedPrNumbersSet to defensively parse tokens by trimming and attempting
to parse each token inside a try/catch (or by pre-validating with a numeric
regex), skip tokens that fail to parse instead of letting the exception
propagate, collect the successfully parsed Longs into an unmodifiable Set, and
if a logger (e.g., LOGGER) is available emit a warning with the invalid token(s)
to aid debugging; ensure the method still returns Collections.emptySet() when
documentedPrNumbers is blank or no valid numbers are found.
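The skip-corrupted-tokens behavior the fix asks for can be sketched language-agnostically; the Python below is only an illustration of the pattern (`parse_pr_numbers` is a hypothetical name, not project code):

```python
from typing import FrozenSet, Optional


def parse_pr_numbers(raw: Optional[str]) -> FrozenSet[int]:
    """Parse a comma-separated PR-number string, skipping corrupted tokens."""
    if raw is None or not raw.strip():
        return frozenset()
    numbers = set()
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue
        try:
            numbers.add(int(token))
        except ValueError:
            # Skip corrupted entries rather than failing the whole QA-doc flow
            continue
    return frozenset(numbers)
```

The key property is that one bad token degrades to a smaller result set instead of an exception that aborts the command.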
// 6. Load server-side state and check for existing Jira comment
QaDocState state = (prNumber != null)
        ? qaDocStateRepository.findByProjectIdAndTaskId(project.getId(), taskId).orElse(null)
        : null;
boolean isSamePrRerun = (state != null && prNumber != null && state.isPrDocumented(prNumber));
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
fd -i 'QaDocState' --extension java
echo
rg -n -C3 'class QaDocState|lastCommitHash|documentedPrNumbers|recordGeneration|getLastCommitHash|findByProjectIdAndTaskId' --type java

Repository: rostilos/CodeCrow
Length of output: 22173
Store commit bases per PR within QaDocState, not just lastCommitHash.
The state is currently loaded by (projectId, taskId) and only holds one lastCommitHash for the entire task. When multiple PRs document the same task and one reruns, it will use another PR's last commit hash as the delta base, producing incorrect diffs. Replace the single lastCommitHash field with a mapping that tracks commit bases per PR number (e.g., Map<Long, String> prCommitHashes), and update recordGeneration(String commitHash, Long analysisId, Long prNumber) to store the hash keyed by PR, then retrieve it by current PR on rerun.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java`
around lines 293 - 298, The QaDocState currently stores only lastCommitHash per
(projectId, taskId) which breaks multi-PR runs; change QaDocState to replace
lastCommitHash with a Map<Long,String> prCommitHashes keyed by PR number, update
recordGeneration(String commitHash, Long analysisId, Long prNumber) to store
prCommitHashes.put(prNumber, commitHash), and when loading state via
qaDocStateRepository.findByProjectIdAndTaskId(...) use
prCommitHashes.get(prNumber) to determine the delta base and to evaluate
isSamePrRerun for the current PR instead of the single lastCommitHash; also
adjust any helper methods (e.g., isPrDocumented) to consult the new map.
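The per-PR map the finding proposes can be sketched as follows; this is a hypothetical Python illustration of the suggested shape (`PrScopedQaDocState` and its method names are assumptions, not the project's Java code):

```python
from typing import Dict, Optional


class PrScopedQaDocState:
    """Sketch: track one delta-base commit per PR number instead of one per task."""

    def __init__(self) -> None:
        self.pr_commit_hashes: Dict[int, str] = {}

    def record_generation(self, commit_hash: str, pr_number: int) -> None:
        self.pr_commit_hashes[pr_number] = commit_hash

    def delta_base_for(self, pr_number: int) -> Optional[str]:
        # Only this PR's own last commit may serve as the delta base
        return self.pr_commit_hashes.get(pr_number)

    def is_pr_documented(self, pr_number: int) -> bool:
        return pr_number in self.pr_commit_hashes


state = PrScopedQaDocState()
state.record_generation("abc123", pr_number=42)
state.record_generation("def456", pr_number=7)
```

A rerun of PR 42 now gets `abc123` as its delta base and can never pick up PR 7's `def456`.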
// 6a. Compute delta diff for same-PR re-runs
String deltaDiff = null;
if (isSamePrRerun && state.getLastCommitHash() != null
        && commitHash != null && opsService != null && httpClient != null) {
    DiffContentFilter contentFilter = new DiffContentFilter();
    final OkHttpClient cl = httpClient;
    final VcsOperationsService ops = opsService;
    deltaDiff = VcsDiffUtils.fetchDeltaDiff(
            (ws, repo, base, head) -> ops.getCommitRangeDiff(cl, ws, repo, base, head),
            workspace, repoSlug,
            state.getLastCommitHash(), commitHash, contentFilter);
    if (deltaDiff != null) {
        log.info("qa-doc command: computed delta diff ({} chars) for same-PR re-run",
                deltaDiff.length());
    }
Fall back cleanly when the delta diff cannot be computed.
This is the only VCS fetch in the rerun path that is not locally guarded. If getCommitRangeDiff(...) fails, the outer catch aborts the whole command even though the generator can still continue with deltaDiff = null and the full diff.
Suggested fallback
if (isSamePrRerun && state.getLastCommitHash() != null
&& commitHash != null && opsService != null && httpClient != null) {
- DiffContentFilter contentFilter = new DiffContentFilter();
- final OkHttpClient cl = httpClient;
- final VcsOperationsService ops = opsService;
- deltaDiff = VcsDiffUtils.fetchDeltaDiff(
- (ws, repo, base, head) -> ops.getCommitRangeDiff(cl, ws, repo, base, head),
- workspace, repoSlug,
- state.getLastCommitHash(), commitHash, contentFilter);
- if (deltaDiff != null) {
- log.info("qa-doc command: computed delta diff ({} chars) for same-PR re-run",
- deltaDiff.length());
- }
+ try {
+ DiffContentFilter contentFilter = new DiffContentFilter();
+ final OkHttpClient cl = httpClient;
+ final VcsOperationsService ops = opsService;
+ deltaDiff = VcsDiffUtils.fetchDeltaDiff(
+ (ws, repo, base, head) -> ops.getCommitRangeDiff(cl, ws, repo, base, head),
+ workspace, repoSlug,
+ state.getLastCommitHash(), commitHash, contentFilter);
+ if (deltaDiff != null) {
+ log.info("qa-doc command: computed delta diff ({} chars) for same-PR re-run",
+ deltaDiff.length());
+ }
+ } catch (Exception e) {
+ log.warn("qa-doc command: failed to compute delta diff, falling back to full diff: {}",
+ e.getMessage());
+ deltaDiff = null;
+ }
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java`
around lines 322 - 336, The delta diff fetch in QaDocCommandProcessor (using
VcsDiffUtils.fetchDeltaDiff which delegates to ops.getCommitRangeDiff) must be
guarded so failures don't abort the whole command; wrap the fetch call (the
block that computes deltaDiff using opsService/httpClient and
VcsDiffUtils.fetchDeltaDiff) in a try/catch that on any exception sets deltaDiff
= null, logs a warning including the exception and context (repoSlug, commit
hashes), and allows processing to continue so the generator falls back to the
full diff; ensure no exception from getCommitRangeDiff or fetchDeltaDiff bubbles
out.
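The fall-back-to-full-diff behavior can be shown in a few lines; this is an illustrative Python sketch of the pattern, not the Java fix itself (`fetch_delta_or_none` is a hypothetical name):

```python
from typing import Callable, Optional


def fetch_delta_or_none(fetch: Callable[[str, str], str], base: str, head: str) -> Optional[str]:
    """Treat any VCS fetch failure as 'no delta' so the caller falls back to the full diff."""
    try:
        return fetch(base, head)
    except Exception as exc:
        # Log-and-continue: the generator still works with a None delta
        print(f"delta diff fetch failed ({exc!r}); falling back to full diff")
        return None


ok = fetch_delta_or_none(lambda base, head: f"{base}..{head}", "abc123", "def456")


def _broken_fetch(base: str, head: str) -> str:
    raise RuntimeError("VCS API timeout")


failed = fetch_delta_or_none(_broken_fetch, "abc123", "def456")
```

The point mirrors the suggested Java fix: a transient VCS error produces `None` (full-diff path) rather than aborting the whole command.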
  // 9. Update server-side state (secure, tamper-proof)
  Long analysisId = (analysis != null) ? analysis.getId() : null;
  upsertQaDocState(project, taskId, currentCommitHash, analysisId, prNumber, state);

  // 10. Post or update Jira comment
  String commentBody = COMMENT_MARKER + "\n\n" + qaDocument;

- // 8. Post or update comment
- if (existingComment.isPresent()) {
-     client.updateComment(taskId, existingComment.get().commentId(), commentBody);
-     log.info("QA auto-doc: updated existing comment on task {} (commentId={})",
-             taskId, existingComment.get().commentId());
- } else {
-     client.postComment(taskId, commentBody);
-     log.info("QA auto-doc: posted new comment on task {}", taskId);
+ try {
+     if (existingComment.isPresent()) {
+         tmClient.updateComment(taskId, existingComment.get().commentId(), commentBody);
+         log.info("QA auto-doc: updated existing comment on task {} (commentId={})",
+                 taskId, existingComment.get().commentId());
+     } else {
+         tmClient.postComment(taskId, commentBody);
+         log.info("QA auto-doc: posted new comment on task {}", taskId);
+     }
+ } catch (Exception e) {
+     log.error("QA auto-doc: failed to post/update comment on task {}: {}",
+             taskId, e.getMessage(), e);
+ }
State is persisted before Jira comment is posted — may cause skipped PRs on retry.
If upsertQaDocState (line 357) succeeds but the subsequent Jira comment posting (lines 362-370) fails, the state records the PR as documented even though no comment was posted. On the next analysis event, isSamePrRerun will be true, potentially causing the system to skip generating a full document.
Consider persisting state only after successful Jira posting, or implementing a two-phase approach.
🔧 Move state persistence after successful posting
- // 9. Update server-side state (secure, tamper-proof)
- Long analysisId = (analysis != null) ? analysis.getId() : null;
- upsertQaDocState(project, taskId, currentCommitHash, analysisId, prNumber, state);
-
// 10. Post or update Jira comment
String commentBody = COMMENT_MARKER + "\n\n" + qaDocument;
try {
if (existingComment.isPresent()) {
tmClient.updateComment(taskId, existingComment.get().commentId(), commentBody);
log.info("QA auto-doc: updated existing comment on task {} (commentId={})",
taskId, existingComment.get().commentId());
} else {
tmClient.postComment(taskId, commentBody);
log.info("QA auto-doc: posted new comment on task {}", taskId);
}
+
+ // Persist state only after successful Jira posting
+ Long analysisId = (analysis != null) ? analysis.getId() : null;
+ upsertQaDocState(project, taskId, currentCommitHash, analysisId, prNumber, state);
} catch (Exception e) {
log.error("QA auto-doc: failed to post/update comment on task {}: {}",
taskId, e.getMessage(), e);
+ // State not persisted — next run will retry with full generation
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaAutoDocListener.java`
around lines 355 - 374, The state is being upserted before the Jira comment is
guaranteed posted, which can mark a PR as documented even if posting fails; move
the call to upsertQaDocState so it executes only after tmClient.postComment or
tmClient.updateComment succeeds (i.e., after the try block that posts/updates
the comment), or implement a two-phase commit where you mark state only on
successful comment response and on failure do not set the documented flag so
isSamePrRerun sees it as not completed; update references around
upsertQaDocState, tmClient.postComment, tmClient.updateComment and isSamePrRerun
accordingly.
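The write-ordering concern reduces to a small invariant: state may only be persisted after the external post succeeds. An illustrative Python sketch (hypothetical names, not the listener's code):

```python
def publish_then_persist(post_comment, persist_state) -> bool:
    """Persist the 'documented' state only after the comment post succeeds."""
    try:
        post_comment()
    except Exception:
        # Nothing persisted: the next run sees the PR as undocumented and retries fully
        return False
    persist_state()
    return True


# Demo: a successful post records state; a failing post must leave none behind.
persisted = []
ok = publish_then_persist(lambda: None, lambda: persisted.append("state"))

failed_persisted = []


def _failing_post():
    raise RuntimeError("jira down")


failed = publish_then_persist(_failing_post, lambda: failed_persisted.append("state"))
```

With this ordering, a Jira outage cannot leave `isPrDocumented` returning true for a PR that never got its comment.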
# For same-PR re-runs, analyze the delta diff — but only if
# it contains actual hunks (@@). A delta that is truthy but
# header-only (no @@) would starve the whole pipeline of context.
has_real_delta = (
    is_same_pr_rerun
    and delta_diff
    and len(delta_diff.strip()) > 100
    and '@@' in delta_diff
)
analysis_diff = delta_diff if has_real_delta else diff
if is_same_pr_rerun and not has_real_delta:
    logger.info(
Use the same delta-quality gate when choosing Stage 3.
When delta_diff is header-only, Stage 1/2 already fall back to the full diff via has_real_delta, but Stage 3 still switches to the delta prompt on any non-empty delta_diff. That mixes full-diff analysis with a delta-only aggregation step and can mark unchanged scenarios as “updated in latest push.”
Suggested fix
- if is_same_pr_rerun and delta_diff and previous_documentation:
+ if has_real_delta and previous_documentation:
          documentation = await self._execute_stage_3_delta(

Also applies to: 258-268
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py`
around lines 202 - 213, The Stage 3 aggregation choice must use the same
delta-quality gate as earlier: replace the current logic that picks delta_diff
whenever delta_diff is non-empty with the same has_real_delta check used for
analysis_diff (use has_real_delta && '@@' check outcome). Concretely, where
Stage 3 currently selects the prompt/input (references to delta_diff, diff, and
the Stage 3 prompt/aggregation block), change it to use analysis_diff or to pick
delta_diff only when has_real_delta is true (otherwise fall back to diff) so
Stage 3 won't run a delta-only aggregation for header-only deltas; keep the
existing has_real_delta variable and reuse it.
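The gate-sharing fix is easy to factor into one predicate that every stage consults. A sketch assuming the thresholds shown above (`choose_stage_inputs` is a hypothetical helper, not the orchestrator's API):

```python
from typing import Optional, Tuple

# Sample diffs: same length, but only one contains real @@ hunk headers.
HEADER_ONLY = "diff --git a/x b/x\nindex 1..2\n" + "-" * 120
HUNKY = "@@ -1,3 +1,4 @@\n+new line\n" + "-" * 120


def has_real_delta(is_same_pr_rerun: bool, delta_diff: Optional[str]) -> bool:
    """A delta only counts when it is non-trivial and contains actual hunk headers."""
    return bool(
        is_same_pr_rerun
        and delta_diff
        and len(delta_diff.strip()) > 100
        and "@@" in delta_diff
    )


def choose_stage_inputs(is_same_pr_rerun, delta_diff, full_diff) -> Tuple[str, bool]:
    real = has_real_delta(is_same_pr_rerun, delta_diff)
    analysis_diff = delta_diff if real else full_diff
    # Stage 3 must key off the same gate, never bare truthiness of delta_diff
    return analysis_diff, real
```

Because both the analysis diff and the Stage 3 prompt choice come from the same boolean, a header-only delta can no longer send full-diff analysis into a delta-only aggregation step.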
BUDGET_NORMAL = 200_000
BUDGET_TIGHT = 100_000
try:
    return await _attempt(BUDGET_NORMAL)
except Exception as first_err:
    logger.warning(
        "Stage 3 delta failed at %dK budget: %s — retrying at %dK",
        BUDGET_NORMAL // 1000, first_err, BUDGET_TIGHT // 1000,
    )
    return await _attempt(BUDGET_TIGHT)

response = await self.llm.ainvoke([
    {"role": "system", "content": QA_DOC_SYSTEM_PROMPT},
    {"role": "user", "content": prompt},
])
return self._extract_text(response)
🧩 Analysis chain
🏁 Script executed:
cd python-ecosystem/inference-orchestrator && find . -name "qa_doc_orchestrator.py" -type f

Repository: rostilos/CodeCrow
Length of output: 114
🏁 Script executed:
cd python-ecosystem/inference-orchestrator && wc -l ./src/service/qa_documentation/qa_doc_orchestrator.py

Repository: rostilos/CodeCrow
Length of output: 118
🏁 Script executed:
cd python-ecosystem/inference-orchestrator && sed -n '620,660p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow
Length of output: 1848
🏁 Script executed:
cd python-ecosystem/inference-orchestrator && sed -n '600,633p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow
Length of output: 1708
🏁 Script executed:
cd python-ecosystem/inference-orchestrator && sed -n '570,610p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow
Length of output: 1905
🏁 Script executed:
cd python-ecosystem/inference-orchestrator && sed -n '587,655p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow
Length of output: 3243
Remove the dead code block at the end of _execute_stage_3_delta.
The code at lines 644-648 is unreachable since both the try and except branches return. Additionally, the prompt variable referenced in this block is undefined at that scope—it only exists within the nested _attempt function. This causes an F821 lint error that must be resolved by deleting the dead code.
Suggested cleanup
try:
return await _attempt(BUDGET_NORMAL)
except Exception as first_err:
logger.warning(
"Stage 3 delta failed at %dK budget: %s — retrying at %dK",
BUDGET_NORMAL // 1000, first_err, BUDGET_TIGHT // 1000,
)
return await _attempt(BUDGET_TIGHT)
-
- response = await self.llm.ainvoke([
- {"role": "system", "content": QA_DOC_SYSTEM_PROMPT},
- {"role": "user", "content": prompt},
- ])
- return self._extract_text(response)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
BUDGET_NORMAL = 200_000
BUDGET_TIGHT = 100_000
try:
    return await _attempt(BUDGET_NORMAL)
except Exception as first_err:
    logger.warning(
        "Stage 3 delta failed at %dK budget: %s — retrying at %dK",
        BUDGET_NORMAL // 1000, first_err, BUDGET_TIGHT // 1000,
    )
    return await _attempt(BUDGET_TIGHT)
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 637-637: Do not catch blind exception: Exception
(BLE001)
[error] 646-646: Undefined name prompt
(F821)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py`
around lines 633 - 648, In _execute_stage_3_delta remove the unreachable dead
code at the end that calls self.llm.ainvoke and returns
self._extract_text(response); both the try and except paths already return and
the local prompt variable referenced there is undefined in that scope (causing
F821). Delete the response = await self.llm.ainvoke([...]) and the subsequent
return self._extract_text(response) lines so the function ends after the
existing try/except/_attempt logic and no undefined symbols remain.
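Beyond the dead-code removal, the retry-at-a-tighter-budget pattern itself generalizes cleanly. A minimal sketch, assuming a callable that accepts a token budget (`attempt_with_budget_fallback` is a hypothetical name):

```python
import asyncio


async def attempt_with_budget_fallback(attempt, budgets=(200_000, 100_000)):
    """Run the stage at each token budget in order; re-raise only if every budget fails."""
    last_err = None
    for budget in budgets:
        try:
            return await attempt(budget)
        except Exception as exc:
            last_err = exc  # remember the failure and try the tighter budget
    raise last_err


# Demo: the first budget fails, the second succeeds.
calls = []


async def _attempt(budget: int) -> str:
    calls.append(budget)
    if budget > 150_000:
        raise RuntimeError("context window exceeded")
    return f"ok@{budget}"


result = asyncio.run(attempt_with_budget_fallback(_attempt))
```

Writing it as a loop over budgets also removes the temptation to append any code after the `return`-only try/except, which is how the unreachable block above crept in.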
SEVERITY_RANK = {"HIGH": 3, "MEDIUM": 2, "LOW": 1, "INFO": 0}

def _sort_key(issue: Dict[str, Any]):
    sev = SEVERITY_RANK.get((issue.get("severity") or "").upper(), 0)
    version = issue.get("prVersion") or 0
    return (sev, version)

# Sort highest-priority first so we keep the best representative
sorted_issues = sorted(issues, key=_sort_key, reverse=True)

# Tier 1: Location fingerprint (file + lineHash + category)
seen_locations: Set[str] = set()
tier1_result: List[Dict[str, Any]] = []
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and get basic info
fd orchestrator.py | head -5

Repository: rostilos/CodeCrow
Length of output: 330
🏁 Script executed:
# Read the imports at the top of the file to check current typing imports
head -50 python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py | cat -n

Repository: rostilos/CodeCrow
Length of output: 1945
🏁 Script executed:
# Read lines 455-467 to verify the code matches and check for Set usage
sed -n '450,470p' python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py | cat -n

Repository: rostilos/CodeCrow
Length of output: 966
🏁 Script executed:
# Check if Set is imported anywhere in the file
grep -n "from typing import\|import Set" python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py | head -20

Repository: rostilos/CodeCrow
Length of output: 118
Add Set to the typing imports.
The code at line 466 uses seen_locations: Set[str] = set(), but Set is not imported. The current import statement (line 11) only includes Dict, Any, List, Optional, Callable. This will cause a NameError at runtime.
🔧 Proposed fix
-from typing import Dict, Any, List, Optional, Callable
+from typing import Dict, Any, List, Optional, Callable, Set

🧰 Tools
🪛 Ruff (0.15.7)
[error] 466-466: Undefined name Set
(F821)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py`
around lines 455 - 467, The code uses the typing symbol Set (seen_locations:
Set[str] = set()) but Set is not imported; update the module's typing import
list to include Set (alongside Dict, Any, List, Optional, Callable) so
references like seen_locations and any other Set-typed annotations resolve
correctly; ensure you only modify the import line(s) at the top of
orchestrator.py to add Set.
# Tier 2: Semantic similarity within same file (title-based)
from collections import OrderedDict
by_file: OrderedDict[str, List[Dict[str, Any]]] = OrderedDict()
for issue in tier1_result:
    fp = issue.get("file") or issue.get("filePath") or "_unknown_"
    by_file.setdefault(fp, []).append(issue)

final: List[Dict[str, Any]] = []
for file_path, file_issues in by_file.items():
    kept: List[Dict[str, Any]] = []
    for issue in file_issues:
        title = (issue.get("title") or issue.get("reason") or "").lower().strip()
        is_dup = False
        for existing in kept:
            existing_title = (existing.get("title") or existing.get("reason") or "").lower().strip()
            if title and existing_title:
                ratio = difflib.SequenceMatcher(None, title, existing_title).ratio()
                if ratio >= 0.75:
                    is_dup = True
                    break
        if not is_dup:
            kept.append(issue)
    final.extend(kept)
The semantic-dedup pass can drop distinct findings from the same file.
Lines 490-503 treat any same-file issues with a title/reason similarity of >= 0.75 as duplicates, even when they point to different locations. That will collapse common findings like two separate null-dereference warnings in one file, and the dropped issue never reaches reconciliation. Please gate tier 2 on extra location evidence before removing it.
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 490-490: Loop control variable file_path not used within loop body
(B007)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py`
around lines 482 - 504, The current semantic dedupe in the block processing
tier1_result (using by_file, kept, title, existing_title and
difflib.SequenceMatcher) removes issues with similar titles even when they point
to different locations; update the dedupe logic so that before marking is_dup
true you verify location evidence: require matching or overlapping location
fields (e.g., compare issue.get("startLine")/issue.get("endLine") or
issue.get("line")/issue.get("span")/issue.get("filePath") if present) and only
treat as duplicates when both title similarity >= 0.75 AND the location is
identical or overlaps (or when location is missing for both); otherwise keep
both issues so distinct findings in the same file are not dropped.
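The location-gated variant the prompt describes can be captured in a single predicate. This is an illustrative sketch under the assumption that issues carry a `line` field (the real issue dicts may expose `startLine`/`endLine` instead):

```python
import difflib


def is_duplicate(a: dict, b: dict, threshold: float = 0.75) -> bool:
    """Drop an issue only when titles are similar AND location evidence agrees."""
    ta = (a.get("title") or "").lower().strip()
    tb = (b.get("title") or "").lower().strip()
    if not ta or not tb:
        return False
    if difflib.SequenceMatcher(None, ta, tb).ratio() < threshold:
        return False
    la, lb = a.get("line"), b.get("line")
    if la is None and lb is None:
        return True  # no location info on either side: fall back to title-only
    return la == lb


same_spot = is_duplicate(
    {"title": "Possible null dereference of user", "line": 10},
    {"title": "Possible null dereference of user", "line": 10},
)
different_spot = is_duplicate(
    {"title": "Possible null dereference of user", "line": 10},
    {"title": "Possible null dereference of user", "line": 95},
)
```

Two identically-worded null-dereference findings at lines 10 and 95 now both survive to reconciliation, while a true duplicate at the same line is still collapsed.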
if description:
    desc_to_show = description
    if ac_text:
        # Remove the AC block from the description to avoid duplication
        ac_start = description.lower().find("acceptance criteria")
        if ac_start == -1:
            ac_start = description.lower().find("definition of done")
        if ac_start > 0:
            desc_to_show = description[:ac_start].strip()
        elif ac_start == 0:
            # AC was at the very beginning — show everything after it
            desc_to_show = ""
AC removal logic may not match all patterns that extraction handles.
The extraction uses _AC_HEADER_REGEX, which matches many header formats (h1-h6 markers, bold markers, etc.), but the removal logic uses a plain find("acceptance criteria"). For a header like h3. Acceptance Criteria or **Acceptance Criteria**, the substring match lands after the marker, so slicing at that index leaves h3. or ** residue in the truncated description; headings the regex accepts under other wordings are not found at all.
When the header is not found, the AC block appears twice — once in the dedicated AC section and again in the description.
🔧 Proposed fix: reuse the regex match position
# Full description (truncated, without the AC block to avoid duplication)
if description:
desc_to_show = description
if ac_text:
# Remove the AC block from the description to avoid duplication
- ac_start = description.lower().find("acceptance criteria")
- if ac_start == -1:
- ac_start = description.lower().find("definition of done")
- if ac_start > 0:
- desc_to_show = description[:ac_start].strip()
- elif ac_start == 0:
- # AC was at the very beginning — show everything after it
- desc_to_show = ""
+ # Use the same regex for consistent matching
+ ac_match = _AC_HEADER_REGEX.search(description)
+ if ac_match:
+ ac_start = ac_match.start()
+ if ac_start > 0:
+ desc_to_show = description[:ac_start].strip()
+ else:
+ # AC was at the very beginning — show everything after it
+ desc_to_show = ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`
around lines 134 - 145, The current AC-removal uses a plain substring search
which misses headers matched by _AC_HEADER_REGEX; update the logic in the block
that computes desc_to_show (the code that checks description, ac_text and
currently uses description.lower().find(...)) to run _AC_HEADER_REGEX on
description (preserving case/flags used by the extractor), and if a match is
found use match.start() to slice out the AC section (or match.end() to drop a
leading header), otherwise fall back to the existing substring checks; ensure
you reference description, ac_text, desc_to_show and _AC_HEADER_REGEX so the
removal reuses the same regex positions the extractor uses and prevents
duplicate AC content.
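The regex-reuse approach can be demonstrated end to end. The pattern below is a hypothetical stand-in for the extractor's `_AC_HEADER_REGEX` (the real pattern lives in `task_context_builder.py` and may differ); the point is that extraction and removal must share one regex:

```python
import re

# Hypothetical stand-in: accepts Jira "h3." markers, Markdown "#" headers, and bold headers
AC_HEADER_REGEX = re.compile(
    r"(?:^|\n)\s*(?:h[1-6]\.\s*|#{1,6}\s*|\*\*)?(?:acceptance criteria|definition of done)",
    re.IGNORECASE,
)


def strip_ac_block(description: str) -> str:
    """Cut the description at the AC header found by the same regex the extractor uses."""
    m = AC_HEADER_REGEX.search(description)
    if not m:
        return description
    # Slicing at match.start() removes the header marker along with the AC body
    return description[: m.start()].strip()


jira_style = strip_ac_block("Some context.\nh3. Acceptance Criteria\n* must do X")
bold_style = strip_ac_block("**Acceptance Criteria**\n* must do X")
```

Because `match.start()` sits before the marker, the `h3.` / `**` prefix is removed together with the AC body instead of lingering in the truncated description.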