
feat: Enhance multi-stage review orchestrator with issue deduplication (#177)

Merged
rostilos merged 2 commits into main from 1.5.7-rc
Mar 31, 2026

Conversation

@rostilos (Owner) commented Mar 27, 2026

  • Implemented a pre-deduplication step in the `MultiStageReviewOrchestrator` to eliminate near-duplicate issues before sending them to the LLM.
  • Added a new static method `_deduplicate_previous_issues` that deduplicates based on location fingerprint and semantic similarity.
  • Updated logging to report the number of duplicates removed during reconciliation.
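The two-tier deduplication described above can be sketched as follows. This is a hypothetical illustration, not the actual `_deduplicate_previous_issues` implementation; the issue field names (`file_path`, `line`, `category`, `title`) and the similarity threshold are assumptions.

```python
from difflib import SequenceMatcher

# Illustrative sketch only: the real _deduplicate_previous_issues operates on
# the orchestrator's issue objects; field names and threshold are assumptions.
def deduplicate_previous_issues(issues, similarity_threshold=0.85):
    seen_fingerprints = set()
    kept = []
    for issue in issues:
        # Tier 1: exact location fingerprint (file + line + category)
        fingerprint = f"{issue['file_path']}:{issue['line']}:{issue['category']}"
        if fingerprint in seen_fingerprints:
            continue
        # Tier 2: semantic similarity against already-kept titles in the same file
        if any(
            SequenceMatcher(None, issue["title"], k["title"]).ratio() >= similarity_threshold
            for k in kept
            if k["file_path"] == issue["file_path"]
        ):
            continue
        seen_fingerprints.add(fingerprint)
        kept.append(issue)
    return kept
```

A scheme like this keeps exact location matches cheap (a set lookup) and reserves the quadratic similarity comparison for issues within the same file.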

fix: Allow optional AI base URL in LLM factory

  • Modified the `LLMFactory` to accept an optional `ai_base_url` parameter when creating an LLM instance.

feat: Introduce dependency-aware batching for changed files

  • Added a new function `build_dependency_aware_batches` to create batches of changed files while considering their relationships.
  • Implemented logic to estimate token costs based on file content and diff sizes for efficient batching.
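The batching idea can be illustrated as below. The 4-characters-per-token cost heuristic, the function signature, and the grouping strategy are assumptions, not the actual `build_dependency_aware_batches` implementation.

```python
# Illustrative sketch of dependency-aware batching under a token budget.
# The cost heuristic and data shapes are assumptions for this example.
def estimate_tokens(file_content: str, diff: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return (len(file_content) + len(diff)) // 4

def build_dependency_aware_batches(files, related, max_tokens=8000):
    """files: {path: (content, diff)}; related: {path: set of dependent paths}."""
    batches, current, budget = [], [], 0
    seen = set()
    for path in files:
        # Visit each file together with its direct dependents so related
        # changes land in the same LLM call where the budget allows.
        group = [p for p in [path, *sorted(related.get(path, ()))]
                 if p in files and p not in seen]
        for p in group:
            cost = estimate_tokens(*files[p])
            if current and budget + cost > max_tokens:
                batches.append(current)
                current, budget = [], 0
            current.append(p)
            budget += cost
            seen.add(p)
    if current:
        batches.append(current)
    return batches
```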

chore: Update QA documentation prompts for clarity and structure

  • Revised multiple QA documentation prompts to improve clarity and enforce strict formatting rules.
  • Added new sections for task context and acceptance criteria in the documentation prompts.
  • Enhanced instructions for generating QA documentation to ensure user-facing language is used.

new: Create Task Context Builder utility

  • Introduced a new utility `task_context_builder.py` to extract structured context from Jira/task-management data.
  • Implemented functions to parse acceptance criteria and build a rich context block for LLM prompts.
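Acceptance-criteria extraction of this kind might look like the sketch below; the header patterns, function names, and output format are assumptions based on the PR summary, not the actual contents of `task_context_builder.py`.

```python
import re

# Hypothetical header patterns; the real module's patterns may differ.
_AC_HEADER = re.compile(
    r"^\s*(acceptance criteria|acs?:|definition of done)\s*:?\s*$",
    re.IGNORECASE,
)

def parse_acceptance_criteria(description: str) -> list[str]:
    criteria, in_section = [], False
    for line in description.splitlines():
        if _AC_HEADER.match(line):
            in_section = True
            continue
        if in_section:
            stripped = line.strip()
            if stripped.startswith(("-", "*")):
                criteria.append(stripped.lstrip("-* ").strip())
            elif stripped:
                in_section = False  # section ends at the next non-list line
    return criteria

def build_task_context(task_key: str, summary: str, description: str) -> str:
    # Assemble a compact context block suitable for an LLM prompt.
    acs = parse_acceptance_criteria(description)
    block = [f"Task: {task_key}: {summary}"]
    if acs:
        block.append("Acceptance criteria:")
        block.extend(f"  {i}. {c}" for i, c in enumerate(acs, 1))
    return "\n".join(block)
```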

Summary by CodeRabbit

  • New Features

    • Support for any OpenAI‑compatible AI endpoints (custom base URL) and fetching models from custom endpoints
    • Output language selection for QA auto-documentation
    • Multi‑stage QA documentation pipeline with richer enrichment and incremental/delta support
    • Branch‑scoped issue tracking and deduplication to reduce duplicate findings
    • Same‑PR re‑run detection and persistent QA generation state
  • Documentation

    • Updated feature text to highlight OpenAI‑compatible endpoint support and example providers

@codecrow-local

🔄 CodeCrow is analyzing this PR...

This may take a few minutes depending on the size of the changes.
This comment will be updated with the analysis results when complete.

@coderabbitai

coderabbitai bot commented Mar 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c1238dc-3460-4ff8-8a31-460abd707421

📥 Commits

Reviewing files that changed from the base of the PR and between 23fbe00 and 3ad6cf7.

📒 Files selected for processing (1)
  • README.md

📝 Walkthrough

Added OpenAI‑compatible endpoint support (aiBaseUrl) across frontend, Java, and Python stacks; introduced SSRF-safe URL validation; refactored QA auto-documentation into a multi-stage orchestrator with server-side QaDocState and enrichment/delta-diff flows; added shared VCS diff utilities and branch-scoped issue deduplication.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Docs & Frontend | `README.md`, `frontend` | README updated to advertise OpenAI‑compatible endpoints; frontend submodule bumped to new commit hash. |
| AI request surface (Java) | `...aiclient/AiAnalysisClient.java`, `...aiclient/AiCommandClient.java`, `...dto/request/ai/AiAnalysisRequest*.java` | Added aiBaseUrl to request payloads/records and to the AiAnalysisRequest interface/impl; tests adjusted for the new positional parameter. |
| AI provider model & DTOs (Java) | `...model/ai/AIProviderKey.java`, `...model/ai/AIConnection.java`, `...dto/ai/AIConnectionDTO.java` | Added the OPENAI_COMPATIBLE provider, persisted baseUrl on AIConnection, and exposed baseUrl in the DTO. |
| Web API & validation (Java) | `...webserver/ai/controller/LlmModelController.java`, `...webserver/ai/service/AIConnectionService.java`, `...dto/request/CreateAIConnectionRequest.java`, `...UpdateAiConnectionRequest.java` | New POST /api/llm-models/fetch-custom to enumerate models from a custom baseUrl with SSRF validation; the service validates/normalizes/persists baseUrl only for OPENAI_COMPATIBLE. |
| SSRF validators | `...core/util/SsrfSafeUrlValidator.java`, `...llm/ssrf_safe_transport.py` | Implemented SSRF-safe URL validation in Java and Python, plus HTTP client factories (Python) that enforce DNS/IP checks and an optional ALLOW_PRIVATE_ENDPOINTS override. |
| Python LLM factory & DTOs | `...llm/llm_factory.py`, `...model/dtos.py`, `...service/command/command_service.py`, `...service/review/review_service.py` | Added the OPENAI_COMPATIBLE provider and ai_base_url handling; updated DTOs to include aiBaseUrl and pass the base URL into LLM creation. |
| QA config & DTOs | `...model/project/config/QaAutoDocConfig.java`, `...dto/taskmanagement/QaAutoDocConfigRequest.java`, `...taskmanagement/service/TaskManagementService.java` | Added outputLanguage to config/requests and effectiveOutputLanguage() normalization. |
| QaDoc state & repo (Java DB + JPA) | `...model/qadoc/QaDocState.java`, `...repository/qadoc/QaDocStateRepository.java`, `...db/migration/...V2.6.0__qa_doc_state.sql` | Added the QaDocState JPA entity and repository; DB migration creates the qa_doc_state table. |
| QaDoc pipeline (Java) | `...qadoc/QaDocGenerationContext.java`, `...QaDocCommandProcessor.java`, `...QaAutoDocListener.java` | Introduced an immutable generation context, enrichment/delta flow, VCS credential extraction, state loading/upsert, and many control-flow changes to orchestrate generation and posting. |
| QaDoc generation service (Java) | `...QaDocGenerationService.java` | Generation API changed to accept QaDocGenerationContext; now includes ai_base_url in the payload and an increased timeout. |
| QaDoc orchestration & utilities (Python) | `...service/qa_documentation/base_orchestrator.py`, `...qa_doc_orchestrator.py`, `...qa_doc_service.py` | Added BaseOrchestrator and a QaDocOrchestrator multi-stage pipeline; QaDocService delegates to the orchestrator; new event helpers, indexing, batching, diff filtering, and multi‑stage prompts. |
| Prompting, batching, enrichment (Python utils) | `...utils/prompts/constants_qa_doc.py`, `...utils/dependency_graph.py`, `...utils/task_context_builder.py` | Large prompt-constant updates for the ULTRATHINKING multi-stage flow; dependency-aware batching and task-context extraction utilities added. |
| VCS diff utilities & usage | `...util/VcsDiffUtils.java`, `...services/*/*AiClientService.java` (GitHub/GitLab/Bitbucket) | Added VcsDiffUtils (fetch/filter/thresholds); replaced duplicated diff-fetch/threshold logic in provider services with the shared utility. |
| Branch issue deduplication | `...service/branch/BranchIssueMappingService.java` | Added a branch-scoped query and location-fingerprint deduplication for unresolved issues; pre-populates the fingerprint set from the DB. |
| Review deduplication (Python) | `...service/review/orchestrator/orchestrator.py` | Added _deduplicate_previous_issues to remove near-duplicate prior issues before batching. |
| Module exports & manifest | `java-ecosystem/libs/core/src/main/java/module-info.java`, `...ast-parser/META-INF/MANIFEST.MF` | Exported qadoc packages and added Automatic-Module-Name for the ast-parser artifact. |
| SQL migration: OPENAI_COMPATIBLE | `...db/migration/managed/V2.5.0__add_openai_compatible_provider.sql` | Added the base_url column and constraints ensuring base_url presence only for the OPENAI_COMPATIBLE provider; updated the provider_key check. |
| Tests | `...test/java/.../*Test.java` (aiclient, branch, ai, qadoc, core) | Updated tests for new aiBaseUrl constructor positions, branch-scoped repository stubs, the new config field, and revised service/method signatures. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant WebServer as Web Server
    participant QaCmd as QaDocCommandProcessor
    participant VCS as VCS API
    participant Enr as Enrichment Service
    participant QaState as QaDocState Repo
    participant Orchestrator as QaDocOrchestrator
    participant LLM as LLM Factory/Client
    participant Jira as Jira API

    User->>WebServer: PR webhook -> trigger QA doc
    WebServer->>QaCmd: build and validate request
    QaCmd->>VCS: fetch PR diff & changed files
    QaCmd->>Enr: fetch enrichment for files
    QaCmd->>QaState: load existing QaDocState(project, task)
    QaCmd->>VCS: fetch delta-diff (lastCommit->current)
    QaCmd->>Orchestrator: run(QaDocGenerationContext with diff, delta, enrichment, creds)
    Orchestrator->>LLM: Stage1 batch calls (per-file)
    Orchestrator->>LLM: Stage2 cross-impact
    Orchestrator->>LLM: Stage3 aggregation/delta
    Orchestrator-->>QaCmd: documentation result
    QaCmd->>Jira: post/update comment
    QaCmd->>QaState: upsert state(commit, analysisId, prNumber)
    QaCmd-->>WebServer: respond success
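The three LLM stages in the diagram above can be sketched as a simple pipeline; `call_llm` stands in for the real LLM client, and the stage names and payload shapes are assumptions for illustration.

```python
# Hedged sketch of the three-stage QA-doc flow from the diagram.
# call_llm is a stand-in; real prompts and result shapes will differ.
def run_multi_stage(batches, call_llm):
    # Stage 1: per-file/batch analysis
    stage1 = [call_llm("stage1", batch) for batch in batches]
    # Stage 2: cross-impact analysis over the stage-1 findings
    stage2 = call_llm("stage2", stage1)
    # Stage 3: aggregation / delta against prior documentation
    return call_llm("stage3", {"per_batch": stage1, "cross_impact": stage2})
```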
sequenceDiagram
    participant Client
    participant API as LlmModelController
    participant Validator as SsrfSafeUrlValidator
    participant DNS as DNS Resolver
    participant DB as Database

    Client->>API: POST /api/llm-models/fetch-custom { baseUrl, apiKey? }
    API->>Validator: validate(baseUrl)
    Validator->>DNS: resolve(hostname)
    DNS-->>Validator: IPs
    Validator->>Validator: check IPs not private/reserved
    alt valid
        Validator-->>API: ok
        API->>DB: (no persistence here) call external baseUrl/v1/models and return models
    else invalid
        Validator-->>API: throws IllegalArgumentException
        API-->>Client: 400 Bad Request (validator message)
    end
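The validation step in the diagram above can be sketched as follows: resolve the hostname and reject private or reserved addresses. The real checks live in `SsrfSafeUrlValidator.java` and `ssrf_safe_transport.py`; this function name, the HTTPS-only rule, and the `allow_private` flag (mirroring ALLOW_PRIVATE_ENDPOINTS) are assumptions.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hedged sketch of SSRF-safe base-URL validation; not the project's
# actual implementation.
def validate_base_url(base_url: str, allow_private: bool = False) -> None:
    parsed = urlparse(base_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"Invalid base URL: {base_url}")
    if allow_private:
        return  # explicit override, e.g. for self-hosted dev setups
    # Resolve every address the hostname maps to and reject disallowed ranges,
    # so a DNS record pointing at internal infrastructure cannot slip through.
    for info in socket.getaddrinfo(parsed.hostname, None):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            raise ValueError(f"Base URL resolves to disallowed address {ip}")
```

Note that checking every resolved address (rather than the first) matters, since an attacker-controlled name can mix public and private records.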

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 I hopped through baseUrls, safe and neat,
OpenAI‑compatible endpoints—what a treat!
I gathered diffs, enrichment, state so true,
Three-stage thinking baked into the brew.
A doc persisted, comments posted bright—hooray, let's chew!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.04%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately reflects the main change: enhancing the multi-stage review orchestrator with issue deduplication. While the changeset includes many other improvements across multiple systems, deduplication is a key enhancement to the core orchestrator component and aligns with the objectives. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java (1)

149-158: ⚠️ Potential issue | 🟡 Minor

Update location cache in the duplicate-save fallback path.

If saveAndFlush fails on a duplicate, only branchContentFingerprints is updated. In the same run, a title-variant issue at the same location can still pass Tier 2.5 and be reprocessed.

💡 Proposed patch
                 } catch (DataIntegrityViolationException e) {
                     // Safety net: concurrent insert or edge-case fingerprint collision
                     log.warn("Duplicate content_fingerprint for branch {} file {} — skipping (fp={})",
                             branch.getId(), filePath,
                             bi.getContentFingerprint() != null ? bi.getContentFingerprint().substring(0, 12) + "..." : "null");
                     skipped++;
                     if (bi.getContentFingerprint() != null) {
                         branchContentFingerprints.add(bi.getContentFingerprint());
                     }
+                    if (bi.getLineHash() != null) {
+                        String locFp = bi.getFilePath() + ":" + bi.getLineHash() + ":"
+                                + (bi.getIssueCategory() != null ? bi.getIssueCategory().name() : "UNKNOWN");
+                        branchLocationFingerprints.add(locFp);
+                    }
                     continue;
                 }

Also applies to: 166-170

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java`
around lines 149 - 158, When catching DataIntegrityViolationException in
BranchIssueMappingService (the saveAndFlush duplicate-save fallback), also
update the in-memory location cache used to track saved issue locations in
addition to adding to branchContentFingerprints; locate the catch block handling
DataIntegrityViolationException around saveAndFlush and ensure the same cache
update (the set/map that prevents reprocessing of a file/line location — e.g.,
the branchLocationCache/processedLocations structure used elsewhere in the
method) is updated there; apply the identical fix to the second duplicate-save
catch path (the other DataIntegrityViolationException block around lines
166-170).
java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java (1)

50-63: ⚠️ Potential issue | 🟠 Major

Persist the normalized base URL, not the raw request string.

validateBaseUrl() trims and strips the URL at Lines 121-125, but Lines 61-63 and 93-94 still save request.baseUrl unchanged. A value like " https://host/ " will pass validation and then be persisted with the extra whitespace/slash, which breaks the normalization guarantee and can later fail outbound calls.

🔧 Proposed fix
-        validateBaseUrl(request.providerKey, request.baseUrl);
+        String normalizedBaseUrl = normalizeAndValidateBaseUrl(
+                request.providerKey, request.baseUrl);

         AIConnection newAiConnection = new AIConnection();
         String apiKeyEncrypted = tokenEncryptionService.encrypt(request.apiKey);

         newAiConnection.setWorkspace(ws);
         newAiConnection.setName(request.name);
         newAiConnection.setProviderKey(request.providerKey);
         newAiConnection.setAiModel(request.aiModel);
         newAiConnection.setApiKeyEncrypted(apiKeyEncrypted);
-        newAiConnection.setBaseUrl(
-                request.providerKey == AIProviderKey.OPENAI_COMPATIBLE ? request.baseUrl : null
-        );
+        newAiConnection.setBaseUrl(normalizedBaseUrl);
@@
         AIProviderKey effectiveProvider = request.providerKey != null
                 ? request.providerKey : connection.getProviderKey();
         if (effectiveProvider == AIProviderKey.OPENAI_COMPATIBLE) {
             String effectiveBaseUrl = request.baseUrl != null ? request.baseUrl : connection.getBaseUrl();
-            validateBaseUrl(effectiveProvider, effectiveBaseUrl);
-            if (request.baseUrl != null) {
-                connection.setBaseUrl(request.baseUrl);
-            }
+            connection.setBaseUrl(
+                    normalizeAndValidateBaseUrl(effectiveProvider, effectiveBaseUrl)
+            );
         } else {
             connection.setBaseUrl(null);
         }
@@
-    private void validateBaseUrl(AIProviderKey providerKey, String baseUrl) {
-        if (providerKey == AIProviderKey.OPENAI_COMPATIBLE) {
-            if (baseUrl == null || baseUrl.isBlank()) {
-                throw new IllegalArgumentException(
-                        "Base URL is required for OpenAI Compatible provider");
-            }
-            // Strip trailing slash for consistency
-            String normalized = baseUrl.strip();
-            if (normalized.endsWith("/")) {
-                normalized = normalized.substring(0, normalized.length() - 1);
-            }
-            SsrfSafeUrlValidator.validate(normalized);
-        }
+    private String normalizeAndValidateBaseUrl(AIProviderKey providerKey, String baseUrl) {
+        if (providerKey != AIProviderKey.OPENAI_COMPATIBLE) {
+            return null;
+        }
+        if (baseUrl == null || baseUrl.isBlank()) {
+            throw new IllegalArgumentException("Base URL is required for OpenAI Compatible provider");
+        }
+        String normalized = baseUrl.strip().replaceAll("/+$", "");
+        SsrfSafeUrlValidator.validate(normalized);
+        return normalized;
     }

Also applies to: 87-127

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java`
around lines 50 - 63, validateBaseUrl currently normalizes the URL but the code
still persists request.baseUrl raw; change the call to capture and use the
normalized value (e.g. String normalizedBaseUrl =
validateBaseUrl(request.providerKey, request.baseUrl)) and then call
newAiConnection.setBaseUrl(request.providerKey ==
AIProviderKey.OPENAI_COMPATIBLE ? normalizedBaseUrl : null); also replace other
assignments in the same method/region (the other setBaseUrl usage referenced in
the comment) to persist the normalizedBaseUrl instead of request.baseUrl so
stored values match the validator's output.
🧹 Nitpick comments (16)
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java (1)

66-71: Extract a shared location-fingerprint builder.

The same fingerprint string assembly appears in multiple places. A single helper would reduce drift risk and keep dedup semantics consistent.

Also applies to: 130-132, 167-169

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java`
around lines 66 - 71, Extract a single helper method (e.g.,
buildLocationFingerprint or buildLocationFingerprintForIssue) in
BranchIssueMappingService that accepts the inputs used to assemble the
fingerprint (preferably the BranchIssue instance or its filePath, lineHash, and
issueCategory) and returns the same string currently built at the three sites
(the filePath + ":" + lineHash + ":" + (issueCategory != null ?
issueCategory.name() : "UNKNOWN")); replace the inline assembly at the locations
referencing branchLocationFingerprints (the blocks around the current
fingerprint creation at lines referenced in the review) with calls to this new
helper and reuse it everywhere to ensure consistent dedup semantics.
python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py (1)

71-71: Rename the unused loop variable.

Ruff will keep flagging family here.

♻️ Minimal cleanup
-    for family, _type, _proto, _canonname, sockaddr in infos:
+    for _family, _type, _proto, _canonname, sockaddr in infos:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py` at
line 71, The for-loop unpacks infos into variables including an unused variable
named "family" which Ruff flags; rename it to a conventional unused name (e.g.,
"_family" or "_" ) in the loop header that iterates over "infos" (for _family,
_type, _proto, _canonname, sockaddr in infos) so the linter stops reporting an
unused variable while leaving the rest of the unpacking and logic (variables
_type, _proto, _canonname, sockaddr) unchanged.
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java (1)

27-31: Clarify the small-delta rule in the shared API.

MIN_DELTA_DIFF_SIZE and shouldEscalateToFull() currently imply that sub-threshold deltas stay incremental, but the GitHub/GitLab/Bitbucket services in this change still force that case back to FULL. Align the shared helper/docs with the policy the callers actually implement before this API gets reused elsewhere.

Also applies to: 98-105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java`
around lines 27 - 31, The constant MIN_DELTA_DIFF_SIZE and the helper method
shouldEscalateToFull() claim sub-threshold deltas always remain incremental, but
some callers (GitHub/GitLab/Bitbucket services) force FULL escalation; update
the shared API to reflect that policy by changing the Javadoc/comments on
MIN_DELTA_DIFF_SIZE and shouldEscalateToFull() in VcsDiffUtils to state that
values below the threshold are treated as "trivially small" and qualify for
incremental mode by default, but callers may override and escalate to FULL (and
include a note naming the services that do this), and consider adding a TODO or
parameter in shouldEscalateToFull() if you want caller-controlled escalation in
future so consumers know the helper does not enforce caller policies.
java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java (1)

73-86: Consider testing baseUrl mapping in fromAiConnection tests.

The fromAiConnection() tests don't verify the new baseUrl field mapping. While the record constructor tests cover baseUrl, the factory method conversion path remains untested.

💡 Suggested enhancement
         @Test
         @DisplayName("should convert AIConnection with all fields")
         void shouldConvertWithAllFields() {
             AIConnection connection = new AIConnection();
             setField(connection, "id", 1L);
             connection.setName("Production AI");
             setField(connection, "providerKey", AIProviderKey.ANTHROPIC);
             setField(connection, "aiModel", "claude-3-opus");
+            setField(connection, "baseUrl", "https://custom.api.com");
 
             AIConnectionDTO dto = AIConnectionDTO.fromAiConnection(connection);
 
             assertThat(dto.id()).isEqualTo(1L);
             assertThat(dto.name()).isEqualTo("Production AI");
             assertThat(dto.providerKey()).isEqualTo(AIProviderKey.ANTHROPIC);
             assertThat(dto.aiModel()).isEqualTo("claude-3-opus");
+            assertThat(dto.baseUrl()).isEqualTo("https://custom.api.com");
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java`
around lines 73 - 86, The AIConnectionDTO.fromAiConnection conversion tests are
missing verification of the new baseUrl mapping; update the
shouldConvertWithAllFields test to set the baseUrl on the source AIConnection
(e.g., connection.setBaseUrl("https://api.example")) and assert that
AIConnectionDTO.fromAiConnection(connection).baseUrl() equals the same value,
ensuring AIConnection, AIConnectionDTO.fromAiConnection and
AIConnectionDTO.baseUrl() are exercised.
java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql (1)

14-15: Comment claims HTTPS requirement but constraint doesn't enforce it.

The column comment states "Must be HTTPS" but the ai_connection_base_url_check constraint only validates that base_url is non-null and non-empty for OPENAI_COMPATIBLE. If HTTPS is a security requirement, consider enforcing it at the database level.

🔐 Optional: Enforce HTTPS in constraint
 ALTER TABLE ai_connection ADD CONSTRAINT ai_connection_base_url_check
     CHECK (
-        (provider_key = 'OPENAI_COMPATIBLE' AND base_url IS NOT NULL AND base_url <> '')
+        (provider_key = 'OPENAI_COMPATIBLE' AND base_url IS NOT NULL AND base_url <> '' AND base_url LIKE 'https://%')
         OR
         (provider_key <> 'OPENAI_COMPATIBLE' AND base_url IS NULL)
     );

Alternatively, if HTTPS validation is handled at the application layer, update the comment to remove the "Must be HTTPS" statement to avoid confusion.

Also applies to: 25-30

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql`
around lines 14 - 15, The comment on ai_connection.base_url says "Must be HTTPS"
but the ai_connection_base_url_check constraint doesn't enforce HTTPS; either
update the constraint to require HTTPS for OPENAI_COMPATIBLE (modify
ai_connection_base_url_check to include a condition like provider !=
'OPENAI_COMPATIBLE' OR base_url ILIKE 'https://%') or remove the "Must be HTTPS"
phrase from the COMMENT ON COLUMN ai_connection.base_url so the DB doc matches
enforcement; locate ai_connection_base_url_check and the COMMENT ON COLUMN
ai_connection.base_url in the migration and apply one of these two changes so
the comment and constraint are consistent.
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java (1)

19-21: Consider adding @Transactional to the @Modifying query.

The deleteByProjectId method uses @Modifying but lacks @Transactional. While this may work if the caller provides a transaction, it's a best practice to explicitly annotate modifying repository methods to ensure they always run within a transaction context.

🔧 Suggested fix
+import org.springframework.transaction.annotation.Transactional;
+
 @Repository
 public interface QaDocStateRepository extends JpaRepository<QaDocState, Long> {
 
     Optional<QaDocState> findByProjectIdAndTaskId(Long projectId, String taskId);
 
     boolean existsByProjectIdAndTaskId(Long projectId, String taskId);
 
+    @Transactional
     @Modifying
     @Query("DELETE FROM QaDocState q WHERE q.project.id = :projectId")
     void deleteByProjectId(@Param("projectId") Long projectId);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java`
around lines 19 - 21, The `@Modifying` repository method deleteByProjectId in
QaDocStateRepository should be annotated with `@Transactional` to guarantee it
executes inside a transaction; update the QaDocStateRepository interface by
adding the javax.transaction (or Spring's
org.springframework.transaction.annotation) `@Transactional` annotation on the
deleteByProjectId method (or class) so the DELETE query always runs within a
transaction context.
java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java (1)

548-560: This test doesn't actually cover 403.

The display name says 401/403, but Line 552 only injects a 401 failure. A regression in the 403 branch would still pass here, so either parameterize the test over both statuses or rename it to 401-only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java`
around lines 548 - 560, The test should actually cover both 401 and 403 cases:
change shouldReturnErrorWhenJiraAuthFails to be parameterized over the status
codes (e.g., use `@ParameterizedTest` with `@ValueSource` ints {401,403}) or split
into two tests (e.g., shouldReturnErrorWhenJira401AuthFails and
shouldReturnErrorWhenJira403AuthFails); then in the body replace the hardcoded
TaskManagementException( "Authentication failed", 401, "" ) with
TaskManagementException("Authentication failed", status, "") using the
parameter, keep the same assertions on processor.process(payload, project,
eventConsumer, Map.of()), and verifyNoInteractions(qaDocGenerationService) so
both 401 and 403 paths are validated.
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java (1)

33-34: changedFilePaths is still mutable despite the “immutable context” contract.

The builder passes the caller’s List reference straight into the record, so later mutations can change the built context from the outside. A defensive List.copyOf(...) here would make the immutability claim true for this field.

Also applies to: 96-115

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java`
around lines 33 - 34, QaDocGenerationContext currently accepts the caller's
mutable List for the changedFilePaths field (and other list fields built around
lines 96-115), violating the "immutable context" contract; update the record
construction (or the QaDocGenerationContext.Builder) to defensively copy
incoming Lists using List.copyOf(...) before assigning them to the record fields
so the built QaDocGenerationContext (and fields like changedFilePaths and the
other list members referenced in the builder) cannot be mutated by callers after
construction.
python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py (1)

22-31: Fullwidth colons are intentional for CJK locale support — consider adding a comment.

The static analysis tool flags (U+FF1A fullwidth colon) as ambiguous. This appears intentional for matching Jira descriptions that may use fullwidth punctuation (common in CJK locales). Adding a brief comment would prevent future maintainers from "fixing" this.

📝 Suggested clarifying comment
 # ── Acceptance-criteria extraction patterns ─────────────────────────
 # Common header patterns found in Jira descriptions (case-insensitive):
 #   - "Acceptance Criteria"
 #   - "AC:" / "ACs:"
 #   - "Definition of Done"
+# Note: Patterns include both ASCII colon (:) and fullwidth colon (:, U+FF1A) for CJK locale support.
 _AC_HEADER_PATTERNS = [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`
around lines 22 - 31, Add a short inline comment above _AC_HEADER_PATTERNS
(and/or next to the fullwidth colon characters in the patterns) explaining that
the fullwidth colon ':' (U+FF1A) is intentionally included to support
CJK/Japanese/Chinese Jira descriptions and should not be removed; mention this
is to satisfy locale-specific punctuation in headers like "Acceptance Criteria".
Reference _AC_HEADER_PATTERNS and _AC_HEADER_REGEX so reviewers see why the
fullwidth character is present.
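The intent is easier to see in runnable form. A minimal sketch of such a matcher — the pattern wording is assumed here, not copied from `task_context_builder.py`:

```python
import re

# Sketch of an acceptance-criteria header matcher; the real
# _AC_HEADER_PATTERNS / _AC_HEADER_REGEX may differ in detail.
# The class [::] matches both the ASCII colon and the fullwidth
# colon (":", U+FF1A) common in CJK-locale Jira descriptions --
# the fullwidth form is intentional, not a typo to be "fixed".
AC_HEADER_REGEX = re.compile(
    r"^\s*(acceptance criteria|acs?|definition of done)\s*[::]?\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def is_ac_header(line: str) -> bool:
    """Return True if the line looks like an acceptance-criteria header."""
    return AC_HEADER_REGEX.match(line) is not None
```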
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java (1)

75-91: Consider using JPA lifecycle callbacks for timestamps.

The timestamps are initialized both at field declaration and in the constructor. While this works, JPA lifecycle callbacks (@PrePersist, @PreUpdate) provide a cleaner, centralized approach.

♻️ Using JPA lifecycle callbacks
-    `@Column`(name = "created_at", nullable = false, updatable = false)
-    private OffsetDateTime createdAt = OffsetDateTime.now();
+    `@Column`(name = "created_at", nullable = false, updatable = false)
+    private OffsetDateTime createdAt;

-    `@Column`(name = "updated_at", nullable = false)
-    private OffsetDateTime updatedAt = OffsetDateTime.now();
+    `@Column`(name = "updated_at", nullable = false)
+    private OffsetDateTime updatedAt;

     // ...

     public QaDocState(Project project, String taskId) {
         this.project = project;
         this.taskId = taskId;
-        this.createdAt = OffsetDateTime.now();
-        this.updatedAt = OffsetDateTime.now();
     }

+    `@PrePersist`
+    protected void onCreate() {
+        createdAt = OffsetDateTime.now();
+        updatedAt = OffsetDateTime.now();
+    }
+
+    `@PreUpdate`
+    protected void onUpdate() {
+        updatedAt = OffsetDateTime.now();
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`
around lines 75 - 91, The createdAt and updatedAt timestamps in QaDocState are
set both at field declaration and again in the constructor; replace this ad-hoc
initialization with JPA lifecycle callbacks by removing the default
OffsetDateTime.now() assignments and the constructor timestamp sets, and add
`@PrePersist` and `@PreUpdate` methods (e.g., prePersist() and preUpdate()) in the
QaDocState class that set createdAt and updatedAt appropriately (createdAt and
updatedAt on persist, updatedAt on update) so JPA manages timestamps centrally.
python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py (1)

46-73: Consider a request object to reduce parameter count.

The generate() method has 20+ parameters. While this is a thin coordinator passing data to the orchestrator, a QaDocGenerationRequest dataclass would improve maintainability and make the API easier to evolve.

This is not blocking — the current implementation works and matches the router's request model.
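A minimal sketch of such a request object, with field names taken from the review text (the real router model may include more fields and different defaults):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of the suggested QaDocGenerationRequest; a subset of the
# fields named in the review, shown here as a frozen dataclass.
@dataclass(frozen=True)
class QaDocGenerationRequest:
    project_id: int
    project_name: str
    pr_number: int
    ai_provider: str
    ai_model: str
    ai_api_key: str
    ai_base_url: Optional[str] = None
    diff: Optional[str] = None
    delta_diff: Optional[str] = None
    previous_documentation: Optional[str] = None
    is_same_pr_rerun: bool = False
    changed_file_paths: List[str] = field(default_factory=list)
    output_language: str = "en"

# The service signature then collapses to something like:
#     async def generate(self, request: QaDocGenerationRequest) -> str: ...
```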

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py`
around lines 46 - 73, The generate method on QaDocService has an excessive
parameter list; introduce a QaDocGenerationRequest dataclass (or pydantic model)
to encapsulate all inputs (project_id, project_name, pr_number, issues_found,
files_analyzed, pr_metadata, template_mode, custom_template, task_context,
ai_provider, ai_model, ai_api_key, ai_base_url, diff, delta_diff,
previous_documentation, is_same_pr_rerun, enrichment_data, changed_file_paths,
workspace_slug, repo_slug, source_branch, target_branch, vcs_provider,
output_language) and change the signature of QaDocService.generate(...) to
accept a single request: QaDocGenerationRequest (plus any rare optional args
like cancellation/context if needed); update all call sites (router and
orchestrator invocation inside generate) to construct and pass the new
dataclass, preserving field names and types so behavior remains unchanged.
java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql (2)

16-16: VARCHAR(4096) for PR numbers is pragmatic but has limits.

The comma-separated string can hold ~400-800 PR numbers. For most tasks this is sufficient, but very long-lived tasks with many PRs could hit this limit.

Consider documenting this limit or using TEXT type with the JPA logic handling truncation/rotation if it becomes a concern. This is not blocking for initial release.
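The ~400-800 estimate is simple arithmetic over entry width; a quick check, assuming PR numbers of roughly 4 to 9 digits plus a comma separator per entry:

```python
CAPACITY = 4096  # VARCHAR(4096)

def max_entries(digits_per_pr: int) -> int:
    # Each comma-separated entry costs its digits plus one separator.
    return CAPACITY // (digits_per_pr + 1)
```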

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql`
at line 16, The column documented_pr_numbers is defined as VARCHAR(4096), which
limits stored comma-separated PR IDs and may overflow for very long-lived tasks;
update the migration to use TEXT (or add a schema comment documenting the
~400–800 PR limit) and adjust the JPA/entity handling (where the field mapped to
documented_pr_numbers) to perform truncation/rotation or validation when
persisting so large PR lists are handled safely; alternatively add a migration
comment and unit tests that assert behavior when the PR list approaches/exceeds
the VARCHAR limit.

28-29: Index idx_qa_doc_state_task is redundant with the unique constraint.

PostgreSQL automatically creates a unique B-tree index to enforce uq_qa_doc_state_project_task. The explicit index on the same columns (project_id, task_id) is redundant and consumes additional storage/write overhead.

🗑️ Remove redundant index
 CREATE INDEX IF NOT EXISTS idx_qa_doc_state_project
     ON qa_doc_state(project_id);

-CREATE INDEX IF NOT EXISTS idx_qa_doc_state_task
-    ON qa_doc_state(project_id, task_id);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql`
around lines 28 - 29, The explicit CREATE INDEX for idx_qa_doc_state_task should
be removed because the unique constraint uq_qa_doc_state_project_task on table
qa_doc_state already creates a B-tree index on (project_id, task_id); delete the
statement "CREATE INDEX IF NOT EXISTS idx_qa_doc_state_task ON
qa_doc_state(project_id, task_id);" from the migration, or if you intended a
different index, replace it with the correct column set rather than duplicating
uq_qa_doc_state_project_task.
python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py (1)

69-72: OAuth credentials should use SecretStr for automatic redaction.

Using Pydantic's SecretStr type for oauth_key, oauth_secret, and bearer_token would prevent accidental logging of these sensitive values.

🔒 Use SecretStr for credentials
 from fastapi import APIRouter, Request
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, SecretStr
 from typing import Optional, Dict, Any, List, Literal

 # ...

     # OAuth credentials (for Python-side RAG/VCS access)
-    oauth_key: Optional[str] = None
-    oauth_secret: Optional[str] = None
-    bearer_token: Optional[str] = None
+    oauth_key: Optional[SecretStr] = None
+    oauth_secret: Optional[SecretStr] = None
+    bearer_token: Optional[SecretStr] = None

Then when passing to the service, use .get_secret_value() to unwrap.
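For illustration, a stand-in class showing the redaction behavior that pydantic's SecretStr provides — in the router you would import the real SecretStr from pydantic rather than define this:

```python
# Stand-in for pydantic's SecretStr, for demonstration only.
# Its repr/str mask the value, so accidental logging prints the
# mask; callers must unwrap explicitly via get_secret_value().
class Secret:
    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"

    __str__ = __repr__
```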

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py`
around lines 69 - 72, The three credential fields oauth_key, oauth_secret, and
bearer_token are currently plain strings and should be typed as
pydantic.SecretStr to enable automatic redaction; change their annotations to
SecretStr (import SecretStr from pydantic), update all call sites that pass
these values to services to safely unwrap with .get_secret_value() (and handle
None if Optional), and adjust any logging to avoid printing the raw secret
objects. Ensure imports and Optional typing remain correct and that services
receive the unwrapped string via .get_secret_value() only when not None.
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java (1)

157-162: Sentinel detection could produce false positives.

The check documentation.contains("could not be generated") might reject legitimate documentation that discusses generation failures or error handling.

Consider a more specific sentinel string that's unlikely to appear in normal documentation, e.g., [CODECROW_GENERATION_FAILED].

🔧 Use a unique sentinel marker

Define a constant that both Python and Java share:

private static final String GENERATION_FAILED_SENTINEL = "[CODECROW_GENERATION_FAILED]";

Then in the guard:

-            if (documentation == null || documentation.isBlank()
-                    || documentation.contains("could not be generated")) {
+            if (documentation == null || documentation.isBlank()
+                    || documentation.contains(GENERATION_FAILED_SENTINEL)) {

Ensure the Python orchestrator uses the same sentinel when generation fails.
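The shared-sentinel contract can be sketched as follows (the marker string is the reviewer's proposal, not an existing constant in either codebase):

```python
from typing import Optional

# Proposed shared failure marker; must match the Java-side constant.
GENERATION_FAILED_SENTINEL = "[CODECROW_GENERATION_FAILED]"

def emit_failure() -> str:
    """What the Python orchestrator would return when generation fails."""
    return GENERATION_FAILED_SENTINEL

def is_generation_failed(documentation: Optional[str]) -> bool:
    """Mirror of the Java guard: null / isBlank / trim+equals sentinel."""
    return (
        documentation is None
        or not documentation.strip()
        or documentation.strip() == GENERATION_FAILED_SENTINEL
    )
```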

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java`
around lines 157 - 162, Replace the brittle substring check with a dedicated
sentinel: add a private static final String GENERATION_FAILED_SENTINEL =
"[CODECROW_GENERATION_FAILED]" to QaDocGenerationService, and change the guard
to reject documentation when documentation == null || documentation.isBlank() ||
documentation.trim().equals(GENERATION_FAILED_SENTINEL) (use trim+equals to
match exact sentinel); also ensure the Python orchestrator emits the exact same
"[CODECROW_GENERATION_FAILED]" sentinel when generation fails so both sides
agree.
python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py (1)

79-95: The advertised diff fallback is not implemented.

index_pr_files() accepts diff and documents a diff-hunk fallback, but files is populated only from enrichment_lookup. On partial enrichment runs this silently indexes nothing instead of using the fallback path the method promises.

Either build a real diff-based fallback here, or remove the diff parameter/docstring claim so callers do not assume those files were indexed.

Also applies to: 99-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py`
around lines 79 - 95, The method index_pr_files currently only builds files from
enrichment_lookup and ignores the diff parameter, so implement the promised
diff-hunk fallback: when enrichment_data or enrichment_lookup lacks entries for
changed_file_paths, parse the provided diff into per-file hunks and populate the
local files collection (used later in index_pr_files) with lightweight file
objects derived from those hunks (e.g., path, hunk text, and language/metadata
as needed) so those files are indexed; ensure you use the existing
changed_file_paths and diff parameters to drive this fallback and keep existing
behavior when enrichment_data is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientRecordsTest.java`:
- Line 23: Tests in AiCommandClientRecordsTest updated constructor args to
include the new aiBaseUrl field but never assert it; add explicit assertions
calling request.aiBaseUrl() in each affected test (the tests that construct the
record and name the variable request) to verify the expected value (e.g., null
or "..." as appropriate for each case), alongside existing assertions so
regressions for the new aiBaseUrl property are caught; locate the record
instantiations in methods within AiCommandClientRecordsTest and add a matching
assertEquals(expected, request.aiBaseUrl()) for each scenario.

In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`:
- Around line 98-107: getDocumentedPrNumbersSet currently calls Long.parseLong
on every token and will throw NumberFormatException for any
corrupted/non-numeric tokens in documentedPrNumbers; update
getDocumentedPrNumbersSet to defensively parse tokens by trimming and attempting
to parse each token inside a try/catch (or by pre-validating with a numeric
regex), skip tokens that fail to parse instead of letting the exception
propagate, collect the successfully parsed Longs into an unmodifiable Set, and
if a logger (e.g., LOGGER) is available emit a warning with the invalid token(s)
to aid debugging; ensure the method still returns Collections.emptySet() when
documentedPrNumbers is blank or no valid numbers are found.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java`:
- Around line 322-336: The delta diff fetch in QaDocCommandProcessor (using
VcsDiffUtils.fetchDeltaDiff which delegates to ops.getCommitRangeDiff) must be
guarded so failures don't abort the whole command; wrap the fetch call (the
block that computes deltaDiff using opsService/httpClient and
VcsDiffUtils.fetchDeltaDiff) in a try/catch that on any exception sets deltaDiff
= null, logs a warning including the exception and context (repoSlug, commit
hashes), and allows processing to continue so the generator falls back to the
full diff; ensure no exception from getCommitRangeDiff or fetchDeltaDiff bubbles
out.
- Around line 293-298: The QaDocState currently stores only lastCommitHash per
(projectId, taskId) which breaks multi-PR runs; change QaDocState to replace
lastCommitHash with a Map<Long,String> prCommitHashes keyed by PR number, update
recordGeneration(String commitHash, Long analysisId, Long prNumber) to store
prCommitHashes.put(prNumber, commitHash), and when loading state via
qaDocStateRepository.findByProjectIdAndTaskId(...) use
prCommitHashes.get(prNumber) to determine the delta base and to evaluate
isSamePrRerun for the current PR instead of the single lastCommitHash; also
adjust any helper methods (e.g., isPrDocumented) to consult the new map.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaAutoDocListener.java`:
- Around line 355-374: The state is being upserted before the Jira comment is
guaranteed posted, which can mark a PR as documented even if posting fails; move
the call to upsertQaDocState so it executes only after tmClient.postComment or
tmClient.updateComment succeeds (i.e., after the try block that posts/updates
the comment), or implement a two-phase commit where you mark state only on
successful comment response and on failure do not set the documented flag so
isSamePrRerun sees it as not completed; update references around
upsertQaDocState, tmClient.postComment, tmClient.updateComment and isSamePrRerun
accordingly.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java`:
- Around line 15-60: QaDocGenerationContext is a Java record that currently
exposes credentials via the auto-generated toString(); override the record's
toString() to redact sensitive fields (at least oauthSecret and bearerToken, and
consider oauthKey) so logs/traces don't contain plaintext secrets—implement a
custom toString() in QaDocGenerationContext that returns the same contextual
info but replaces those fields with masked values like "<redacted>" or partially
masked strings.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java`:
- Around line 243-252: The payload includes oauth_key, oauth_secret, and
bearer_token but QaDocGenerationService does not forward them to the service
layer: either extend qa_service.generate(...) (or the intermediate method that
calls it) to accept and propagate these credentials from QaDocumentationRequest
(fields oauthKey/oauthSecret/bearerToken) down to the Python RAG/VCS access
code, ensuring secure handling and unit tests; or remove those keys from the
Java payload and the QaDocumentationRequest model if they are unused. Also
confirm ssrf_safe_transport.py’s HTTPS enforcement (the ALLOW_PRIVATE_ENDPOINTS
toggle around line 52) is compatible with your deployment and document or adjust
configuration as needed.

In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/controller/LlmModelController.java`:
- Around line 122-134: The controller currently instantiates a raw RestTemplate
inside LlmModelController (new RestTemplate()) and uses it to call modelsUrl
with restTemplate.exchange, which bypasses the configured RestTemplate bean with
timeouts; change LlmModelController to accept a RestTemplate (or
RestTemplateBuilder) via constructor injection (or `@Autowired`) and use that
injected instance when calling restTemplate.exchange(modelsUrl, HttpMethod.GET,
entity, Map.class) so the call honors the timeouts from RestTemplateConfig (or
build a timed RestTemplate from the injected RestTemplateBuilder before using
it).

In `@python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py`:
- Around line 36-80: validate_endpoint_url currently only runs at client
creation and does not prevent DNS rebinding at request time; fix by pinning
resolved IPs into the httpx transport used by ChatOpenAI so that the client will
only connect to the validated addresses: in the code paths that create
httpx.Client / httpx.AsyncClient, resolve the hostname using the same logic as
validate_endpoint_url (reusing validate_endpoint_url and _is_safe_ip), store the
chosen sockaddr IP(s) and construct a custom httpx transport or provide a custom
resolver that always returns those IP(s), and ensure any connect hooks
(connect_failed/request) reject attempts to connect to any IP not equal to the
pinned addresses; also apply the same change to the async client creation path
so both sync/async clients are protected against DNS rebinding.

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py`:
- Around line 202-213: The Stage 3 aggregation choice must use the same
delta-quality gate as earlier: replace the current logic that picks delta_diff
whenever delta_diff is non-empty with the same has_real_delta check used for
analysis_diff (use has_real_delta && '@@' check outcome). Concretely, where
Stage 3 currently selects the prompt/input (references to delta_diff, diff, and
the Stage 3 prompt/aggregation block), change it to use analysis_diff or to pick
delta_diff only when has_real_delta is true (otherwise fall back to diff) so
Stage 3 won't run a delta-only aggregation for header-only deltas; keep the
existing has_real_delta variable and reuse it.
- Around line 633-648: In _execute_stage_3_delta remove the unreachable dead
code at the end that calls self.llm.ainvoke and returns
self._extract_text(response); both the try and except paths already return and
the local prompt variable referenced there is undefined in that scope (causing
F821). Delete the response = await self.llm.ainvoke([...]) and the subsequent
return self._extract_text(response) lines so the function ends after the
existing try/except/_attempt logic and no undefined symbols remain.

In
`@python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py`:
- Around line 482-504: The current semantic dedupe in the block processing
tier1_result (using by_file, kept, title, existing_title and
difflib.SequenceMatcher) removes issues with similar titles even when they point
to different locations; update the dedupe logic so that before marking is_dup
true you verify location evidence: require matching or overlapping location
fields (e.g., compare issue.get("startLine")/issue.get("endLine") or
issue.get("line")/issue.get("span")/issue.get("filePath") if present) and only
treat as duplicates when both title similarity >= 0.75 AND the location is
identical or overlaps (or when location is missing for both); otherwise keep
both issues so distinct findings in the same file are not dropped.
- Around line 455-467: The code uses the typing symbol Set (seen_locations:
Set[str] = set()) but Set is not imported; update the module's typing import
list to include Set (alongside Dict, Any, List, Optional, Callable) so
references like seen_locations and any other Set-typed annotations resolve
correctly; ensure you only modify the import line(s) at the top of
orchestrator.py to add Set.

In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`:
- Around line 134-145: The current AC-removal uses a plain substring search
which misses headers matched by _AC_HEADER_REGEX; update the logic in the block
that computes desc_to_show (the code that checks description, ac_text and
currently uses description.lower().find(...)) to run _AC_HEADER_REGEX on
description (preserving case/flags used by the extractor), and if a match is
found use match.start() to slice out the AC section (or match.end() to drop a
leading header), otherwise fall back to the existing substring checks; ensure
you reference description, ac_text, desc_to_show and _AC_HEADER_REGEX so the
removal reuses the same regex positions the extractor uses and prevents
duplicate AC content.
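The location-aware dedupe suggested for orchestrator.py (lines 482-504 above) can be sketched as follows; the field names and the 0.75 similarity threshold are taken from the review text, the rest is illustrative:

```python
import difflib
from typing import Any, Dict, List

def dedupe_issues(issues: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop an issue only when the title is similar AND the location matches."""
    kept: List[Dict[str, Any]] = []
    for issue in issues:
        is_dup = False
        for existing in kept:
            same_file = issue.get("filePath") == existing.get("filePath")
            # Locations match when identical (including both missing).
            same_loc = same_file and issue.get("line") == existing.get("line")
            ratio = difflib.SequenceMatcher(
                None,
                (issue.get("title") or "").lower(),
                (existing.get("title") or "").lower(),
            ).ratio()
            if ratio >= 0.75 and same_loc:
                is_dup = True
                break
        if not is_dup:
            kept.append(issue)
    return kept
```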

---

Outside diff comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java`:
- Around line 149-158: When catching DataIntegrityViolationException in
BranchIssueMappingService (the saveAndFlush duplicate-save fallback), also
update the in-memory location cache used to track saved issue locations in
addition to adding to branchContentFingerprints; locate the catch block handling
DataIntegrityViolationException around saveAndFlush and ensure the same cache
update (the set/map that prevents reprocessing of a file/line location — e.g.,
the branchLocationCache/processedLocations structure used elsewhere in the
method) is updated there; apply the identical fix to the second duplicate-save
catch path (the other DataIntegrityViolationException block around lines
166-170).

In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java`:
- Around line 50-63: validateBaseUrl currently normalizes the URL but the code
still persists request.baseUrl raw; change the call to capture and use the
normalized value (e.g. String normalizedBaseUrl =
validateBaseUrl(request.providerKey, request.baseUrl)) and then call
newAiConnection.setBaseUrl(request.providerKey ==
AIProviderKey.OPENAI_COMPATIBLE ? normalizedBaseUrl : null); also replace other
assignments in the same method/region (the other setBaseUrl usage referenced in
the comment) to persist the normalizedBaseUrl instead of request.baseUrl so
stored values match the validator's output.

---

Nitpick comments:
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java`:
- Around line 66-71: Extract a single helper method (e.g.,
buildLocationFingerprint or buildLocationFingerprintForIssue) in
BranchIssueMappingService that accepts the inputs used to assemble the
fingerprint (preferably the BranchIssue instance or its filePath, lineHash, and
issueCategory) and returns the same string currently built at the three sites
(the filePath + ":" + lineHash + ":" + (issueCategory != null ?
issueCategory.name() : "UNKNOWN")); replace the inline assembly at the locations
referencing branchLocationFingerprints (the blocks around the current
fingerprint creation at lines referenced in the review) with calls to this new
helper and reuse it everywhere to ensure consistent dedup semantics.

In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java`:
- Around line 27-31: The constant MIN_DELTA_DIFF_SIZE and the helper method
shouldEscalateToFull() claim sub-threshold deltas always remain incremental, but
some callers (GitHub/GitLab/Bitbucket services) force FULL escalation; update
the shared API to reflect that policy by changing the Javadoc/comments on
MIN_DELTA_DIFF_SIZE and shouldEscalateToFull() in VcsDiffUtils to state that
values below the threshold are treated as "trivially small" and qualify for
incremental mode by default, but callers may override and escalate to FULL (and
include a note naming the services that do this), and consider adding a TODO or
parameter in shouldEscalateToFull() if you want caller-controlled escalation in
future so consumers know the helper does not enforce caller policies.

In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`:
- Around line 75-91: The createdAt and updatedAt timestamps in QaDocState are
set both at field declaration and again in the constructor; replace this ad-hoc
initialization with JPA lifecycle callbacks by removing the default
OffsetDateTime.now() assignments and the constructor timestamp sets, and add
`@PrePersist` and `@PreUpdate` methods (e.g., prePersist() and preUpdate()) in the
QaDocState class that set createdAt and updatedAt appropriately (createdAt and
updatedAt on persist, updatedAt on update) so JPA manages timestamps centrally.

In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java`:
- Around line 19-21: The `@Modifying` repository method deleteByProjectId in
QaDocStateRepository should be annotated with `@Transactional` to guarantee it
executes inside a transaction; update the QaDocStateRepository interface by
adding the javax.transaction (or Spring's
org.springframework.transaction.annotation) `@Transactional` annotation on the
deleteByProjectId method (or class) so the DELETE query always runs within a
transaction context.

In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql`:
- Around line 14-15: The comment on ai_connection.base_url says "Must be HTTPS"
but the ai_connection_base_url_check constraint doesn't enforce HTTPS; either
update the constraint to require HTTPS for OPENAI_COMPATIBLE (modify
ai_connection_base_url_check to include a condition like provider !=
'OPENAI_COMPATIBLE' OR base_url ILIKE 'https://%') or remove the "Must be HTTPS"
phrase from the COMMENT ON COLUMN ai_connection.base_url so the DB doc matches
enforcement; locate ai_connection_base_url_check and the COMMENT ON COLUMN
ai_connection.base_url in the migration and apply one of these two changes so
the comment and constraint are consistent.

In
`@java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql`:
- Line 16: The column documented_pr_numbers is defined as VARCHAR(4096), which
limits stored comma-separated PR IDs and may overflow for very long-lived tasks;
update the migration to use TEXT (or add a schema comment documenting the
~400–800 PR limit) and adjust the JPA/entity handling (where the field mapped to
documented_pr_numbers) to perform truncation/rotation or validation when
persisting so large PR lists are handled safely; alternatively add a migration
comment and unit tests that assert behavior when the PR list approaches/exceeds
the VARCHAR limit.
- Around line 28-29: The explicit CREATE INDEX for idx_qa_doc_state_task should
be removed because the unique constraint uq_qa_doc_state_project_task on table
qa_doc_state already creates a B-tree index on (project_id, task_id); delete the
statement "CREATE INDEX IF NOT EXISTS idx_qa_doc_state_task ON
qa_doc_state(project_id, task_id);" from the migration, or if you intended a
different index, replace it with the correct column set rather than duplicating
uq_qa_doc_state_project_task.

In
`@java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java`:
- Around line 73-86: The AIConnectionDTO.fromAiConnection conversion tests are
missing verification of the new baseUrl mapping; update the
shouldConvertWithAllFields test to set the baseUrl on the source AIConnection
(e.g., connection.setBaseUrl("https://api.example")) and assert that
AIConnectionDTO.fromAiConnection(connection).baseUrl() equals the same value,
ensuring AIConnection, AIConnectionDTO.fromAiConnection and
AIConnectionDTO.baseUrl() are exercised.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java`:
- Around line 33-34: QaDocGenerationContext currently accepts the caller's
mutable List for the changedFilePaths field (and other list fields built around
lines 96-115), violating the "immutable context" contract; update the record
construction (or the QaDocGenerationContext.Builder) to defensively copy
incoming Lists using List.copyOf(...) before assigning them to the record fields
so the built QaDocGenerationContext (and fields like changedFilePaths and the
other list members referenced in the builder) cannot be mutated by callers after
construction.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java`:
- Around line 157-162: Replace the brittle substring check with a dedicated
sentinel: add a private static final String GENERATION_FAILED_SENTINEL =
"[CODECROW_GENERATION_FAILED]" to QaDocGenerationService, and change the guard
to reject documentation when documentation == null || documentation.isBlank() ||
documentation.trim().equals(GENERATION_FAILED_SENTINEL) (use trim+equals to
match exact sentinel); also ensure the Python orchestrator emits the exact same
"[CODECROW_GENERATION_FAILED]" sentinel when generation fails so both sides
agree.

In
`@java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java`:
- Around line 548-560: The test should actually cover both 401 and 403 cases:
change shouldReturnErrorWhenJiraAuthFails to be parameterized over the status
codes (e.g., use `@ParameterizedTest` with `@ValueSource` ints {401,403}) or split
into two tests (e.g., shouldReturnErrorWhenJira401AuthFails and
shouldReturnErrorWhenJira403AuthFails); then in the body replace the hardcoded
TaskManagementException( "Authentication failed", 401, "" ) with
TaskManagementException("Authentication failed", status, "") using the
parameter, keep the same assertions on processor.process(payload, project,
eventConsumer, Map.of()), and verifyNoInteractions(qaDocGenerationService) so
both 401 and 403 paths are validated.

In `@python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py`:
- Around line 69-72: The three credential fields oauth_key, oauth_secret, and
bearer_token are currently plain strings and should be typed as
pydantic.SecretStr to enable automatic redaction; change their annotations to
SecretStr (import SecretStr from pydantic), update all call sites that pass
these values to services to safely unwrap with .get_secret_value() (and handle
None if Optional), and adjust any logging to avoid printing the raw secret
objects. Ensure imports and Optional typing remain correct and that services
receive the unwrapped string via .get_secret_value() only when not None.
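A minimal sketch of the `SecretStr` change (the model name here is hypothetical; the real fields live on the qa_documentation request model):

```python
from typing import Optional

from pydantic import BaseModel, SecretStr


class TaskManagementCredentials(BaseModel):
    # SecretStr redacts the value in repr()/str(), so accidental logging of the
    # model prints '**********' instead of the raw secret.
    oauth_key: Optional[SecretStr] = None
    oauth_secret: Optional[SecretStr] = None
    bearer_token: Optional[SecretStr] = None


def unwrap(secret: Optional[SecretStr]) -> Optional[str]:
    """Unwrap only at the service boundary; None stays None."""
    return secret.get_secret_value() if secret is not None else None


creds = TaskManagementCredentials(bearer_token=SecretStr("s3cr3t"))
```

Call sites then pass `unwrap(creds.bearer_token)` (or call `.get_secret_value()` directly) only where the downstream service genuinely needs the raw string.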

In `@python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py`:
- Line 71: The for-loop unpacks infos into variables including an unused
variable named "family" which Ruff flags; rename it to a conventional unused
name (e.g., "_family" or "_" ) in the loop header that iterates over "infos"
(for _family, _type, _proto, _canonname, sockaddr in infos) so the linter stops
reporting an unused variable while leaving the rest of the unpacking and logic
(variables _type, _proto, _canonname, sockaddr) unchanged.

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py`:
- Around line 79-95: The method index_pr_files currently only builds files from
enrichment_lookup and ignores the diff parameter, so implement the promised
diff-hunk fallback: when enrichment_data or enrichment_lookup lacks entries for
changed_file_paths, parse the provided diff into per-file hunks and populate the
local files collection (used later in index_pr_files) with lightweight file
objects derived from those hunks (e.g., path, hunk text, and language/metadata
as needed) so those files are indexed; ensure you use the existing
changed_file_paths and diff parameters to drive this fallback and keep existing
behavior when enrichment_data is present.
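A rough sketch of the diff-hunk fallback (function name and return shape are assumptions; the real method would feed these per-file texts into its existing files collection rather than return them):

```python
from typing import Dict, List, Optional


def split_diff_by_file(diff: str) -> Dict[str, str]:
    """Split a unified diff into {new_path: per-file diff text}.

    Lightweight fallback for when enrichment data is missing: the '+++' header
    plus the hunks that follow it are enough to build an indexable file object.
    """
    files: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            current = None  # wait for the '+++' header to learn the new path
        if line.startswith("+++ "):
            path = line[4:].strip()
            if path == "/dev/null":
                current = None  # file was deleted; nothing to index
            else:
                current = path[2:] if path.startswith("b/") else path
                files.setdefault(current, [])
        if current is not None:
            files[current].append(line)
    return {p: "\n".join(ls) for p, ls in files.items()}
```

Files already covered by enrichment_lookup would be skipped before this parse, preserving the existing behavior when enrichment_data is present.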

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py`:
- Around line 46-73: The generate method on QaDocService has an excessive
parameter list; introduce a QaDocGenerationRequest dataclass (or pydantic model)
to encapsulate all inputs (project_id, project_name, pr_number, issues_found,
files_analyzed, pr_metadata, template_mode, custom_template, task_context,
ai_provider, ai_model, ai_api_key, ai_base_url, diff, delta_diff,
previous_documentation, is_same_pr_rerun, enrichment_data, changed_file_paths,
workspace_slug, repo_slug, source_branch, target_branch, vcs_provider,
output_language) and change the signature of QaDocService.generate(...) to
accept a single request: QaDocGenerationRequest (plus any rare optional args
like cancellation/context if needed); update all call sites (router and
orchestrator invocation inside generate) to construct and pass the new
dataclass, preserving field names and types so behavior remains unchanged.
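A partial sketch of the request object (field types are assumptions inferred from how the values are described; the tail of the field list is elided):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class QaDocGenerationRequest:
    """Bundles the inputs of QaDocService.generate into a single object."""
    project_id: int
    project_name: str
    pr_number: int
    ai_provider: str
    ai_model: str
    ai_api_key: str
    ai_base_url: Optional[str] = None
    diff: Optional[str] = None
    delta_diff: Optional[str] = None
    previous_documentation: Optional[str] = None
    is_same_pr_rerun: bool = False
    issues_found: List[Dict[str, Any]] = field(default_factory=list)
    changed_file_paths: List[str] = field(default_factory=list)
    output_language: Optional[str] = None
    # ...remaining fields (pr_metadata, template_mode, custom_template,
    # task_context, workspace_slug, repo_slug, branches, vcs_provider, etc.)
    # follow the same pattern.


class QaDocService:
    async def generate(self, request: QaDocGenerationRequest) -> str:
        ...
```

Call sites construct one `QaDocGenerationRequest(...)` with the same keyword names they pass today, so the refactor is mechanical.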

In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`:
- Around line 22-31: Add a short inline comment above _AC_HEADER_PATTERNS
(and/or next to the fullwidth colon characters in the patterns) explaining that
the fullwidth colon ':' (U+FF1A) is intentionally included to support
CJK/Japanese/Chinese Jira descriptions and should not be removed; mention this
is to satisfy locale-specific punctuation in headers like "Acceptance Criteria".
Reference _AC_HEADER_PATTERNS and _AC_HEADER_REGEX so reviewers see why the
fullwidth character is present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e4fd477-5310-49ef-b358-1b00a17f610a

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5df49 and 23fbe00.

📒 Files selected for processing (56)
  • README.md
  • frontend
  • java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/aiclient/AiAnalysisClient.java
  • java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClient.java
  • java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiAnalysisRequest.java
  • java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiAnalysisRequestImpl.java
  • java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingService.java
  • java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/VcsDiffUtils.java
  • java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientRecordsTest.java
  • java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientTest.java
  • java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/service/branch/BranchIssueMappingServiceTest.java
  • java-ecosystem/libs/ast-parser/src/main/resources/META-INF/MANIFEST.MF
  • java-ecosystem/libs/core/src/main/java/module-info.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTO.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/dto/taskmanagement/QaAutoDocConfigRequest.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/ai/AIConnection.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/ai/AIProviderKey.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/QaAutoDocConfig.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/codeanalysis/CodeAnalysisIssueRepository.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/qadoc/QaDocStateRepository.java
  • java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/util/SsrfSafeUrlValidator.java
  • java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.5.0__add_openai_compatible_provider.sql
  • java-ecosystem/libs/core/src/main/resources/db/migration/managed/V2.6.0__qa_doc_state.sql
  • java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/dto/ai/AIConnectionDTOTest.java
  • java-ecosystem/libs/core/src/test/java/org/rostilos/codecrow/core/model/ai/AIProviderKeyTest.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/service/BitbucketAiClientService.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/AskCommandProcessor.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/ReviewCommandProcessor.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/SummarizeCommandProcessor.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/service/GitHubAiClientService.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/service/GitLabAiClientService.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaAutoDocListener.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationContext.java
  • java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationService.java
  • java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessorTest.java
  • java-ecosystem/services/pipeline-agent/src/test/java/org/rostilos/codecrow/pipelineagent/qadoc/QaDocGenerationServiceTest.java
  • java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/controller/LlmModelController.java
  • java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/dto/request/CreateAIConnectionRequest.java
  • java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/dto/request/UpdateAiConnectionRequest.java
  • java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/ai/service/AIConnectionService.java
  • java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/taskmanagement/service/TaskManagementService.java
  • python-ecosystem/inference-orchestrator/src/api/routers/qa_documentation.py
  • python-ecosystem/inference-orchestrator/src/llm/llm_factory.py
  • python-ecosystem/inference-orchestrator/src/llm/ssrf_safe_transport.py
  • python-ecosystem/inference-orchestrator/src/model/dtos.py
  • python-ecosystem/inference-orchestrator/src/service/command/command_service.py
  • python-ecosystem/inference-orchestrator/src/service/qa_documentation/base_orchestrator.py
  • python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py
  • python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_service.py
  • python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py
  • python-ecosystem/inference-orchestrator/src/service/review/review_service.py
  • python-ecosystem/inference-orchestrator/src/utils/dependency_graph.py
  • python-ecosystem/inference-orchestrator/src/utils/prompts/constants_qa_doc.py
  • python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py

AiCommandClient.SummarizeRequest request = new AiCommandClient.SummarizeRequest(
        1L, "workspace", "repo-slug", "project-workspace", "namespace",
-       "openai", "gpt-4", "api-key", 42L, "feature", "main", "abc123",
+       "openai", "gpt-4", "api-key", null, 42L, "feature", "main", "abc123",

⚠️ Potential issue | 🟡 Minor

Add explicit assertions for the new aiBaseUrl record field.
These tests updated constructor arguments but never verify request.aiBaseUrl(), so regressions on this new component can slip through.

Suggested assertion additions
@@
             assertThat(request.aiProvider()).isEqualTo("openai");
             assertThat(request.aiModel()).isEqualTo("gpt-4");
             assertThat(request.aiApiKey()).isEqualTo("api-key");
+            assertThat(request.aiBaseUrl()).isNull();
             assertThat(request.pullRequestId()).isEqualTo(42L);
@@
             assertThat(request.projectId()).isEqualTo(1L);
             assertThat(request.aiProvider()).isEqualTo("anthropic");
             assertThat(request.aiModel()).isEqualTo("claude-3");
+            assertThat(request.aiBaseUrl()).isNull();
             assertThat(request.question()).isEqualTo("What is this code doing?");
@@
             assertThat(request.projectId()).isEqualTo(1L);
+            assertThat(request.aiBaseUrl()).isNull();
             assertThat(request.pullRequestId()).isEqualTo(42L);

Also applies to: 57-57, 77-77, 149-149

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/aiclient/AiCommandClientRecordsTest.java`
at line 23, Tests in AiCommandClientRecordsTest updated constructor args to
include the new aiBaseUrl field but never assert it; add explicit assertions
calling request.aiBaseUrl() in each affected test (the tests that construct the
record and name the variable request) to verify the expected value (e.g., null
or "..." as appropriate for each case), alongside existing assertions so
regressions for the new aiBaseUrl property are caught; locate the record
instantiations in methods within AiCommandClientRecordsTest and add a matching
assertEquals(expected, request.aiBaseUrl()) for each scenario.

Comment on lines +98 to +107
    public Set<Long> getDocumentedPrNumbersSet() {
        if (documentedPrNumbers == null || documentedPrNumbers.isBlank()) {
            return Collections.emptySet();
        }
        return Arrays.stream(documentedPrNumbers.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Long::parseLong)
                .collect(Collectors.toUnmodifiableSet());
    }

⚠️ Potential issue | 🟡 Minor

Uncaught NumberFormatException if PR numbers data is corrupted.

If documentedPrNumbers contains non-numeric values (e.g., from manual DB edits or bugs), Long.parseLong() will throw NumberFormatException, potentially breaking the QA doc flow.

🛡️ Proposed defensive parsing
     public Set<Long> getDocumentedPrNumbersSet() {
         if (documentedPrNumbers == null || documentedPrNumbers.isBlank()) {
             return Collections.emptySet();
         }
         return Arrays.stream(documentedPrNumbers.split(","))
                 .map(String::trim)
                 .filter(s -> !s.isEmpty())
-                .map(Long::parseLong)
+                .flatMap(s -> {
+                    try {
+                        return java.util.stream.Stream.of(Long.parseLong(s));
+                    } catch (NumberFormatException e) {
+                        // Skip corrupted entries rather than failing
+                        return java.util.stream.Stream.empty();
+                    }
+                })
                 .collect(Collectors.toUnmodifiableSet());
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/qadoc/QaDocState.java`
around lines 98 - 107, getDocumentedPrNumbersSet currently calls Long.parseLong
on every token and will throw NumberFormatException for any
corrupted/non-numeric tokens in documentedPrNumbers; update
getDocumentedPrNumbersSet to defensively parse tokens by trimming and attempting
to parse each token inside a try/catch (or by pre-validating with a numeric
regex), skip tokens that fail to parse instead of letting the exception
propagate, collect the successfully parsed Longs into an unmodifiable Set, and
if a logger (e.g., LOGGER) is available emit a warning with the invalid token(s)
to aid debugging; ensure the method still returns Collections.emptySet() when
documentedPrNumbers is blank or no valid numbers are found.

Comment on lines +293 to +298
        // 6. Load server-side state and check for existing Jira comment
        QaDocState state = (prNumber != null)
                ? qaDocStateRepository.findByProjectIdAndTaskId(project.getId(), taskId).orElse(null)
                : null;
        boolean isSamePrRerun = (state != null && prNumber != null && state.isPrDocumented(prNumber));


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

fd -i 'QaDocState' --extension java
echo
rg -n -C3 'class QaDocState|lastCommitHash|documentedPrNumbers|recordGeneration|getLastCommitHash|findByProjectIdAndTaskId' --type java

Repository: rostilos/CodeCrow

Length of output: 22173


Store commit bases per PR within QaDocState, not just lastCommitHash.

The state is currently loaded by (projectId, taskId) and only holds one lastCommitHash for the entire task. When multiple PRs document the same task and one reruns, it will use another PR's last commit hash as the delta base, producing incorrect diffs. Replace the single lastCommitHash field with a mapping that tracks commit bases per PR number (e.g., Map<Long, String> prCommitHashes), and update recordGeneration(String commitHash, Long analysisId, Long prNumber) to store the hash keyed by PR, then retrieve it by current PR on rerun.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java`
around lines 293 - 298, The QaDocState currently stores only lastCommitHash per
(projectId, taskId) which breaks multi-PR runs; change QaDocState to replace
lastCommitHash with a Map<Long,String> prCommitHashes keyed by PR number, update
recordGeneration(String commitHash, Long analysisId, Long prNumber) to store
prCommitHashes.put(prNumber, commitHash), and when loading state via
qaDocStateRepository.findByProjectIdAndTaskId(...) use
prCommitHashes.get(prNumber) to determine the delta base and to evaluate
isSamePrRerun for the current PR instead of the single lastCommitHash; also
adjust any helper methods (e.g., isPrDocumented) to consult the new map.

Comment on lines +322 to 336
            // 6a. Compute delta diff for same-PR re-runs
            String deltaDiff = null;
            if (isSamePrRerun && state.getLastCommitHash() != null
                    && commitHash != null && opsService != null && httpClient != null) {
                DiffContentFilter contentFilter = new DiffContentFilter();
                final OkHttpClient cl = httpClient;
                final VcsOperationsService ops = opsService;
                deltaDiff = VcsDiffUtils.fetchDeltaDiff(
                        (ws, repo, base, head) -> ops.getCommitRangeDiff(cl, ws, repo, base, head),
                        workspace, repoSlug,
                        state.getLastCommitHash(), commitHash, contentFilter);
                if (deltaDiff != null) {
                    log.info("qa-doc command: computed delta diff ({} chars) for same-PR re-run",
                            deltaDiff.length());
                }

⚠️ Potential issue | 🟠 Major

Fall back cleanly when the delta diff cannot be computed.

This is the only VCS fetch in the rerun path that is not locally guarded. If getCommitRangeDiff(...) fails, the outer catch aborts the whole command even though the generator can still continue with deltaDiff = null and the full diff.

Suggested fallback
             if (isSamePrRerun && state.getLastCommitHash() != null
                     && commitHash != null && opsService != null && httpClient != null) {
-                DiffContentFilter contentFilter = new DiffContentFilter();
-                final OkHttpClient cl = httpClient;
-                final VcsOperationsService ops = opsService;
-                deltaDiff = VcsDiffUtils.fetchDeltaDiff(
-                        (ws, repo, base, head) -> ops.getCommitRangeDiff(cl, ws, repo, base, head),
-                        workspace, repoSlug,
-                        state.getLastCommitHash(), commitHash, contentFilter);
-                if (deltaDiff != null) {
-                    log.info("qa-doc command: computed delta diff ({} chars) for same-PR re-run",
-                            deltaDiff.length());
-                }
+                try {
+                    DiffContentFilter contentFilter = new DiffContentFilter();
+                    final OkHttpClient cl = httpClient;
+                    final VcsOperationsService ops = opsService;
+                    deltaDiff = VcsDiffUtils.fetchDeltaDiff(
+                            (ws, repo, base, head) -> ops.getCommitRangeDiff(cl, ws, repo, base, head),
+                            workspace, repoSlug,
+                            state.getLastCommitHash(), commitHash, contentFilter);
+                    if (deltaDiff != null) {
+                        log.info("qa-doc command: computed delta diff ({} chars) for same-PR re-run",
+                                deltaDiff.length());
+                    }
+                } catch (Exception e) {
+                    log.warn("qa-doc command: failed to compute delta diff, falling back to full diff: {}",
+                            e.getMessage());
+                    deltaDiff = null;
+                }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/QaDocCommandProcessor.java`
around lines 322 - 336, The delta diff fetch in QaDocCommandProcessor (using
VcsDiffUtils.fetchDeltaDiff which delegates to ops.getCommitRangeDiff) must be
guarded so failures don't abort the whole command; wrap the fetch call (the
block that computes deltaDiff using opsService/httpClient and
VcsDiffUtils.fetchDeltaDiff) in a try/catch that on any exception sets deltaDiff
= null, logs a warning including the exception and context (repoSlug, commit
hashes), and allows processing to continue so the generator falls back to the
full diff; ensure no exception from getCommitRangeDiff or fetchDeltaDiff bubbles
out.

Comment on lines +355 to +374
+        // 9. Update server-side state (secure, tamper-proof)
+        Long analysisId = (analysis != null) ? analysis.getId() : null;
+        upsertQaDocState(project, taskId, currentCommitHash, analysisId, prNumber, state);
+
+        // 10. Post or update Jira comment
+        String commentBody = COMMENT_MARKER + "\n\n" + qaDocument;

-        // 8. Post or update comment
-        if (existingComment.isPresent()) {
-            client.updateComment(taskId, existingComment.get().commentId(), commentBody);
-            log.info("QA auto-doc: updated existing comment on task {} (commentId={})",
-                    taskId, existingComment.get().commentId());
-        } else {
-            client.postComment(taskId, commentBody);
-            log.info("QA auto-doc: posted new comment on task {}", taskId);
-        }
+        try {
+            if (existingComment.isPresent()) {
+                tmClient.updateComment(taskId, existingComment.get().commentId(), commentBody);
+                log.info("QA auto-doc: updated existing comment on task {} (commentId={})",
+                        taskId, existingComment.get().commentId());
+            } else {
+                tmClient.postComment(taskId, commentBody);
+                log.info("QA auto-doc: posted new comment on task {}", taskId);
+            }
+        } catch (Exception e) {
+            log.error("QA auto-doc: failed to post/update comment on task {}: {}",
+                    taskId, e.getMessage(), e);
+        }

⚠️ Potential issue | 🟠 Major

State is persisted before Jira comment is posted — may cause skipped PRs on retry.

If upsertQaDocState (line 357) succeeds but the subsequent Jira comment posting (lines 362-370) fails, the state records the PR as documented even though no comment was posted. On the next analysis event, isSamePrRerun will be true, potentially causing the system to skip generating a full document.

Consider persisting state only after successful Jira posting, or implementing a two-phase approach.

🔧 Move state persistence after successful posting
-        // 9. Update server-side state (secure, tamper-proof)
-        Long analysisId = (analysis != null) ? analysis.getId() : null;
-        upsertQaDocState(project, taskId, currentCommitHash, analysisId, prNumber, state);
-
         // 10. Post or update Jira comment
         String commentBody = COMMENT_MARKER + "\n\n" + qaDocument;

         try {
             if (existingComment.isPresent()) {
                 tmClient.updateComment(taskId, existingComment.get().commentId(), commentBody);
                 log.info("QA auto-doc: updated existing comment on task {} (commentId={})",
                         taskId, existingComment.get().commentId());
             } else {
                 tmClient.postComment(taskId, commentBody);
                 log.info("QA auto-doc: posted new comment on task {}", taskId);
             }
+            
+            // Persist state only after successful Jira posting
+            Long analysisId = (analysis != null) ? analysis.getId() : null;
+            upsertQaDocState(project, taskId, currentCommitHash, analysisId, prNumber, state);
         } catch (Exception e) {
             log.error("QA auto-doc: failed to post/update comment on task {}: {}",
                     taskId, e.getMessage(), e);
+            // State not persisted — next run will retry with full generation
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/qadoc/QaAutoDocListener.java`
around lines 355 - 374, The state is being upserted before the Jira comment is
guaranteed posted, which can mark a PR as documented even if posting fails; move
the call to upsertQaDocState so it executes only after tmClient.postComment or
tmClient.updateComment succeeds (i.e., after the try block that posts/updates
the comment), or implement a two-phase commit where you mark state only on
successful comment response and on failure do not set the documented flag so
isSamePrRerun sees it as not completed; update references around
upsertQaDocState, tmClient.postComment, tmClient.updateComment and isSamePrRerun
accordingly.

Comment on lines +202 to +213
        # For same-PR re-runs, analyze the delta diff — but only if
        # it contains actual hunks (@@). A delta that is truthy but
        # header-only (no @@) would starve the whole pipeline of context.
        has_real_delta = (
            is_same_pr_rerun
            and delta_diff
            and len(delta_diff.strip()) > 100
            and '@@' in delta_diff
        )
        analysis_diff = delta_diff if has_real_delta else diff
        if is_same_pr_rerun and not has_real_delta:
            logger.info(

⚠️ Potential issue | 🟠 Major

Use the same delta-quality gate when choosing Stage 3.

When delta_diff is header-only, Stage 1/2 already fall back to the full diff via has_real_delta, but Stage 3 still switches to the delta prompt on any non-empty delta_diff. That mixes full-diff analysis with a delta-only aggregation step and can mark unchanged scenarios as “updated in latest push.”

Suggested fix
-            if is_same_pr_rerun and delta_diff and previous_documentation:
+            if has_real_delta and previous_documentation:
                 documentation = await self._execute_stage_3_delta(

Also applies to: 258-268

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py`
around lines 202 - 213, The Stage 3 aggregation choice must use the same
delta-quality gate as earlier: replace the current logic that picks delta_diff
whenever delta_diff is non-empty with the same has_real_delta check used for
analysis_diff (reuse the existing has_real_delta flag rather than re-testing for '@@'). Concretely, where
Stage 3 currently selects the prompt/input (references to delta_diff, diff, and
the Stage 3 prompt/aggregation block), change it to use analysis_diff or to pick
delta_diff only when has_real_delta is true (otherwise fall back to diff) so
Stage 3 won't run a delta-only aggregation for header-only deltas; keep the
existing has_real_delta variable and reuse it.

Comment on lines +633 to +648
        BUDGET_NORMAL = 200_000
        BUDGET_TIGHT = 100_000
        try:
            return await _attempt(BUDGET_NORMAL)
        except Exception as first_err:
            logger.warning(
                "Stage 3 delta failed at %dK budget: %s — retrying at %dK",
                BUDGET_NORMAL // 1000, first_err, BUDGET_TIGHT // 1000,
            )
            return await _attempt(BUDGET_TIGHT)

        response = await self.llm.ainvoke([
            {"role": "system", "content": QA_DOC_SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ])
        return self._extract_text(response)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd python-ecosystem/inference-orchestrator && find . -name "qa_doc_orchestrator.py" -type f

Repository: rostilos/CodeCrow

Length of output: 114


🏁 Script executed:

cd python-ecosystem/inference-orchestrator && wc -l ./src/service/qa_documentation/qa_doc_orchestrator.py

Repository: rostilos/CodeCrow

Length of output: 118


🏁 Script executed:

cd python-ecosystem/inference-orchestrator && sed -n '620,660p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow

Length of output: 1848


🏁 Script executed:

cd python-ecosystem/inference-orchestrator && sed -n '600,633p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow

Length of output: 1708


🏁 Script executed:

cd python-ecosystem/inference-orchestrator && sed -n '570,610p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow

Length of output: 1905


🏁 Script executed:

cd python-ecosystem/inference-orchestrator && sed -n '587,655p' ./src/service/qa_documentation/qa_doc_orchestrator.py | cat -n

Repository: rostilos/CodeCrow

Length of output: 3243


Remove the dead code block at the end of _execute_stage_3_delta.

The code at lines 644-648 is unreachable since both the try and except branches return. Additionally, the prompt variable referenced in this block is undefined at that scope—it only exists within the nested _attempt function. This causes an F821 lint error that must be resolved by deleting the dead code.

Suggested cleanup
         try:
             return await _attempt(BUDGET_NORMAL)
         except Exception as first_err:
             logger.warning(
                 "Stage 3 delta failed at %dK budget: %s — retrying at %dK",
                 BUDGET_NORMAL // 1000, first_err, BUDGET_TIGHT // 1000,
             )
             return await _attempt(BUDGET_TIGHT)
-
-        response = await self.llm.ainvoke([
-            {"role": "system", "content": QA_DOC_SYSTEM_PROMPT},
-            {"role": "user", "content": prompt},
-        ])
-        return self._extract_text(response)
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 637-637: Do not catch blind exception: Exception

(BLE001)


[error] 646-646: Undefined name prompt

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@python-ecosystem/inference-orchestrator/src/service/qa_documentation/qa_doc_orchestrator.py`
around lines 633 - 648, In _execute_stage_3_delta remove the unreachable dead
code at the end that calls self.llm.ainvoke and returns
self._extract_text(response); both the try and except paths already return and
the local prompt variable referenced there is undefined in that scope (causing
F821). Delete the response = await self.llm.ainvoke([...]) and the subsequent
return self._extract_text(response) lines so the function ends after the
existing try/except/_attempt logic and no undefined symbols remain.

Comment on lines +455 to +467
        SEVERITY_RANK = {"HIGH": 3, "MEDIUM": 2, "LOW": 1, "INFO": 0}

        def _sort_key(issue: Dict[str, Any]):
            sev = SEVERITY_RANK.get((issue.get("severity") or "").upper(), 0)
            version = issue.get("prVersion") or 0
            return (sev, version)

        # Sort highest-priority first so we keep the best representative
        sorted_issues = sorted(issues, key=_sort_key, reverse=True)

        # Tier 1: Location fingerprint (file + lineHash + category)
        seen_locations: Set[str] = set()
        tier1_result: List[Dict[str, Any]] = []
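The Tier 1 pass that this hunk sets up can be read as the following self-contained sketch (the field names file/lineHash/category come from the comment in the hunk; the helper function itself is illustrative, not the PR's exact code):

```python
from typing import Any, Dict, List, Set


def location_fingerprint(issue: Dict[str, Any]) -> str:
    """Tier-1 dedup key: file + lineHash + category."""
    return "|".join(
        str(issue.get(k) or "") for k in ("file", "lineHash", "category")
    )


def dedupe_by_location(sorted_issues: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the first (highest-ranked) issue per location fingerprint.

    Assumes the input is already sorted highest-priority first, as done by
    _sort_key above, so the surviving representative is the best one.
    """
    seen_locations: Set[str] = set()
    tier1_result: List[Dict[str, Any]] = []
    for issue in sorted_issues:
        fp = location_fingerprint(issue)
        if fp in seen_locations:
            continue
        seen_locations.add(fp)
        tier1_result.append(issue)
    return tier1_result
```

Because the list is pre-sorted by severity and prVersion, ties collapse onto the highest-severity, most recent duplicate.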

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```shell
# First, check if the file exists and get basic info
fd orchestrator.py | head -5
```

Repository: rostilos/CodeCrow
Length of output: 330

🏁 Script executed:

```shell
# Read the imports at the top of the file to check current typing imports
head -50 python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py | cat -n
```

Repository: rostilos/CodeCrow
Length of output: 1945

🏁 Script executed:

```shell
# Read lines 455-467 to verify the code matches and check for Set usage
sed -n '450,470p' python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py | cat -n
```

Repository: rostilos/CodeCrow
Length of output: 966

🏁 Script executed:

```shell
# Check if Set is imported anywhere in the file
grep -n "from typing import\|import Set" python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py | head -20
```

Repository: rostilos/CodeCrow
Length of output: 118

Add `Set` to the typing imports.

The code at line 466 uses `seen_locations: Set[str] = set()`, but `Set` is not imported. The current import statement (line 11) only includes `Dict, Any, List, Optional, Callable`. This will cause a `NameError` at runtime.

🔧 Proposed fix

```diff
-from typing import Dict, Any, List, Optional, Callable
+from typing import Dict, Any, List, Optional, Callable, Set
```
🧰 Tools
🪛 Ruff (0.15.7)

[error] 466-466: Undefined name Set

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py`
around lines 455 - 467, The code uses the typing symbol Set (seen_locations:
Set[str] = set()) but Set is not imported; update the module's typing import
list to include Set (alongside Dict, Any, List, Optional, Callable) so
references like seen_locations and any other Set-typed annotations resolve
correctly; ensure you only modify the import line(s) at the top of
orchestrator.py to add Set.
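For illustration, the tier-1 location pass runs cleanly once `Set` is imported. This is a standalone sketch, not the orchestrator's code: the exact fingerprint format (`file|lineHash|category` joined with `|`) is an assumption based on the comment in the snippet above.

```python
from typing import Any, Dict, List, Set  # `Set` imported, per the fix above

SEVERITY_RANK = {"HIGH": 3, "MEDIUM": 2, "LOW": 1, "INFO": 0}

def dedup_by_location(issues: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one issue per (file, lineHash, category) fingerprint, preferring
    the highest-severity / newest representative (illustrative sketch)."""
    def _sort_key(issue: Dict[str, Any]):
        sev = SEVERITY_RANK.get((issue.get("severity") or "").upper(), 0)
        return (sev, issue.get("prVersion") or 0)

    seen_locations: Set[str] = set()
    tier1_result: List[Dict[str, Any]] = []
    # sorted() is stable, so among equal-priority issues the earlier one wins
    for issue in sorted(issues, key=_sort_key, reverse=True):
        fingerprint = "|".join(
            str(issue.get(k)) for k in ("file", "lineHash", "category")
        )
        if fingerprint not in seen_locations:
            seen_locations.add(fingerprint)
            tier1_result.append(issue)
    return tier1_result
```

Sorting highest-priority first means the duplicate that survives is always the best representative of its location.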

Comment on lines +482 to +504
```python
# Tier 2: Semantic similarity within same file (title-based)
from collections import OrderedDict
by_file: OrderedDict[str, List[Dict[str, Any]]] = OrderedDict()
for issue in tier1_result:
    fp = issue.get("file") or issue.get("filePath") or "_unknown_"
    by_file.setdefault(fp, []).append(issue)

final: List[Dict[str, Any]] = []
for file_path, file_issues in by_file.items():
    kept: List[Dict[str, Any]] = []
    for issue in file_issues:
        title = (issue.get("title") or issue.get("reason") or "").lower().strip()
        is_dup = False
        for existing in kept:
            existing_title = (existing.get("title") or existing.get("reason") or "").lower().strip()
            if title and existing_title:
                ratio = difflib.SequenceMatcher(None, title, existing_title).ratio()
                if ratio >= 0.75:
                    is_dup = True
                    break
        if not is_dup:
            kept.append(issue)
    final.extend(kept)
```

⚠️ Potential issue | 🟠 Major

The semantic-dedup pass can drop distinct findings from the same file.

Lines 490-503 treat any same-file issues with a title/reason similarity of >= 0.75 as duplicates, even when they point to different locations. That will collapse common findings like two separate null-dereference warnings in one file, and the dropped issue never reaches reconciliation. Please gate tier 2 on extra location evidence before removing it.

🧰 Tools
🪛 Ruff (0.15.7)

[warning] 490-490: Loop control variable file_path not used within loop body

(B007)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@python-ecosystem/inference-orchestrator/src/service/review/orchestrator/orchestrator.py`
around lines 482 - 504, The current semantic dedupe in the block processing
tier1_result (using by_file, kept, title, existing_title and
difflib.SequenceMatcher) removes issues with similar titles even when they point
to different locations; update the dedupe logic so that before marking is_dup
true you verify location evidence: require matching or overlapping location
fields (e.g., compare issue.get("startLine")/issue.get("endLine") or
issue.get("line")/issue.get("span")/issue.get("filePath") if present) and only
treat as duplicates when both title similarity >= 0.75 AND the location is
identical or overlaps (or when location is missing for both); otherwise keep
both issues so distinct findings in the same file are not dropped.
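One minimal shape for that gate is a predicate that requires both title similarity and agreeing location evidence. This sketch assumes a single `line` field carries the location; as the prompt notes, the real issues may instead use `startLine`/`endLine` or `span`, which would need an overlap check rather than equality:

```python
import difflib
from typing import Any, Dict

def is_same_finding(a: Dict[str, Any], b: Dict[str, Any],
                    title_threshold: float = 0.75) -> bool:
    """Treat two same-file issues as duplicates only when their titles are
    similar AND their locations agree (or both lack location evidence)."""
    title_a = (a.get("title") or a.get("reason") or "").lower().strip()
    title_b = (b.get("title") or b.get("reason") or "").lower().strip()
    if not (title_a and title_b):
        return False
    if difflib.SequenceMatcher(None, title_a, title_b).ratio() < title_threshold:
        return False
    line_a, line_b = a.get("line"), b.get("line")
    if line_a is None and line_b is None:
        return True  # no location on either side: fall back to titles alone
    return line_a == line_b
```

With this gate, two null-dereference warnings at different lines of the same file both survive to reconciliation.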

Comment on lines +134 to +145
```python
if description:
    desc_to_show = description
    if ac_text:
        # Remove the AC block from the description to avoid duplication
        ac_start = description.lower().find("acceptance criteria")
        if ac_start == -1:
            ac_start = description.lower().find("definition of done")
        if ac_start > 0:
            desc_to_show = description[:ac_start].strip()
        elif ac_start == 0:
            # AC was at the very beginning — show everything after it
            desc_to_show = ""
```
⚠️ Potential issue | 🟡 Minor

AC removal logic may not match all patterns that extraction handles.

The extraction uses `_AC_HEADER_REGEX`, which matches various formats (h1-h6 headers, bold markers, etc.), but the removal logic uses a simple `find("acceptance criteria")`, which won't correctly locate patterns like `h3. Acceptance Criteria` or `**Acceptance Criteria**`.

This can cause the AC block to appear twice — once in the dedicated AC section and again in the description.

🔧 Proposed fix: reuse the regex match position

```diff
     # Full description (truncated, without the AC block to avoid duplication)
     if description:
         desc_to_show = description
         if ac_text:
             # Remove the AC block from the description to avoid duplication
-            ac_start = description.lower().find("acceptance criteria")
-            if ac_start == -1:
-                ac_start = description.lower().find("definition of done")
-            if ac_start > 0:
-                desc_to_show = description[:ac_start].strip()
-            elif ac_start == 0:
-                # AC was at the very beginning — show everything after it
-                desc_to_show = ""
+            # Use the same regex for consistent matching
+            ac_match = _AC_HEADER_REGEX.search(description)
+            if ac_match:
+                ac_start = ac_match.start()
+                if ac_start > 0:
+                    desc_to_show = description[:ac_start].strip()
+                else:
+                    # AC was at the very beginning — show everything after it
+                    desc_to_show = ""
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python-ecosystem/inference-orchestrator/src/utils/task_context_builder.py`
around lines 134 - 145, The current AC-removal uses a plain substring search
which misses headers matched by _AC_HEADER_REGEX; update the logic in the block
that computes desc_to_show (the code that checks description, ac_text and
currently uses description.lower().find(...)) to run _AC_HEADER_REGEX on
description (preserving case/flags used by the extractor), and if a match is
found use match.start() to slice out the AC section (or match.end() to drop a
leading header), otherwise fall back to the existing substring checks; ensure
you reference description, ac_text, desc_to_show and _AC_HEADER_REGEX so the
removal reuses the same regex positions the extractor uses and prevents
duplicate AC content.
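The regex-based cut can be sketched as below. Note that `AC_HEADER_REGEX` here is a hypothetical stand-in: the real `_AC_HEADER_REGEX` lives in the extractor and may match more header shapes than this approximation.

```python
import re

# Hypothetical approximation of the extractor's _AC_HEADER_REGEX: matches
# plain, markdown (#..######), Jira-style (h1.-h6.), and bold AC/DoD headers.
AC_HEADER_REGEX = re.compile(
    r"(?:^|\n)\s*(?:#{1,6}\s*|h[1-6]\.\s*|\*\*)?"
    r"(?:acceptance criteria|definition of done)",
    re.IGNORECASE,
)

def strip_ac_block(description: str) -> str:
    """Cut the description at the AC header located by the regex, instead
    of a plain substring search, so removal matches extraction."""
    match = AC_HEADER_REGEX.search(description)
    if not match:
        return description
    # Slice at the match start; an AC block at position 0 leaves nothing.
    return description[: match.start()].strip()
```

Because the same pattern finds the header in both passes, a `**Acceptance Criteria**` block extracted into the dedicated AC section is also removed from the description, so it cannot appear twice.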

# Conflicts:
#	README.md
@rostilos rostilos merged commit 0db1928 into main Mar 31, 2026
1 check was pending