Skip to content

fix(content): replace unbounded query loop with LIMIT-1 DB call in validateRelationships() (#35489)#35491

Open
fabrizzio-dotCMS wants to merge 5 commits intomainfrom
fix/issue-35489-validate-relationships-limit
Open

fix(content): replace unbounded query loop with LIMIT-1 DB call in validateRelationships() (#35489)#35491
fabrizzio-dotCMS wants to merge 5 commits intomainfrom
fix/issue-35489-validate-relationships-limit

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

Problem

Re-saving a contentlet with required many-to-many relationships took up to 10 minutes on environments with ~500 related records and ~40 configured languages. Profiling showed 99% of CPU time in ESContentletAPIImpl.validateRelationships().

Two hotspots caused an N×M SQL query explosion (500 identifiers × 40 languages = 20 000+ queries per save):

  1. Required-relationship existence check (the !foundInRelationships path) called getRelatedContent(), which routes through RelationshipCache and fires one contentlet_version_info SELECT per identifier per language — just to answer a boolean "does anything exist?".

  2. ONE_TO_MANY cardinality check inside the per-child loop also called getRelatedContent() unconditionally for every cardinality, including MANY_TO_MANY, triggering the same cache-miss explosion even when the result was never needed.

Fix

Both calls are replaced with FactoryLocator.getRelationshipFactory().dbRelatedContent(relationship, contentlet, hasParent, false, null, 1, 0), which issues a single LIMIT 1 query directly against tree + contentlet_version_info and stops at the first matching row. The cardinality check is also guarded behind if (ONE_TO_MANY) so it is skipped entirely for other cardinalities.

This mirrors the approach from PR #35407 (which fixed a similar bottleneck in RelationshipFactoryImpl for an older branch) and adapts it to the refactored validateRelationships() in main.

Changes

  • dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletAPIImpl.java
    • Replaced getRelatedContent() in required-relationship existence check with dbRelatedContent(…, 1, 0)
    • Wrapped ONE_TO_MANY cardinality DB check in if (ONE_TO_MANY) guard and replaced getRelatedContent() with dbRelatedContent(…, 1, 0)

Acceptance Criteria

  • Re-saving a contentlet with required relationships completes in under 10 seconds (baseline: ~10 min)
  • Total JDBC executions during save are in the hundreds, not tens of thousands
  • Required-relationship validation still rejects saves where a required relationship has no children in the DB
  • Required-relationship validation still passes when the relationship is omitted from the payload but children exist in the DB
  • ONE_TO_ONE cardinality validation still rejects saves that link more than one contentlet
  • Content types with no required relationships are unaffected

Test

./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false \
  -Dit.test=ESContentletAPIImplTest#testValidateRelationships

References

🤖 Generated with Claude Code

…uery in validateRelationships() (#35489)

Two hotspots in ESContentletAPIImpl.validateRelationships() caused N×M query explosion
(500 related identifiers × 40 languages = 20 000+ SQL queries per save):

1. The required-relationship existence check called getRelatedContent(), which routes
   through RelationshipCache and fires one query per identifier per language.
   Replaced with FactoryLocator.getRelationshipFactory().dbRelatedContent(…, 1, 0)
   — a single LIMIT-1 query on tree + contentlet_version_info that stops at first hit.

2. The ONE_TO_MANY cardinality check inside the per-child loop called
   getRelatedContent() unconditionally for every cardinality (including MANY_TO_MANY),
   triggering the same cache-miss explosion. Guarded the DB call behind an
   if (ONE_TO_MANY) check and replaced it with the same dbRelatedContent(…, 1, 0) call.

Refs: #35489

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 28, 2026

Claude finished @fabrizzio-dotCMS's task in 2m 23s —— View job


PR Review

  • Read diff and surrounding code context
  • Analyze logic correctness
  • Check edge cases and risk areas
  • Post review findings

Overall: The performance fix is sound and the logic is preserved correctly. A few things worth a second look:


1. live=false is a silent semantic change

Both new dbRelatedContent calls pass live=false (working inodes). The original calls:

  • Existence check (!foundInRelationships path, line ~8633): Called getRelatedContent(contentlet, rel, checkParent, systemUser(), false) — the false is respectFrontendRoles, not live. With system user and no live override, the chain resolves to live=null, which in turn falls through to getNonCachedRelatedContentlets and eventually hits dbRelatedContent with live=false anyway. So this path is equivalent — no behavioral change.

  • ONE_TO_MANY parent check (line ~8706): Called getRelatedContent(contentInRelationship, relationship, false, systemUser(), true, 1, 0, null) — again live not specified, resolves to null → working. Also equivalent.

So live=false is correct here, but this should be explicit in the PR description rather than left to archaeology.


2. Missing break after addBadCardinalityRelationship (pre-existing, now more visible)

After the ONE_TO_MANY error is detected (line ~8713–8716):

hasError = true;
builder.addBadCardinalityRelationship(relationship, contentsInRelationship);

The loop continues to the next contentInRelationship. If 3 children all have different parents, addBadCardinalityRelationship(relationship, contentsInRelationship) is called 3 times with the full contentsInRelationship list each time. This is a pre-existing bug (same in the old code), but the refactoring is a good opportunity to fix it. Suggest adding a break after setting hasError = true.

Fix this →


3. null orderBy in the ONE_TO_MANY call

dbRelatedContent(relationship, contentInRelationship, false, false, null, 1, 0) — the null orderBy traces through SQLUtil.sanitizeSortBy(null)nullincludeContentlet = false → SQL falls back to order by t.tree_order, cvi.version_ts. Since only .isEmpty() and .get(0) are used, ordering doesn't affect correctness. Fine.


4. IdentifierAPI.find(contentlet) still called per child

dbRelatedContent does APILocator.getIdentifierAPI().find(contentlet) on line 360 of RelationshipFactoryImpl for every call. For the ONE_TO_MANY loop, this means one identifier lookup per child contentlet — still O(N), but vastly better than the original O(N×M) cache-miss storm. Not introduced by this PR, acceptable tradeoff.


5. DotSecurityException removal is correct

The original catch (DotDataException | DotSecurityException e) correctly drops to catch (DotDataException e) since dbRelatedContent only declares throws DotDataException. No concern here.


Summary: No correctness bugs found. Item #2 (duplicate addBadCardinalityRelationship calls) is a pre-existing defect worth fixing while this code is open. Everything else is clean.

fabrizzio-dotCMS and others added 4 commits April 28, 2026 19:32
…tent migration

- Remove DotSecurityException from catch — dbRelatedContent() uses systemUser
  internally and never throws it
- Replace size() > 0 with isEmpty() for idiomatic Java

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Slow contentlet save with required relationships — unbounded query loop in validateRelationships()

1 participant