Skip to content

refactor(amber): remove redundant result-readiness wait in SyncExecutionResource#5856

Draft
bobbai00 wants to merge 1 commit into
apache:mainfrom
bobbai00:perf/sync-execution-trust-completed
Draft

refactor(amber): remove redundant result-readiness wait in SyncExecutionResource#5856
bobbai00 wants to merge 1 commit into
apache:mainfrom
bobbai00:perf/sync-execution-trust-completed

Conversation

@bobbai00

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Removes the two Thread.sleep(500) waits in SyncExecutionResource.executeWorkflowSync and reads results as soon as the execution reaches COMPLETED. The wait was redundant: the engine commits each operator's result storage synchronously — OutputManager.closeOutputStorageWriterIfNeeded joins the writer thread (forcing the Iceberg commit) before the worker reports COMPLETED, and a failed commit surfaces as FAILED/KILLED, never COMPLETED. Adds ResultPersistedOnCompletionSpec to lock that invariant.

Any related issues, documentation, discussions?

Closes #5855

How was this PR tested?

WorkflowExecutionService/testOnly ...e2e.ResultPersistedOnCompletionSpec — 2 cases (a single source, and a multi-region scan→keyword→count DAG) that read result storage the instant the run reports COMPLETED (no wait) and assert the committed getCount equals the rows actually readable. The existing OutputPortStorageWriterThreadSpec covers the commit/failure-propagation path.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

…ionResource

Results are durably committed before COMPLETED is observable
(OutputManager.closeOutputStorageWriterIfNeeded joins the result-writer
thread, forcing the Iceberg commit, before the worker reports the port
complete), and a failed commit surfaces as FAILED/KILLED. Both
result-reading paths are gated on COMPLETED, so the two Thread.sleep(500)
waits were redundant. Add an e2e spec to lock the invariant.
@github-actions

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @Ma77Ball
    You can notify them by mentioning @Ma77Ball in a comment.

@github-actions

Copy link
Copy Markdown
Contributor

✅ No material benchmark regressions detected

🟢 0 better · 🔴 0 worse · ⚪ 15 noise (<±5%) · 0 without baseline

Compared against main b2886ca benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
bs=10 sw=10 sl=64 394 0.24 24,234/36,575/36,575 us ⚪ within ±5% / ⚪ within ±5%
bs=100 sw=10 sl=64 815 0.497 123,218/137,580/137,580 us ⚪ within ±5% / 🔴 +9.7%
bs=1000 sw=10 sl=64 925 0.565 1,082,257/1,111,552/1,111,552 us ⚪ within ±5% / 🔴 +11.3%
Baseline details

Latest main b2886ca from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 394 tuples/sec 392 tuples/sec 410.82 tuples/sec +0.5% -4.1%
bs=10 sw=10 sl=64 MB/s 0.24 MB/s 0.239 MB/s 0.251 MB/s +0.4% -4.3%
bs=10 sw=10 sl=64 p50 24,234 us 25,186 us 23,785 us -3.8% +1.9%
bs=10 sw=10 sl=64 p95 36,575 us 36,111 us 34,980 us +1.3% +4.6%
bs=10 sw=10 sl=64 p99 36,575 us 36,111 us 34,980 us +1.3% +4.6%
bs=100 sw=10 sl=64 throughput 815 tuples/sec 823 tuples/sec 891.94 tuples/sec -1.0% -8.6%
bs=100 sw=10 sl=64 MB/s 0.497 MB/s 0.502 MB/s 0.544 MB/s -1.0% -8.7%
bs=100 sw=10 sl=64 p50 123,218 us 120,415 us 112,277 us +2.3% +9.7%
bs=100 sw=10 sl=64 p95 137,580 us 143,366 us 139,802 us -4.0% -1.6%
bs=100 sw=10 sl=64 p99 137,580 us 143,366 us 139,802 us -4.0% -1.6%
bs=1000 sw=10 sl=64 throughput 925 tuples/sec 928 tuples/sec 1,041 tuples/sec -0.3% -11.1%
bs=1000 sw=10 sl=64 MB/s 0.565 MB/s 0.566 MB/s 0.635 MB/s -0.2% -11.1%
bs=1000 sw=10 sl=64 p50 1,082,257 us 1,082,111 us 972,714 us +0.0% +11.3%
bs=1000 sw=10 sl=64 p95 1,111,552 us 1,121,997 us 1,023,057 us -0.9% +8.7%
bs=1000 sw=10 sl=64 p99 1,111,552 us 1,121,997 us 1,023,057 us -0.9% +8.7%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,507.70,200,128000,394,0.240,24234.00,36574.80,36574.80
1,100,10,64,20,2455.00,2000,1280000,815,0.497,123217.59,137580.13,137580.13
2,1000,10,64,20,21621.11,20000,12800000,925,0.565,1082256.78,1111551.53,1111551.53

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.88%. Comparing base (357fed0) to head (5f9c481).
⚠️ Report is 9 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5856      +/-   ##
============================================
+ Coverage     53.64%   53.88%   +0.23%     
- Complexity     2720     2753      +33     
============================================
  Files          1099     1099              
  Lines         42541    42539       -2     
  Branches       4577     4577              
============================================
+ Hits          22822    22921      +99     
+ Misses        18385    18284     -101     
  Partials       1334     1334              
Flag Coverage Δ *Carryforward flag
access-control-service 70.44% <ø> (ø) Carriedforward from 357fed0
agent-service 34.36% <ø> (ø) Carriedforward from 357fed0
amber 55.18% <ø> (+0.60%) ⬆️
computing-unit-managing-service 1.65% <ø> (ø) Carriedforward from 357fed0
config-service 56.71% <ø> (ø) Carriedforward from 357fed0
file-service 57.06% <ø> (ø) Carriedforward from 357fed0
frontend 48.09% <ø> (ø) Carriedforward from 357fed0
pyamber 90.13% <ø> (ø) Carriedforward from 357fed0
python 90.80% <ø> (ø) Carriedforward from 357fed0
workflow-compiling-service 58.69% <ø> (ø) Carriedforward from 357fed0

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bobbai00 bobbai00 marked this pull request as draft June 21, 2026 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SyncExecutionResource's result-readiness wait is redundant: the engine commits results before COMPLETED (not a sleep-vs-poll problem)

2 participants