-
Notifications
You must be signed in to change notification settings - Fork 476
reanalyze: add parallel processing for CMT file analysis #8089
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Remove global ModulePath mutable state used during per-file AST traversal by threading ModulePath.t explicitly through traversal code. This is preparation for parallel per-file processing and reduces shared global state during MAP. Signed-Off-By: Cristiano Calcagno <[email protected]>
Record type dependency edges immediately instead of buffering them in a global ref and flushing at end-of-file. Note: this reorders debug output in reanalyze logs: addTypeReference lines now appear next to extendTypeDependencies, rather than being emitted in a later end-of-file flush. Signed-Off-By: Cristiano Calcagno <[email protected]>
Stop using a global TypeLabels hashtable during AST processing. Instead, compute type-label dependencies in a post-merge pass from merged Declarations and add the corresponding type-reference edges before solving. Fix: use raw decl positions (not declGetLoc/posAdjustment) when building cross-file label indices, since reference graph keys are raw positions. Note: debug output in deadcode expected logs is reordered due to moving label linking from per-file traversal to the post-merge pass. Signed-Off-By: Cristiano Calcagno <[email protected]>
Stop using global mutable exception declaration state during AST processing. Instead, derive a pure exception lookup from merged Declarations and use it when processing CrossFileItems exception refs. Signed-Off-By: Cristiano Calcagno <[email protected]>
- Replace global Values.valueBindingsTable with per-file values_builder - Replace global Checks.checks with per-file checks_builder - processCmt now returns file_result with per-file data - Add runChecks to process all checks after merging values tables - Update Reanalyze.ml to collect exception results and run checks at end - This enables future parallelization of AST processing
Add support for parallel processing of CMT files during dead code and exception analysis using OCaml 5 Domains. Features: - New -parallel N flag: process files using N domains (0 = sequential) - -parallel -1: auto-detect number of CPU cores - New -timing flag: report internal timing breakdown by phase - New benchmark infrastructure for performance testing The parallelization targets the CMT file processing phase, which is the main bottleneck (typically 75-80% of analysis time). The analysis and reporting phases remain sequential as they require merged cross-file data. Signed-Off-By: Cursor AI <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Break down timing into sub-phases: - CMT processing: - File loading: time spent reading and traversing CMT files - Result collection: time spent merging results from domains - Analysis: - Merging: combining per-file data into global structures - Solving: running the liveness solver Signed-Off-By: Cursor AI <[email protected]>
Updated Timing Breakdown (50 copies, ~4900 files, 12 cores)Added granular timing that separates sub-phases: Sequential ModeParallel Mode (auto-detect = 12 cores)Key Observations
|
|
@cknitt to test, and eventually enable, parallelism, we need to build with ocaml 5. |
I kept the older versions in CI to make sure that we don't break compatibility inadvertently, so that we can support a broad version range when we eventually publish the compiler libs on OPAM. But enabling parallelism is a good reason to move to OCaml 5 only, so this is fine with me. |
What's the earliest version we should keep? |
I would say the earliest we can based on what features/lib you require for parallelism. |
Summary
Add support for parallel processing of CMT files during dead code and exception analysis using OCaml 5 Domains.
Features
-parallel N: Process files using N parallel domains (0 = sequential, default)-parallel -1: Auto-detect number of CPU cores usingDomain.recommended_domain_count()-timing: Report internal timing breakdown by analysis phaseHow It Works
The parallelization targets the CMT file processing phase, which is the main bottleneck (typically 75-80% of total analysis time).
Parallel execution model:
Domain.join), the merged results are passed to the analysis phaseThe mutex is only held during the merge operation (list concatenation), not during file processing, which minimizes contention.
The analysis and reporting phases remain sequential as they require merged cross-file data.
Testing
Correctness Tests
Benchmark Tests
Observed Results (50 copies, ~4300 files, 12 cores)
Timing Breakdown
Key Observations
Usage Examples
Sample Timing Output