Parallel per-file type checking with concurrent project check #861
michaelglass wants to merge 3 commits
Force-pushed: e9e2506 to c7d4473
@michaelglass hey thanks, do you mind fixing CI? But please fix it so that each commit has a green CI status.
Force-pushed: c7d4473 to 9677c09
Force-pushed: fbaefef to ee21876
Force-pushed: e4b874f to c30df05
Split asyncLintProject into separate MSBuild loading and lint phases so
callers can load project options sequentially (avoiding Ionide.ProjInfo
deadlocks) while running FCS type-check + lint rules in parallel.
- Add getProjectOptions: loads MSBuild project info, returns
FSharpProjectOptions
- Add asyncLintProjectOptions: lints with pre-loaded options, skips
MSBuild
- Add Checker option to OptionalLintParameters for sharing FSharpChecker
- Refactor asyncLintProject to compose the two new functions
- Run project-level and per-file type checking concurrently
- Compute enabledRules once per project instead of per file
(lintWithRules)
- Update global.json to .NET 10 SDK
- Suppress NETSDK1188 (FSharp.Core locale resource warning)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit bafeb43)
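The two-phase shape this commit describes (load options sequentially, then fan the per-file work out in parallel) can be sketched roughly as below. This is an illustration in Python, not the FSharpLint API: `load_project_options`, `lint_file`, and `lint_projects` are hypothetical stand-ins for the MSBuild loading phase and the FCS type-check + lint phase.

```python
from concurrent.futures import ThreadPoolExecutor


def load_project_options(project):
    # Hypothetical stand-in for the MSBuild loading phase.
    # In this toy model every project has exactly two source files.
    return {"project": project, "files": [f"{project}/A.fs", f"{project}/B.fs"]}


def lint_file(path):
    # Hypothetical stand-in for type-checking and linting one file.
    # Returns the path and an (empty) list of warnings.
    return (path, [])


def lint_projects(projects, max_workers=4):
    # Phase 1: load options one project at a time (kept sequential).
    all_options = [load_project_options(p) for p in projects]
    # Phase 2: run the per-file work concurrently on a thread pool.
    files = [f for opts in all_options for f in opts["files"]]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, even if tasks finish out of order.
        return list(pool.map(lint_file, files))


results = lint_projects(["Core", "Cli"])
```

Keeping phase 1 as a plain loop means only one load is ever in flight, which is the property the commit relies on to avoid the loader deadlocks it mentions.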
Combines the WorkspaceLoader and FSharpChecker sharing from the two
sibling branches. asyncLintSolution now:
- Creates one WorkspaceLoader and calls LoadProjects once with every
project in the solution (one warm MSBuild engine, one project-
graph evaluation).
- Creates one FSharpChecker and injects it via
OptionalLintParameters.Checker so every per-project call reuses
its parse/typecheck caches.
- Maps each project's FSharpProjectOptions with a singleton
known-set (mapToFSharpProjectOptions proj [proj]) so FCS resolves
P2P references against compiled DLLs.
The singleton known-set matters when the checker is shared: passing the
full known-set makes FCS treat referenced projects as source projects,
and a shared checker will re-type-check referenced project sources per
dependent. In a nested solution this flips the combined change from a
win to a regression. Isolated mapping preserves the DLL-based
resolution that single-project loading happens to produce.
Measured on FsHotWatch solution (11 fsproj, ~80 files), M-series Mac,
3 hyperfine runs + 1 warmup:
  branch baseline                        42.3s +/- 6.2s
  + share WorkspaceLoader only           39.9s +/- 7.7s  (-5.7%)
  + share FSharpChecker only             35.1s +/- 1.6s  (-17.0%)
  + both, full-known-set mapping         57.0s +/- 7.4s  (regression)
  + both, singleton-known-set mapping    19.9s +/- 0.2s  (-52.9%, 2.13x)
Against the published dotnet-fsharplint 0.26.10 on the same solution
(515s +/- 121s), the combined patch is ~25.9x faster.
An isolation probe (12-project loop) confirms where the savings come
from:
- Per-project WorkspaceLoader pays ~14.7s of redundant MSBuild
engine cold-starts over 11 projects; single LoadProjects(all) saves
another ~8.5s through MSBuild's internal project-graph evaluation
cache.
- Per-project FSharpChecker pays ~9s of redundant reference
resolution and symbol-table loading.
The two effects partially overlap: a per-project FSharpChecker does
MSBuild-adjacent reference work that masks some of the
WorkspaceLoader-sharing savings on the wall clock. Sharing only the
loader gains 5.7% because the checker reabsorbs the freed budget;
sharing both lets the savings materialize fully.
Also bumps FSharp.Core from 10.1.201 to 10.1.202 to match FCS 43.12.202.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 3928a19)
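For readers who want to check the percentages and speedups quoted in the commit message above, a minimal sketch of the arithmetic follows. It uses only the rounded means reported there; the helper names `percent_change` and `speedup` are made up for this example. Because the inputs are already rounded, a recomputed figure can differ from a reported one in the last digit.

```python
# Rounded hyperfine means (seconds) as reported in the commit message.
baseline = 42.3
runs = {
    "share WorkspaceLoader only": 39.9,
    "share FSharpChecker only": 35.1,
    "both, singleton-known-set mapping": 19.9,
}


def percent_change(new, old):
    # Negative values mean the new run is faster than the old one.
    return (new - old) / old * 100.0


def speedup(old, new):
    # Ratio of old wall-clock time to new wall-clock time.
    return old / new


for label, mean in runs.items():
    print(f"{label}: {percent_change(mean, baseline):+.1f}%, "
          f"{speedup(baseline, mean):.2f}x")

# Comparison against the published 0.26.10 figure (515s) quoted above.
print(f"vs 0.26.10: {speedup(515.0, 19.9):.1f}x")
```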
Force-pushed: c30df05 to 78f134b
well michaelglass@7fc3684 is green, but not in this PR? 🤷
I'm also going to benchmark this against a non-pathological case to see how it performs there. I know when I did this initially there was a minor regression (single-digit percentage).
It's not that it's not green in the PR, it's that in the PR it's missing CI status because you didn't push 1 by 1 :) I made a script for that: https://github.com/tarsgate/conventions/blob/master/scripts/gitPush1by1.fsx . But don't worry, I know it's a hassle, I can take care of it this weekend.
ok, I can go back and push one by one again. no stress
Force-pushed: 78f134b to ee21876
I stand corrected: what must actually be happening is that, despite you having pushed 1 by 1, as you're a first-time contributor I need to click "approval to run CI" every time you push. Hopefully this doesn't happen anymore after your first PR lands.
ok done. no serious regression!
added the benchmark script to the gist
sorry, force pushed so need to open a new PR for #846
this does help for projects with lots of local dependencies. E.g. fsharp monorepos with multiple dependent projects, for instance https://github.com/michaelglass/fshotwatch
here's my [claude's] benchmark script
results:
pathological target -- linting FsHotWatch
[Results table not recovered; entries were labelled 91fd00b0 (incl. #845), ee218763, and 78f134b3.]
Non-pathological target -- linting FSharpLint's own solution (flat dependency graph):
[Results table not recovered; entries were labelled 0.27 (released), 91fd00b0, ee218763, and 78f134b3.]
What makes this pathological / what the speedup is conditional on
The perf improvement scales with a solution of many local subprojects that reference each other. FsHotWatch is a good example: 13 projects, with the test/CLI projects each referencing 8–9 of the others.
On master, every project is linted in isolation — a fresh MSBuild design-time load + FCS re-type-checking that project's whole project-to-project (P2P) dependency closure. So each shared library project gets re-loaded and re-type-checked once per dependent: duplicated work that grows with the density of the dependency graph.
The two changes remove different parts of that cost:
WorkspaceLoader (MSBuild design-time build runs once for the whole solution, not per project) + one shared FSharpChecker (each project and its dependencies type-checked once, cached, reused by every dependent): conditional on many cross-referencing local subprojects.
The ~33× is conditional on lots of inter-dependent local subprojects.
Caveat
On a single project, or a flat solution whose projects share no dependencies, the sharing win largely disappears and you're left with roughly the parallel-per-file effect; on a single small project the parallelization overhead can even make it marginally slower.
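The duplicated-work argument and the caveat above can be made concrete with a toy count. The sketch below is a simplified model, not a description of how FCS or MSBuild schedule work: it counts "project type-checks" when every project re-checks its whole P2P closure in isolation, versus once per project with a shared cache. The two graphs (`dense`, `flat`) and all function names are invented for illustration.

```python
def closure(graph, project):
    # Transitive project-to-project dependencies of `project`, including itself.
    seen = set()
    stack = [project]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def checks_isolated(graph):
    # Each project is linted alone and re-checks its whole closure.
    return sum(len(closure(graph, p)) for p in graph)


def checks_shared(graph):
    # With a shared cache, each project is checked once and reused.
    return len(graph)


# Hypothetical dense graph: later projects reference most earlier ones.
dense = {
    "Core": [],
    "Util": ["Core"],
    "Cli": ["Core", "Util"],
    "Tests": ["Core", "Util", "Cli"],
}
# Hypothetical flat graph: no project references another.
flat = {"A": [], "B": [], "C": []}

dense_counts = (checks_isolated(dense), checks_shared(dense))
flat_counts = (checks_isolated(flat), checks_shared(flat))
```

In this model the dense graph needs 10 checks in isolation against 4 with sharing, while the flat graph needs 3 either way, which matches the caveat that the sharing win largely disappears when projects share no dependencies.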