Fix quotations leaking empty-string match lowering into AST (#19873)#19923
Open
T-Gro wants to merge 12 commits into
Open
Fix quotations leaking empty-string match lowering into AST (#19873)#19923T-Gro wants to merge 12 commits into
T-Gro wants to merge 12 commits into
Conversation
…clean The empty-string lowering added in #19189 produced `if x <> null then x.Length = 0 else false` inside `BuildSwitch`, which leaked through pattern-match compilation into captured quotations (#19873). Move the rewrite to `OptimizeExprOp`, which already skips `Expr.Quote` bodies. `match s with "" -> _` keeps the same IL and quotations now see `op_Equality(s, "")`. As a side effect, `if s = "" then _` gets the same null-safe length-check IL — improvement, not regression. Also adds a generic `verifyOutputAgainstBaseline` helper in `Compiler.fs` and a `.bsl`-driven quotation rendering test suite under `Conformance/Expressions/ ExpressionQuotations/QuotationRendering/` to catch future leaks of pattern-match lowerings into quotation ASTs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adversarial reviews flagged two IL-quality concerns. Verified both by
inspecting actual ildasm output, accepted both, and added IL tests
locking the current behavior in so future changes are conscious:
1. --optimize- (debug) builds no longer get the null+Length fast path
for `match s with "" -> _` because F#'s `(=)` is only inlined when
LocalOptimizationsEnabled is true. The fallback `String.Equals(s,"")`
call is still correct; JIT tiered compilation reaches the fast path.
2. `match s with null | "" -> _` emits one redundant `brfalse` on the
empty-string branch because the optimizer cannot see the enclosing
null-filtered context that BuildSwitch's `isNullFiltered` flag tracked.
JIT trivially eliminates the redundant branch.
Also tightens `IsILMethodRefSystemStringEquals` to require the call to
be static (rejects hypothetical instance/user-defined `System.String`
type with a 2-arg static `Equals`) and fixes a stale function-name
comment in PatternMatchCompilation.fs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Driver: extract `printerProgram` helper using a triple-quoted F# string (no more `\"\"` escape hell); group tests into Regression / NoLeakReference / SideEffect / PreExistingError submodules. Baselines: drop the `Match` suffix (folder is already `QuotationRendering`), delete redundant `SparseIntMatch` (same code path as Consecutive), shrink all multi-case tests to the minimum that still demonstrates the point: - ConsecutiveInts: 1..10 → 1..3 (was 41 lines, now 7) - Chars, NonEmptyString: 3 → 2 cases - Int64, Float, Decimal: 3 → 1 case Result: 9 baselines totalling 37 lines, each fits on screen. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The empty-string regression tests and surrounding trade-off comments grew beyond what they need to convey. This trims prose to the causal mechanism only, flattens four banner-only sub-modules into one, collapses the five no-leak reference tests into a parametrised Theory, and corrects a stale docstring claim that Int64/Float/Decimal each take a distinct BuildSwitch arm (Int64 and Float share the mkILAsmCeq path, so Float.bsl was redundant with Int64.bsl and is removed). Release note shrinks from a paragraph to the one user-observable change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-architect review Three rounds of adversarial multi-model (gpt-5.5, opus-4.7-xhigh, opus-4.8) architect review converged on a cleaner design for the empty-string regression coverage: - Quotation literals now live directly in test method bodies (`<@ … @>`) instead of inside a generated FSI script. They are desugared at test-project compile time by the proto fsc, which converts a runtime baseline diff into a compile-time gate: if #19873 ever regresses, the test source itself fails to compile with FS0452. - Removes the shared-FSI session, the no-op `RunTestCasesInSequence` cargo cult, the string-templated printer with its triple-quote escape guard, and the unused `verifyOutputAgainstBaseline` helper this PR briefly introduced. - Baselines carry a `// <name>` provenance header so a `.bsl` opened in isolation identifies itself in PR diffs. - Adds an orphan-guard `[<Fact>]` that fails when `*.bsl` on disk drifts from the test method set (skipped during `TEST_UPDATE_BSL=1` to avoid racing with writes). - Adds a structural Expr walker as belt-and-suspenders alongside the baseline for the two known leaked-lowering shapes (`String.Length`, `String.Equals`). - Adds a convergence assertion that the `match x with ""` and `if x = ""` quotations desugar to byte-identical Exprs, using shared module-level let bindings so the proof and the baselines cannot drift apart. - Splits the FS0452 array-pattern diagnostic test into its own file since it uses a different harness (`FSharp |> typecheck |> shouldFail`, not a `.bsl` snapshot). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
`tname_String`, `tname_Bool`, `tname_Type` already exist in `src/Compiler/AbstractIL/il.fs`
as `[<Literal>]` constants alongside ~20 other type-name literals, but they were not
exposed via `il.fsi` so callers in `Optimize/Optimizer.fs` and `Symbols/Exprs.fs` had
been duplicating the magic strings (`"System.String"`, `"System.Boolean"`, `"System.Type"`).
Exposes the three names actually referenced outside `il.fs` via `il.fsi` and switches
the four affected `IL{TypeRef,MethodRef}.Name` comparisons to use them. No behavior
change. The fix applies to the empty-string-equals matcher this PR added plus the
pre-existing `String.Concat` / `String.GetHashCode` / `Type.GetTypeFromHandle` matchers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5-line walker docstring, 3-line section-divider blocks, 6-line bullet list above the no-leak reference Facts, and a 12-line "Design notes" header were rationalising self-evident code or restating test names. Trimmed to the essentials: top-of-file purpose + bootstrap caveat, one-line note next to the shared `let` bindings, and the one genuinely-non-obvious comment (Int64 takes the mkILAsmCeq arm). Net effect: file shrinks from 156 to 100 lines, signal density up, tests unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
It tested filesystem bookkeeping (does *.bsl on disk equal a hand-maintained expectedBaselines set), not the compiler. The drift it catches — a deleted test leaving its .bsl behind — is already visible in `git diff` and adds zero signal anyone would actually use to find a regression. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three garbage hunters (production / tests / prose) found redundancies that
slipped through the earlier iterations:
Production:
- Merge two String.Equals(_, "")/("", _) optimizer arms into one or-pattern.
- Drop the redundant CallingConv.IsStatic guard in IsILMethodRefSystemStringEquals
(ArgCount=2 + both args String + return Bool already pin the static overload).
- Inline single-use let bindings in MakeOptimizedStringEqualsEmptyCall.
- Shrink 4-line docstring to 2 lines; shrink match-arm comment to one line.
- Shrink il.fsi block comment and PatternMatchCompilation.fs trade-off comment.
Tests:
- Delete QuotationUnsupportedConstructsTests.fs (FS0452 array-pattern coverage
already exists in Regressions/E_DecomposeArray01.fs — pure duplication).
- Delete IfEqualEmpty.bsl + its test + the convergence Fact + the shared `let`
bindings + assertNoEmptyStringLowering walker; all duplicated EmptyString.bsl
coverage either as identical AST or as belt-and-suspenders restating the
baseline diff.
- Strip the `// <name>` provenance header from the printer and from all 7
remaining baselines — the test method name already identifies each baseline.
- Trim trade-off comment tails in TypeTestsInPatternMatching.fs (drop the
JIT-folds-the-duplicate / tiered-compilation trivia).
Release notes: drop the implementation-detail bullet about `if s = ""` getting
the same null-safe IL — implied by moving the rewrite to the optimizer.
Net: -115/+30 across 16 files. 14 tests still pass; build clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extracts a `(|StringTy|_|)` partial active pattern over `ILType` and a small `isILMethodRefOnSystemString` helper that bakes in `tname_String` as the declaring type and takes the per-arg-shape check as an `[<InlineIfLambda>]`. Each `IsILMethodRefSystem*` now reads as the literal shape it matches: `[StringTy; StringTy]` for Equals(string, string); the three overload arities of String.Concat as alternative list patterns; `[ILType.Array (shape, StringTy)]` for Concat(string[]). No `ArgCount` book-keeping or `List.forall` walks left. Behaviour unchanged: build clean, 14 quotation + IL tests still green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
❗ Release notes requiredYou can open this PR in browser to add release notes: open in github.dev
Warning No PR link found in some release notes, please consider adding it.
|
abonie
reviewed
Jun 10, 2026
| let finalTest = if isNullFiltered then test else mkLazyAnd g m (mkNonNullTest g m vExpr) test | ||
| mkLetBind m bind finalTest | ||
| | DecisionTreeTest.Const (Const.String _ as c) -> | ||
| // Empty-string is rewritten to a null-safe length check by the Optimizer (#19873). |
Member
There was a problem hiding this comment.
I'd remove this comment, seems out of place
abonie
approved these changes
Jun 10, 2026
…a FSI - PatternMatchCompilation.fs: remove the misplaced `// Empty-string is rewritten…` comment on the (now-pristine) string arm (review feedback from @abonie). - QuotationRenderingTests.fs: replace literal `<@…@>` quotations with source-string evaluation through a shared FSI session. The bootstrap fsc that builds the test project still has the pre-fix desugar and rejects literal `match s with ""` quotations with FS0452 — the literals only become valid AFTER this PR is in the bootstrap. FSI uses the just-built FCS (with the fix), so source-as-string evaluation is bootstrap-immune. The .bsl baselines are unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #19873.
#19189 added an empty-string fast path inside
BuildSwitch, replacing the equality test withif s <> null then s.Length = 0 else false. Because the rewrite happened during pattern-match compilation — which runs before the quotation translator — every quoted form involvingmatch s with "" -> _started leaking that lowering into its AST:That broke quotation consumers (translators, target-language compilers) that assume the quoted tree mirrors the source — see the issue for the regression scenario.
The fix moves the rewrite out of pattern-match compilation and into
OptimizeExprOp, which already skipsExpr.Quotebodies.match s with "" -> _now reaches the optimizer as a plainop_Equalitycall, so the quoted form is back to:The optimized IL for compiled code is unchanged. As a side effect,
if s = "" then _now produces the same null-safe length-check IL asmatch s with "" -> _— improvement, not regression. Two accepted IL trade-offs are pinned by baseline tests:--optimize-no longer collapses to the null+Length form ((=)is only inlined whenLocalOptimizationsEnabled), andmatch s with null | "" -> _emits one redundantbrfalse.sthe JIT folds.