From 8ba68e0365536caa04c302a36e412047285d33a4 Mon Sep 17 00:00:00 2001
From: giles
Date: Sat, 4 Jul 2026 04:10:55 +0000
Subject: [PATCH] W14: F10 expected-failures baseline gate (test-only)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The OCaml suite's permanent ~273-failure band (in-progress hs-* + the
r7rs radix shadow) is normalized, so real regressions hide in red noise
(conformance.md F-10). A runner skip-list would rewrite the hs loops'
scoreboards mid-flight — instead, pin the band:

scripts/test-suite-baseline.sh runs the full suite and diffs its FAIL
set against spec/tests/known-failures.txt (273 entries, identity =
"suite > name", error text stripped). Red on a NEW failure (regression)
AND red on a vanished failure (fix landed — delete it from the baseline,
locking in the win). The band still prints as FAIL lines for the teams
working through it; nothing in the runner changes.

Bonus capture: 2 of the 273 have EMPTY suite labels (can-map-an-array,
string->number) — live evidence for C9, the next checklist item.

Validated end-to-end: GREEN on current tree (5800p/273f — 38 net passes
above dc7aa709's 5762 from this loop's added pins). Runtime ~12 min.
Test-only: no semantics edits, no push.

Co-Authored-By: Claude Fable 5
---
 plans/agent-briefings/sx-gate-loop.md |  17 +-
 scripts/test-suite-baseline.sh        |  61 ++++++
 spec/tests/known-failures.txt         | 273 ++++++++++++++++++++++++++
 3 files changed, 350 insertions(+), 1 deletion(-)
 create mode 100755 scripts/test-suite-baseline.sh
 create mode 100644 spec/tests/known-failures.txt

diff --git a/plans/agent-briefings/sx-gate-loop.md b/plans/agent-briefings/sx-gate-loop.md
index a48620d3..a4b2adb8 100644
--- a/plans/agent-briefings/sx-gate-loop.md
+++ b/plans/agent-briefings/sx-gate-loop.md
@@ -94,7 +94,10 @@ Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
 - [x] C3/C4/C5/C6/C7 — protocol-quirk ledger (pins current behavior,
   bidirectional) + seeded 60-line fuzz-liveness property in
   `scripts/test-protocol-gate.sh` (11/11)
-- [ ] F10 — hs-upstream skip-list so browser-only FAILs mean something
+- [x] F10 — expected-failures BASELINE GATE instead of a skip-list
+  (`scripts/test-suite-baseline.sh` + `spec/tests/known-failures.txt`,
+  273 pinned: 271 hs-* + 2 empty-suite-label entries → C9 evidence).
+  New failure OR vanished failure = red; hs loops' scoreboards untouched
 - [ ] C9 — empty suite label

 ### F. Differential battery
@@ -102,6 +105,18 @@ Pin each confirmed-and-fixed finding with a minimal repro. Add suites to

 ## Progress log (newest first)

+- 2026-07-04 — **F10 baseline gate (item E.2)**. Deliberately NOT a
+  skip-list: skip-listing the hs red band in the runner would rewrite the
+  hs loops' scoreboards mid-flight. Instead
+  `scripts/test-suite-baseline.sh` diffs the full suite's FAIL set against
+  checked-in `spec/tests/known-failures.txt` (273 entries: 271 hs-* + 2
+  with EMPTY suite labels — live C9 evidence, `can-map-an-array` "map with
+  block" and `string->number` 2-arg, the "r7rs radix shadow"). Red on a
+  NEW failure (regression) and red on a VANISHED failure (fix landed —
+  delete from baseline, locking in the win). Identity = "suite > name"
+  with error text stripped (messages churn). Current suite: 5800p/273f
+  (up 38 passes from dc7aa709's 5762 — sections A–D added pins). Validated
+  end-to-end: GREEN, exit 0, ~12 min runtime. Test-only.
 - 2026-07-04 — **C3–C7 protocol fuzz suite (item E.1)**. All five findings
   are still OPEN server-side (sx_server.ml fixes are host-runtime work), so
   the suite pins CURRENT behavior as a bidirectional ledger — verified
diff --git a/scripts/test-suite-baseline.sh b/scripts/test-suite-baseline.sh
new file mode 100755
index 00000000..4c51583d
--- /dev/null
+++ b/scripts/test-suite-baseline.sh
@@ -0,0 +1,61 @@
+#!/bin/bash
+# test-suite-baseline.sh — W14/F10: make FAIL mean something again.
+#
+# The review (conformance.md F-10): the OCaml suite is not green — a
+# permanent ~274-failure band (in-progress hs-* + r7rs radix shadow) is
+# normalized, so real regressions hide inside the red noise and nobody can
+# tell a new failure from the band.
+#
+# This gate pins the band instead of ignoring it: the full suite's FAIL
+# set is diffed against the checked-in baseline
+# (spec/tests/known-failures.txt). Two red conditions, both loud:
+#   NEW failure      -> a real regression: fix it (or, if intentional,
+#                       justify + add to the baseline in the same commit)
+#   VANISHED failure -> something got fixed: delete it from the baseline
+#                       so the win is locked in
+# Neither touches the runner or the hs loops' scoreboards — the band still
+# prints as FAIL lines for the teams working through it.
+#
+# Usage: bash scripts/test-suite-baseline.sh
+# Runtime: full suite, ~5–15 min. Exit 0 = fail set identical to baseline.
+set -uo pipefail
+cd "$(dirname "$0")/.."
+
+RUNNER=hosts/ocaml/_build/default/bin/run_tests.exe
+BASELINE=spec/tests/known-failures.txt
+[[ -x "$RUNNER" ]] || { echo "SKIP: $RUNNER not built" >&2; exit 2; }
+[[ -f "$BASELINE" ]] || { echo "SKIP: $BASELINE missing" >&2; exit 2; }
+
+log=$(mktemp)
+timeout 3000 "$RUNNER" > "$log" 2>&1
+rc=$?
+if [[ $rc -ne 0 && $rc -ne 1 ]]; then
+  echo "RED: runner exited $rc (timeout/crash)"; tail -5 "$log"; rm -f "$log"; exit 1
+fi
+
+# Normalize: keep the stable test identity (suite > name), drop messages
+# (error text may contain addresses/timings that churn).
+current=$(mktemp)
+grep '^ FAIL: ' "$log" | sed 's/^ FAIL: //; s/: .*$//' | sort -u > "$current"
+
+new_failures=$(comm -13 <(sort -u "$BASELINE") "$current")
+vanished=$(comm -23 <(sort -u "$BASELINE") "$current")
+
+summary=$(grep '^Results:' "$log" | tail -1)
+red=0
+if [[ -n "$new_failures" ]]; then
+  echo "RED: NEW failures not in baseline:"
+  sed 's/^/ + /' <<<"$new_failures"
+  red=1
+fi
+if [[ -n "$vanished" ]]; then
+  echo "RED: baseline entries now PASSING (delete them from $BASELINE):"
+  sed 's/^/ - /' <<<"$vanished"
+  red=1
+fi
+if [[ $red -eq 0 ]]; then
+  echo "GREEN: fail set identical to baseline ($(wc -l < "$BASELINE") known failures)"
+fi
+echo "$summary"
+rm -f "$log" "$current"
+exit $red
diff --git a/spec/tests/known-failures.txt b/spec/tests/known-failures.txt
new file mode 100644
index 00000000..edcbfcf0
--- /dev/null
+++ b/spec/tests/known-failures.txt
@@ -0,0 +1,273 @@
+ > can-map-an-array
+hs-compat-asExpression > converts-a-complete-form-into-values
+hs-compat-asExpression > converts-strings-into-fragments
+hs-compat-asExpression > converts-value-as-json
+hs-compat-in > basic-no-query-return-values
+hs-compat-typecheck > can-do-basic-non-string-typecheck-failure
+hs-compat-typecheck > can-do-basic-string-non-null-typecheck
+hs-compat-typecheck > can-do-basic-string-typecheck
+hs-compat-typecheck > null-causes-null-safe-string-check-to-fail
+hs-dev-asExpression > parses string as JSON to object
+hs-dev-collectionExpressions > where binds after property access
+hs-dev-comparisonOperator > I am between works
+hs-dev-comparisonOperator > I am not between works
+hs-dev-comparisonOperator > is still does equality when rhs variable exists
+hs-dev-pick > can pick first n items
+hs-dev-pick > can pick items using 'of' syntax
+hs-dev-pick > can pick last n items
+hs-dev-pick > can pick random item
+hs-dev-pick > can pick random n items
+hs-emit-classes > remove class from target
+hs-emit-control-flow > tell rebinds me
+hs-emit-def-behavior > def becomes define
+hs-emit-dom-commands > hide sets display none
+hs-emit-dom-commands > log passes through
+hs-emit-dom-commands > show clears display
+hs-emit-on > on every click
+hs-extra-function-call > identity-call
+hs-extra-lambda > array-map-block
+hs-extra-lambda > arrow-true
+hs-extra-typecheck > null-colon-string
+hs-parse-assignment > put into
+hs-parse-assignment > set property to string
+hs-parse-basic-commands > add class to me
+hs-parse-basic-commands > remove class from me
+hs-parse-basic-commands > toggle between two classes
+hs-parse-basic-commands > toggle class on me
+hs-parse-conditional > if else end
+hs-parse-conditional > if then end
+hs-parse-conformance > increment @count → full AST
+hs-parse-conformance > on click add .called → full AST
+hs-parse-conformance > on click from #bar add .clicked → full AST
+hs-parse-conformance > toggle between .foo and .bar → full AST
+hs-parse-conformance > wait 100ms then add .done → full AST
+hs-parse-events > on click add class
+hs-parse-events > on click from target
+hs-parse-every-modifier > on every click
+hs-parse-expressions > attribute ref
+hs-parse-expressions > style ref
+hs-parse-send-trigger > trigger event on me
+hs-parse-sequencing > wait then add
+hs-parse-special-commands > decrement attribute
+hs-parse-special-commands > hide
+hs-parse-special-commands > increment attribute
+hs-parse-special-commands > show target
+hs-parse-unary > not expr
+hs-runtime-e2e > source → SX shape
+hs-runtime-make > make Map returns dict
+hs-runtime-make > make Set returns list
+hs-tokenize-arithmetic-ops > division operator
+hs-tokenize-arithmetic-ops > mixed arithmetic
+hs-tokenize-arithmetic-ops > modulo operator
+hs-tokenize-arithmetic-ops > multiply operator
+hs-tokenize-basics > keywords vs identifiers
+hs-tokenize-basics > whitespace skipped
+hs-tokenize-comments > line comment skipped
+hs-tokenize-full-expressions > if true put "foo" into me.innerHTML else put "bar" into me.innerHTML end
+hs-tokenize-full-expressions > increment @count then put it into me
+hs-tokenize-full-expressions > on click add .called
+hs-tokenize-full-expressions > on click[buttons==0] log event
+hs-tokenize-full-expressions > on click from #bar add .clicked
+hs-tokenize-full-expressions > on click send custom(foo:"fromBar") to #d2
+hs-tokenize-full-expressions > put "Clicked" into my.innerHTML
+hs-tokenize-full-expressions > set #d1.innerHTML to foo
+hs-tokenize-full-expressions > toggle between .foo and .bar
+hs-tokenize-full-expressions > wait 100ms then add .done
+hs-upstream-add > can add a value to a set
+hs-upstream-add > can add to an HTMLCollection
+hs-upstream-add > can add to children
+hs-upstream-add > can add to query in me
+hs-upstream-add > supports async expressions in when clause
+hs-upstream-append > append to undefined ignores the undefined
+hs-upstream-append > can append a value to a DOM node
+hs-upstream-append > can append a value to a set
+hs-upstream-append > can append a value to I
+hs-upstream-append > multiple appends work
+hs-upstream-append > new DOM content added by append will be live
+hs-upstream-askAnswer > confirm returns first choice on OK
+hs-upstream-askAnswer > prompts and puts result in it
+hs-upstream-call > call functions that return promises are waited on
+hs-upstream-core/asyncError > rejected promise stops execution
+hs-upstream-core/asyncError > rejected promise triggers catch block
+hs-upstream-core/regressions > can invoke functions w/ numbers in name
+hs-upstream-core/regressions > can pick detail fields out by name
+hs-upstream-core/regressions > can refer to function in init blocks
+hs-upstream-core/runtimeErrors > reports basic function invocation null errors properly
+hs-upstream-core/runtimeErrors > reports basic function invocation null errors properly w/ of
+hs-upstream-core/runtimeErrors > reports basic function invocation null errors properly w/ possessives
+hs-upstream-core/runtimeErrors > reports null errors on add command properly
+hs-upstream-core/runtimeErrors > reports null errors on decrement command properly
+hs-upstream-core/runtimeErrors > reports null errors on default command properly
+hs-upstream-core/runtimeErrors > reports null errors on hide command properly
+hs-upstream-core/runtimeErrors > reports null errors on increment command properly
+hs-upstream-core/runtimeErrors > reports null errors on measure command properly
+hs-upstream-core/runtimeErrors > reports null errors on put command properly
+hs-upstream-core/runtimeErrors > reports null errors on remove command properly
+hs-upstream-core/runtimeErrors > reports null errors on send command properly
+hs-upstream-core/runtimeErrors > reports null errors on sets properly
+hs-upstream-core/runtimeErrors > reports null errors on settle command properly
+hs-upstream-core/runtimeErrors > reports null errors on show command properly
+hs-upstream-core/runtimeErrors > reports null errors on toggle command properly
+hs-upstream-core/runtimeErrors > reports null errors on transition command properly
+hs-upstream-core/runtimeErrors > reports null errors on trigger command properly
+hs-upstream-core/runtime > has proper stack from event handler
+hs-upstream-core/scoping > locally scoped variables don't clash with built-in variables
+hs-upstream-empty > can empty a map
+hs-upstream-empty > can empty an element
+hs-upstream-empty > can empty a set
+hs-upstream-empty > clear works on elements
+hs-upstream-expressions/asExpression > can accept custom dynamic conversions
+hs-upstream-expressions/asExpression > can use the a modifier if you like
+hs-upstream-expressions/asExpression > collects duplicate text inputs into an array
+hs-upstream-expressions/asExpression > converts a complete form into Values
+hs-upstream-expressions/asExpression > converts a form element into Values
+hs-upstream-expressions/asExpression > converts a form element into Values | FormEncoded
+hs-upstream-expressions/asExpression > converts a form element into Values | JSONString
+hs-upstream-expressions/asExpression > converts an element into HTML
+hs-upstream-expressions/asExpression > converts a NodeList into HTML
+hs-upstream-expressions/asExpression > converts array as Set
+hs-upstream-expressions/asExpression > converts checkboxes into a Value correctly
+hs-upstream-expressions/asExpression > converts multiple selects into a Value correctly
+hs-upstream-expressions/asExpression > converts multiple selects with programmatically changed selections
+hs-upstream-expressions/asExpression > converts object as Map
+hs-upstream-expressions/asExpression > converts radio buttons into a Value correctly
+hs-upstream-expressions/asExpression > converts value as Date
+hs-upstream-expressions/asExpression > parses string as JSON to object
+hs-upstream-expressions/asExpression > pipe operator chains conversions
+hs-upstream-expressions/blockLiteral > basic block literals work
+hs-upstream-expressions/blockLiteral > basic identity works
+hs-upstream-expressions/blockLiteral > basic two arg identity works
+hs-upstream-expressions/closest > closest does not consume a following where clause
+hs-upstream-expressions/comparisonOperator > does not exist works
+hs-upstream-expressions/cookies > basic clear cookie values work
+hs-upstream-expressions/cookies > basic set cookie values work
+hs-upstream-expressions/cookies > iterate cookies values work
+hs-upstream-expressions/cookies > length is 0 when no cookies are set
+hs-upstream-expressions/cookies > update cookie values work
+hs-upstream-expressions/functionCalls > can access a property of a call's result
+hs-upstream-expressions/functionCalls > can chain calls on the result of a call
+hs-upstream-expressions/functionCalls > can invoke function on object
+hs-upstream-expressions/functionCalls > can invoke function on object w/ async arg
+hs-upstream-expressions/functionCalls > can invoke function on object w/ async root & arg
+hs-upstream-expressions/functionCalls > can invoke global function
+hs-upstream-expressions/functionCalls > can invoke global function w/ async arg
+hs-upstream-expressions/functionCalls > can pass an array literal as an argument
+hs-upstream-expressions/functionCalls > can pass an expression as an argument
+hs-upstream-expressions/functionCalls > can pass an object literal as an argument
+hs-upstream-expressions/functionCalls > can pass no arguments
+hs-upstream-expressions/logicalOperator > and short-circuits when lhs promise resolves to false
+hs-upstream-expressions/logicalOperator > should short circuit with and expression
+hs-upstream-expressions/logicalOperator > should short circuit with or expression
+hs-upstream-expressions/mathOperator > can use mixed expressions
+hs-upstream-expressions/objectLiteral > expressions work in object literal field names
+hs-upstream-expressions/propertyAccess > property access on function result
+hs-upstream-expressions/some > some returns true for nonempty selector
+hs-upstream-expressions/strings > string templates work w/ props
+hs-upstream-expressions/strings > string templates work w/ props w/ braces
+hs-upstream-expressions/symbol > resolves global context properly
+hs-upstream-fetch > allows the event handler to change the fetch parameters
+hs-upstream-fetch > as response does not throw on 404
+hs-upstream-fetch > can catch an error that occurs when using fetch
+hs-upstream-fetch > can do a simple fetch
+hs-upstream-fetch > can do a simple fetch w/ a custom conversion
+hs-upstream-fetch > can do a simple fetch w/ a naked URL
+hs-upstream-fetch > can do a simple fetch w/ html
+hs-upstream-fetch > can do a simple fetch w/ json
+hs-upstream-fetch > can do a simple fetch w/ json using JSON syntax
+hs-upstream-fetch > can do a simple fetch w/ json using Object syntax
+hs-upstream-fetch > can do a simple fetch w/ json using Object syntax and an 'an' prefix
+hs-upstream-fetch > can do a simple post
+hs-upstream-fetch > can do a simple post alt syntax w/ curlies
+hs-upstream-fetch > can do a simple post alt syntax without curlies
+hs-upstream-fetch > can put response conversion after with
+hs-upstream-fetch > can put response conversion before with
+hs-upstream-fetch > do not throw passes through 404 response
+hs-upstream-fetch > don't throw passes through 404 response
+hs-upstream-fetch > Response can be converted to JSON via as JSON
+hs-upstream-fetch > submits the fetch parameters to the event handler
+hs-upstream-fetch > throws on non-2xx response by default
+hs-upstream-fetch > triggers an event just before fetching
+hs-upstream-hide > can hide element, with display:none by default
+hs-upstream-hide > can hide element with display:none explicitly
+hs-upstream-hide > can hide element with no target followed by command
+hs-upstream-hide > can hide element with no target followed by then
+hs-upstream-hide > can hide element with no target with a with
+hs-upstream-hide > can hide element with opacity:0
+hs-upstream-hide > can hide element with opacity style literal
+hs-upstream-hide > can hide element, with visibility:hidden
+hs-upstream-hide > can hide other elements
+hs-upstream-if > if on new line does not join w/ else
+hs-upstream-if > if properly supports nested if statements and end block
+hs-upstream-js > can do both of the above
+hs-upstream-js > can return values to _hyperscript
+hs-upstream-js > handles rejected promises without hanging
+hs-upstream-make > can make elements
+hs-upstream-make > can make elements with id and classes
+hs-upstream-make > can make named objects
+hs-upstream-make > can make named objects w/ global scope
+hs-upstream-make > can make named objects with arguments
+hs-upstream-make > can make objects
+hs-upstream-make > can make objects with arguments
+hs-upstream-make > creates a div by default
+hs-upstream-on > can catch exceptions thrown in hyperscript functions
+hs-upstream-on > can catch exceptions thrown in js functions
+hs-upstream-on > can ignore when target doesn't exist
+hs-upstream-on > can pick detail fields out by name
+hs-upstream-on > can pick event properties out by name
+hs-upstream-on > listeners on other elements are removed when the registering element is removed
+hs-upstream-on > multiple event handlers at a time are allowed to execute with the every keyword
+hs-upstream-on > on intersection fires when the element is in the viewport
+hs-upstream-on > rethrown exceptions trigger 'exception' event
+hs-upstream-on > throttled at