W14: F10 expected-failures baseline gate (test-only)

The OCaml suite's permanent ~273-failure band (in-progress hs-* + the
r7rs radix shadow) is normalized, so real regressions hide in red noise
(conformance.md F-10). A runner skip-list would rewrite the hs loops'
scoreboards mid-flight — instead, pin the band:

scripts/test-suite-baseline.sh runs the full suite and diffs its FAIL set
against spec/tests/known-failures.txt (273 entries, identity =
"suite > name", error text stripped). Red on a NEW failure (regression)
AND red on a vanished failure (fix landed — delete it from the baseline,
locking in the win). The band still prints as FAIL lines for the teams
working through it; nothing in the runner changes.
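
The two red conditions reduce to a two-way set difference over sorted files. A minimal sketch on invented data (the entries and temp paths below are hypothetical, not taken from the real baseline), written to run under any POSIX shell:

```shell
# Toy illustration of the two-way baseline diff; every entry here is made up.
set -eu
export LC_ALL=C   # comm expects both inputs sorted under the same collation

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical baseline: failures that are already known and pinned.
printf '%s\n' 'suite-a > known one' 'suite-a > known two' | sort -u > "$workdir/baseline"
# Hypothetical current run: "known two" was fixed, "fresh break" is new.
printf '%s\n' 'suite-a > known one' 'suite-b > fresh break' | sort -u > "$workdir/current"

# -13 suppresses columns 1 and 3: lines only in the current run (regressions).
new_failures=$(comm -13 "$workdir/baseline" "$workdir/current")
# -23 suppresses columns 2 and 3: lines only in the baseline (fixes to lock in).
vanished=$(comm -23 "$workdir/baseline" "$workdir/current")

echo "new:      $new_failures"
echo "vanished: $vanished"
```

Either variable being non-empty would turn the gate red; only an exact match between the two sets stays green.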

Bonus capture: 2 of the 273 have EMPTY suite labels (can-map-an-array,
string->number) — live evidence for C9, the next checklist item.
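
One possible way to surface such entries is to print every line whose text before the first ">" is blank. This is a sketch only: the sample file is invented, and the exact on-disk shape of an empty-label line (here a leading space before ">") is an assumption.

```shell
# Sketch: list entries whose suite label (the text before the first ">") is empty.
# The sample file is made up; the real list is spec/tests/known-failures.txt.
set -eu
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' ' > can-map-an-array' 'hs-dev-pick > can pick random item' ' > string->number' > "$sample"

# Take everything before the first ">", strip blanks and tabs, keep the line if nothing is left.
empty_suite=$(awk '{
  i = index($0, ">")
  suite = substr($0, 1, i - 1)
  gsub(/[ \t]/, "", suite)
  if (i > 0 && suite == "") print
}' "$sample")

echo "$empty_suite"
```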

Validated end-to-end: GREEN on current tree (5800p/273f — 38 net passes
above dc7aa709's 5762 from this loop's added pins). Runtime ~12 min.

Test-only: no semantics edits, no push.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 04:10:55 +00:00
parent ca4ad404f1
commit 8ba68e0365
3 changed files with 350 additions and 1 deletions


@@ -94,7 +94,10 @@ Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
- [x] C3/C4/C5/C6/C7 — protocol-quirk ledger (pins current behavior,
bidirectional) + seeded 60-line fuzz-liveness property in
`scripts/test-protocol-gate.sh` (11/11)
- [ ] F10 — hs-upstream skip-list so browser-only FAILs mean something
- [x] F10 — expected-failures BASELINE GATE instead of a skip-list
(`scripts/test-suite-baseline.sh` + `spec/tests/known-failures.txt`,
273 pinned: 271 hs-* + 2 empty-suite-label entries → C9 evidence).
New failure OR vanished failure = red; hs loops' scoreboards untouched
- [ ] C9 — empty suite label
### F. Differential battery
@@ -102,6 +105,18 @@ Pin each confirmed-and-fixed finding with a minimal repro. Add suites to
## Progress log (newest first)
- 2026-07-04 — **F10 baseline gate (item E.2)**. Deliberately NOT a
skip-list: skip-listing the hs red band in the runner would rewrite the
hs loops' scoreboards mid-flight. Instead
`scripts/test-suite-baseline.sh` diffs the full suite's FAIL set against
checked-in `spec/tests/known-failures.txt` (273 entries: 271 hs-* + 2
with EMPTY suite labels — live C9 evidence, `can-map-an-array` "map with
block" and `string->number` 2-arg, the "r7rs radix shadow"). Red on a
NEW failure (regression) and red on a VANISHED failure (fix landed —
delete from baseline, locking in the win). Identity = "suite > name"
with error text stripped (messages churn). Current suite: 5800p/273f
(up 38 passes from dc7aa709's 5762; sections A-D added pins). Validated
end-to-end: GREEN, exit 0, ~12 min runtime. Test-only.
- 2026-07-04 — **C3-C7 protocol fuzz suite (item E.1)**. All five findings
are still OPEN server-side (sx_server.ml fixes are host-runtime work),
so the suite pins CURRENT behavior as a bidirectional ledger — verified

scripts/test-suite-baseline.sh Executable file

@@ -0,0 +1,61 @@
#!/bin/bash
# test-suite-baseline.sh — W14/F10: make FAIL mean something again.
#
# The review (conformance.md F-10): the OCaml suite is not green — a
# permanent ~273-failure band (in-progress hs-* + r7rs radix shadow) is
# normalized, so real regressions hide inside the red noise and nobody can
# tell a new failure from the band.
#
# This gate pins the band instead of ignoring it: the full suite's FAIL
# set is diffed against the checked-in baseline
# (spec/tests/known-failures.txt). Two red conditions, both loud:
# NEW failure -> a real regression: fix it (or, if intentional,
# justify + add to the baseline in the same commit)
# VANISHED failure -> something got fixed: delete it from the baseline
# so the win is locked in
# Neither touches the runner or the hs loops' scoreboards — the band still
# prints as FAIL lines for the teams working through it.
#
# Usage: bash scripts/test-suite-baseline.sh
# Runtime: full suite, ~5-15 min. Exit 0 = fail set identical to baseline.
set -uo pipefail
cd "$(dirname "$0")/.."
RUNNER=hosts/ocaml/_build/default/bin/run_tests.exe
BASELINE=spec/tests/known-failures.txt
[[ -x "$RUNNER" ]] || { echo "SKIP: $RUNNER not built" >&2; exit 2; }
[[ -f "$BASELINE" ]] || { echo "SKIP: $BASELINE missing" >&2; exit 2; }
log=$(mktemp)
timeout 3000 "$RUNNER" > "$log" 2>&1
rc=$?
if [[ $rc -ne 0 && $rc -ne 1 ]]; then
echo "RED: runner exited $rc (timeout/crash)"; tail -5 "$log"; rm -f "$log"; exit 1
fi
# Normalize: keep the stable test identity (suite > name), drop messages
# (error text may contain addresses/timings that churn).
current=$(mktemp)
grep '^ FAIL: ' "$log" | sed 's/^ FAIL: //; s/: .*$//' | sort -u > "$current"
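# comm -13 keeps lines only in the current run (new failures);
# comm -23 keeps lines only in the baseline (failures that vanished).
new_failures=$(comm -13 <(sort -u "$BASELINE") "$current")
vanished=$(comm -23 <(sort -u "$BASELINE") "$current")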
summary=$(grep '^Results:' "$log" | tail -1)
red=0
if [[ -n "$new_failures" ]]; then
echo "RED: NEW failures not in baseline:"
sed 's/^/ + /' <<<"$new_failures"
red=1
fi
if [[ -n "$vanished" ]]; then
echo "RED: baseline entries now PASSING (delete them from $BASELINE):"
sed 's/^/ - /' <<<"$vanished"
red=1
fi
if [[ $red -eq 0 ]]; then
echo "GREEN: fail set identical to baseline ($(wc -l < "$BASELINE") known failures)"
fi
echo "$summary"
rm -f "$log" "$current"
exit $red
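
The normalization step and the "delete it from the baseline" follow-up can be tried in isolation. The sketch below rests on assumptions: the log line shape (`FAIL: suite > name: message`) is inferred from the script's grep/sed pair, the sample log and baseline are invented, and the pruning step with `grep -vxFf` is a hypothetical helper that is not part of this commit.

```shell
# Sketch on invented data: normalize FAIL lines to identities, then prune fixed entries.
set -eu
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Assumed log shape; the two FAIL lines differ only in a churning address.
cat > "$tmp/log" <<'EOF'
  PASS: suite-a > still fine
  FAIL: suite-a > known one: expected 1, got 2 (at 0x7f3a)
  FAIL: suite-a > known one: expected 1, got 2 (at 0x55d1)
Results: 1 passed, 2 failed
EOF

# Keep FAIL lines only, drop the prefix, drop everything from the first ": " on,
# then dedupe so both messages collapse to one identity.
sed -n 's/^[[:space:]]*FAIL: //p' "$tmp/log" | sed 's/: .*$//' | sort -u > "$tmp/current"

# Hypothetical baseline with one entry that no longer fails.
printf '%s\n' 'suite-a > known one' 'suite-a > known two' | sort -u > "$tmp/baseline"
comm -23 "$tmp/baseline" "$tmp/current" > "$tmp/vanished"

# Prune vanished entries: -v invert, -x whole-line match, -F fixed strings, -f patterns from file.
# "|| true" covers grep's exit status 1 when no lines remain.
grep -vxFf "$tmp/vanished" "$tmp/baseline" > "$tmp/baseline.new" || true

identities=$(cat "$tmp/current")
pruned=$(cat "$tmp/baseline.new")
echo "identities:      $identities"
echo "pruned baseline: $pruned"
```

Matching whole lines as fixed strings matters here because test names contain regex metacharacters such as `*`, `[` and `.`.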

spec/tests/known-failures.txt

@@ -0,0 +1,273 @@
> can-map-an-array
hs-compat-asExpression > converts-a-complete-form-into-values
hs-compat-asExpression > converts-strings-into-fragments
hs-compat-asExpression > converts-value-as-json
hs-compat-in > basic-no-query-return-values
hs-compat-typecheck > can-do-basic-non-string-typecheck-failure
hs-compat-typecheck > can-do-basic-string-non-null-typecheck
hs-compat-typecheck > can-do-basic-string-typecheck
hs-compat-typecheck > null-causes-null-safe-string-check-to-fail
hs-dev-asExpression > parses string as JSON to object
hs-dev-collectionExpressions > where binds after property access
hs-dev-comparisonOperator > I am between works
hs-dev-comparisonOperator > I am not between works
hs-dev-comparisonOperator > is still does equality when rhs variable exists
hs-dev-pick > can pick first n items
hs-dev-pick > can pick items using 'of' syntax
hs-dev-pick > can pick last n items
hs-dev-pick > can pick random item
hs-dev-pick > can pick random n items
hs-emit-classes > remove class from target
hs-emit-control-flow > tell rebinds me
hs-emit-def-behavior > def becomes define
hs-emit-dom-commands > hide sets display none
hs-emit-dom-commands > log passes through
hs-emit-dom-commands > show clears display
hs-emit-on > on every click
hs-extra-function-call > identity-call
hs-extra-lambda > array-map-block
hs-extra-lambda > arrow-true
hs-extra-typecheck > null-colon-string
hs-parse-assignment > put into
hs-parse-assignment > set property to string
hs-parse-basic-commands > add class to me
hs-parse-basic-commands > remove class from me
hs-parse-basic-commands > toggle between two classes
hs-parse-basic-commands > toggle class on me
hs-parse-conditional > if else end
hs-parse-conditional > if then end
hs-parse-conformance > increment @count → full AST
hs-parse-conformance > on click add .called → full AST
hs-parse-conformance > on click from #bar add .clicked → full AST
hs-parse-conformance > toggle between .foo and .bar → full AST
hs-parse-conformance > wait 100ms then add .done → full AST
hs-parse-events > on click add class
hs-parse-events > on click from target
hs-parse-every-modifier > on every click
hs-parse-expressions > attribute ref
hs-parse-expressions > style ref
hs-parse-send-trigger > trigger event on me
hs-parse-sequencing > wait then add
hs-parse-special-commands > decrement attribute
hs-parse-special-commands > hide
hs-parse-special-commands > increment attribute
hs-parse-special-commands > show target
hs-parse-unary > not expr
hs-runtime-e2e > source → SX shape
hs-runtime-make > make Map returns dict
hs-runtime-make > make Set returns list
hs-tokenize-arithmetic-ops > division operator
hs-tokenize-arithmetic-ops > mixed arithmetic
hs-tokenize-arithmetic-ops > modulo operator
hs-tokenize-arithmetic-ops > multiply operator
hs-tokenize-basics > keywords vs identifiers
hs-tokenize-basics > whitespace skipped
hs-tokenize-comments > line comment skipped
hs-tokenize-full-expressions > if true put "foo" into me.innerHTML else put "bar" into me.innerHTML end
hs-tokenize-full-expressions > increment @count then put it into me
hs-tokenize-full-expressions > on click add .called
hs-tokenize-full-expressions > on click[buttons==0] log event
hs-tokenize-full-expressions > on click from #bar add .clicked
hs-tokenize-full-expressions > on click send custom(foo:"fromBar") to #d2
hs-tokenize-full-expressions > put "Clicked" into my.innerHTML
hs-tokenize-full-expressions > set #d1.innerHTML to foo
hs-tokenize-full-expressions > toggle between .foo and .bar
hs-tokenize-full-expressions > wait 100ms then add .done
hs-upstream-add > can add a value to a set
hs-upstream-add > can add to an HTMLCollection
hs-upstream-add > can add to children
hs-upstream-add > can add to query in me
hs-upstream-add > supports async expressions in when clause
hs-upstream-append > append to undefined ignores the undefined
hs-upstream-append > can append a value to a DOM node
hs-upstream-append > can append a value to a set
hs-upstream-append > can append a value to I
hs-upstream-append > multiple appends work
hs-upstream-append > new DOM content added by append will be live
hs-upstream-askAnswer > confirm returns first choice on OK
hs-upstream-askAnswer > prompts and puts result in it
hs-upstream-call > call functions that return promises are waited on
hs-upstream-core/asyncError > rejected promise stops execution
hs-upstream-core/asyncError > rejected promise triggers catch block
hs-upstream-core/regressions > can invoke functions w/ numbers in name
hs-upstream-core/regressions > can pick detail fields out by name
hs-upstream-core/regressions > can refer to function in init blocks
hs-upstream-core/runtimeErrors > reports basic function invocation null errors properly
hs-upstream-core/runtimeErrors > reports basic function invocation null errors properly w/ of
hs-upstream-core/runtimeErrors > reports basic function invocation null errors properly w/ possessives
hs-upstream-core/runtimeErrors > reports null errors on add command properly
hs-upstream-core/runtimeErrors > reports null errors on decrement command properly
hs-upstream-core/runtimeErrors > reports null errors on default command properly
hs-upstream-core/runtimeErrors > reports null errors on hide command properly
hs-upstream-core/runtimeErrors > reports null errors on increment command properly
hs-upstream-core/runtimeErrors > reports null errors on measure command properly
hs-upstream-core/runtimeErrors > reports null errors on put command properly
hs-upstream-core/runtimeErrors > reports null errors on remove command properly
hs-upstream-core/runtimeErrors > reports null errors on send command properly
hs-upstream-core/runtimeErrors > reports null errors on sets properly
hs-upstream-core/runtimeErrors > reports null errors on settle command properly
hs-upstream-core/runtimeErrors > reports null errors on show command properly
hs-upstream-core/runtimeErrors > reports null errors on toggle command properly
hs-upstream-core/runtimeErrors > reports null errors on transition command properly
hs-upstream-core/runtimeErrors > reports null errors on trigger command properly
hs-upstream-core/runtime > has proper stack from event handler
hs-upstream-core/scoping > locally scoped variables don't clash with built-in variables
hs-upstream-empty > can empty a map
hs-upstream-empty > can empty an element
hs-upstream-empty > can empty a set
hs-upstream-empty > clear works on elements
hs-upstream-expressions/asExpression > can accept custom dynamic conversions
hs-upstream-expressions/asExpression > can use the a modifier if you like
hs-upstream-expressions/asExpression > collects duplicate text inputs into an array
hs-upstream-expressions/asExpression > converts a complete form into Values
hs-upstream-expressions/asExpression > converts a form element into Values
hs-upstream-expressions/asExpression > converts a form element into Values | FormEncoded
hs-upstream-expressions/asExpression > converts a form element into Values | JSONString
hs-upstream-expressions/asExpression > converts an element into HTML
hs-upstream-expressions/asExpression > converts a NodeList into HTML
hs-upstream-expressions/asExpression > converts array as Set
hs-upstream-expressions/asExpression > converts checkboxes into a Value correctly
hs-upstream-expressions/asExpression > converts multiple selects into a Value correctly
hs-upstream-expressions/asExpression > converts multiple selects with programmatically changed selections
hs-upstream-expressions/asExpression > converts object as Map
hs-upstream-expressions/asExpression > converts radio buttons into a Value correctly
hs-upstream-expressions/asExpression > converts value as Date
hs-upstream-expressions/asExpression > parses string as JSON to object
hs-upstream-expressions/asExpression > pipe operator chains conversions
hs-upstream-expressions/blockLiteral > basic block literals work
hs-upstream-expressions/blockLiteral > basic identity works
hs-upstream-expressions/blockLiteral > basic two arg identity works
hs-upstream-expressions/closest > closest does not consume a following where clause
hs-upstream-expressions/comparisonOperator > does not exist works
hs-upstream-expressions/cookies > basic clear cookie values work
hs-upstream-expressions/cookies > basic set cookie values work
hs-upstream-expressions/cookies > iterate cookies values work
hs-upstream-expressions/cookies > length is 0 when no cookies are set
hs-upstream-expressions/cookies > update cookie values work
hs-upstream-expressions/functionCalls > can access a property of a call's result
hs-upstream-expressions/functionCalls > can chain calls on the result of a call
hs-upstream-expressions/functionCalls > can invoke function on object
hs-upstream-expressions/functionCalls > can invoke function on object w/ async arg
hs-upstream-expressions/functionCalls > can invoke function on object w/ async root & arg
hs-upstream-expressions/functionCalls > can invoke global function
hs-upstream-expressions/functionCalls > can invoke global function w/ async arg
hs-upstream-expressions/functionCalls > can pass an array literal as an argument
hs-upstream-expressions/functionCalls > can pass an expression as an argument
hs-upstream-expressions/functionCalls > can pass an object literal as an argument
hs-upstream-expressions/functionCalls > can pass no arguments
hs-upstream-expressions/logicalOperator > and short-circuits when lhs promise resolves to false
hs-upstream-expressions/logicalOperator > should short circuit with and expression
hs-upstream-expressions/logicalOperator > should short circuit with or expression
hs-upstream-expressions/mathOperator > can use mixed expressions
hs-upstream-expressions/objectLiteral > expressions work in object literal field names
hs-upstream-expressions/propertyAccess > property access on function result
hs-upstream-expressions/some > some returns true for nonempty selector
hs-upstream-expressions/strings > string templates work w/ props
hs-upstream-expressions/strings > string templates work w/ props w/ braces
hs-upstream-expressions/symbol > resolves global context properly
hs-upstream-fetch > allows the event handler to change the fetch parameters
hs-upstream-fetch > as response does not throw on 404
hs-upstream-fetch > can catch an error that occurs when using fetch
hs-upstream-fetch > can do a simple fetch
hs-upstream-fetch > can do a simple fetch w/ a custom conversion
hs-upstream-fetch > can do a simple fetch w/ a naked URL
hs-upstream-fetch > can do a simple fetch w/ html
hs-upstream-fetch > can do a simple fetch w/ json
hs-upstream-fetch > can do a simple fetch w/ json using JSON syntax
hs-upstream-fetch > can do a simple fetch w/ json using Object syntax
hs-upstream-fetch > can do a simple fetch w/ json using Object syntax and an 'an' prefix
hs-upstream-fetch > can do a simple post
hs-upstream-fetch > can do a simple post alt syntax w/ curlies
hs-upstream-fetch > can do a simple post alt syntax without curlies
hs-upstream-fetch > can put response conversion after with
hs-upstream-fetch > can put response conversion before with
hs-upstream-fetch > do not throw passes through 404 response
hs-upstream-fetch > don't throw passes through 404 response
hs-upstream-fetch > Response can be converted to JSON via as JSON
hs-upstream-fetch > submits the fetch parameters to the event handler
hs-upstream-fetch > throws on non-2xx response by default
hs-upstream-fetch > triggers an event just before fetching
hs-upstream-hide > can hide element, with display:none by default
hs-upstream-hide > can hide element with display:none explicitly
hs-upstream-hide > can hide element with no target followed by command
hs-upstream-hide > can hide element with no target followed by then
hs-upstream-hide > can hide element with no target with a with
hs-upstream-hide > can hide element with opacity:0
hs-upstream-hide > can hide element with opacity style literal
hs-upstream-hide > can hide element, with visibility:hidden
hs-upstream-hide > can hide other elements
hs-upstream-if > if on new line does not join w/ else
hs-upstream-if > if properly supports nested if statements and end block
hs-upstream-js > can do both of the above
hs-upstream-js > can return values to _hyperscript
hs-upstream-js > handles rejected promises without hanging
hs-upstream-make > can make elements
hs-upstream-make > can make elements with id and classes
hs-upstream-make > can make named objects
hs-upstream-make > can make named objects w/ global scope
hs-upstream-make > can make named objects with arguments
hs-upstream-make > can make objects
hs-upstream-make > can make objects with arguments
hs-upstream-make > creates a div by default
hs-upstream-on > can catch exceptions thrown in hyperscript functions
hs-upstream-on > can catch exceptions thrown in js functions
hs-upstream-on > can ignore when target doesn't exist
hs-upstream-on > can pick detail fields out by name
hs-upstream-on > can pick event properties out by name
hs-upstream-on > listeners on other elements are removed when the registering element is removed
hs-upstream-on > multiple event handlers at a time are allowed to execute with the every keyword
hs-upstream-on > on intersection fires when the element is in the viewport
hs-upstream-on > rethrown exceptions trigger 'exception' event
hs-upstream-on > throttled at <time> drops events within the window
hs-upstream-put > waits on promises
hs-upstream-remove > can remove a value from a set
hs-upstream-repeat > can nest loops
hs-upstream-repeat > only executes the init expression once
hs-upstream-repeat > repeat forever works
hs-upstream-repeat > repeat forever works w/o keyword
hs-upstream-repeat > until keyword works
hs-upstream-repeat > while keyword works
hs-upstream-reset > can reset a textarea
hs-upstream-resize > fires when element is resized
hs-upstream-resize > on resize from window uses native window resize event
hs-upstream-resize > provides height in detail
hs-upstream-resize > works with from clause
hs-upstream-select > returns selected text
hs-upstream-send > can send events to any expression
hs-upstream-set > set waits on promises
hs-upstream-show > can filter over a set of elements using the its symbol
hs-upstream-socket > converts relative URL to ws:// on http pages
hs-upstream-socket > converts relative URL to wss:// on https pages
hs-upstream-socket > dispatchEvent sends JSON-encoded event over the socket
hs-upstream-socket > namespaced sockets work
hs-upstream-socket > on message as JSON handler decodes JSON payload
hs-upstream-socket > on message as JSON throws on non-JSON payload
hs-upstream-socket > on message handler fires on incoming text message
hs-upstream-socket > parses socket with absolute ws:// URL
hs-upstream-socket > rpc proxy blacklists then/catch/length/toJSON
hs-upstream-socket > rpc proxy default timeout rejects the promise
hs-upstream-socket > rpc proxy noTimeout avoids timeout rejection
hs-upstream-socket > rpc proxy reply with throw rejects the promise
hs-upstream-socket > rpc proxy sends a message and resolves the reply
hs-upstream-socket > rpc proxy timeout(n) rejects after a custom window
hs-upstream-socket > rpc reconnects after the underlying socket closes
hs-upstream-socket > with timeout parses and uses the configured timeout
hs-upstream-swap > can swap a variable with a property
hs-upstream-tell > works with an array
hs-upstream-tell > your symbol represents the thing being told
hs-upstream-toggle > can toggle display
hs-upstream-toggle > can toggle display on other elt
hs-upstream-toggle > can toggle display w/ my
hs-upstream-toggle > can toggle opacity
hs-upstream-toggle > can toggle opacity on other elt
hs-upstream-toggle > can toggle opacity w/ my
hs-upstream-toggle > can toggle until an event on another element
hs-upstream-toggle > can toggle visibility
hs-upstream-toggle > can toggle visibility on other elt
hs-upstream-toggle > can toggle visibility w/ my
hs-upstream-wait > can destructure properties in a wait
hs-upstream-wait > can wait on event
hs-upstream-wait > can wait on event on another element
hs-upstream-wait > waiting on an event sets 'it' to the event
hs-upstream-when > attribute observers are persistent (not recreated on re-run)
> string->number