rose-ash/plans/smalltalk-on-sx.md
giles e71154f9c6
smalltalk: chunk-stream parser + pragmas + 21 tests
2026-04-25 01:11:44 +00:00

Smalltalk-on-SX: blocks with non-local return on delimited continuations

The headline showcase is blocks: Smalltalk's closures with non-local return (^expr inside a block returns from the enclosing method, not just from the block). Every other Smalltalk hosted on another VM (RSqueak on PyPy, GemStone on C, Maxine on Java) reinvents non-local return on whatever stack discipline its host provides. On SX it's a one-liner: a block holds a captured continuation; ^ just invokes it. Message-passing OO falls out cheaply on top of the existing component / dispatch machinery.

End-state goal: ANSI-ish Smalltalk-80 subset, SUnit working, ~200 hand-written tests + a vendored slice of the Pharo kernel tests, classic corpus (eight queens, quicksort, mandelbrot, Conway's Life).

Scope decisions (defaults — override by editing before we spawn)

  • Syntax: Pharo / Squeak chunk format (! separators, Object subclass: #Foo …). No fileIn/fileOut images — text source only.
  • Conformance: ANSI X3J20 as a target, not bug-for-bug Squeak. "Reads like Smalltalk, runs like Smalltalk."
  • Test corpus: SUnit ported to SX-Smalltalk + custom programs + a curated slice of Pharo Kernel-Tests / Collections-Tests.
  • Image: out of scope. Source-only. No become: between sessions, no snapshotting.
  • Reflection: class, respondsTo:, perform:, doesNotUnderstand: in. become: (object-identity swap) in — it's a good CEK exercise. Method modification at runtime in.
  • GUI / Morphic / threads: out entirely.

Ground rules

  • Scope: only touch lib/smalltalk/** and plans/smalltalk-on-sx.md. Don't edit spec/, hosts/, shared/, or any other lib/<lang>/**. Smalltalk primitives go in lib/smalltalk/runtime.sx.
  • SX files: use sx-tree MCP tools only.
  • Commits: one feature per commit. Keep ## Progress log updated and tick roadmap boxes.

Architecture sketch

Smalltalk source
    │
    ▼
lib/smalltalk/tokenizer.sx  — selectors, keywords, literals, $c, #sym, #(…), $'…'
    │
    ▼
lib/smalltalk/parser.sx     — AST: classes, methods, blocks, cascades, sends
    │
    ▼
lib/smalltalk/transpile.sx  — AST → SX AST (entry: smalltalk-eval-ast)
    │
    ▼
lib/smalltalk/runtime.sx    — class table, MOP, dispatch, primitives

Core mapping:

  • Class = SX dict {:name :superclass :ivars :methods :class-methods :metaclass}. Class table is a flat dict keyed by class name.
  • Object = SX dict {:class :ivars}, with ivars keyed by symbol. Tagged ints / floats / strings / symbols are not boxed; their class is looked up by SX type.
  • Method = SX lambda closing over a self binding + temps. Body wrapped in a delimited continuation so ^ can escape.
  • Message send = (st-send receiver selector args) — does class-table lookup, walks superclass chain, falls back to doesNotUnderstand: with a Message object.
  • Block [:x | … ^v … ] = lambda + captured ^k (the method-return continuation). Invoking ^ calls k; invoking ^ after the home method has already returned raises BlockContext>>cannotReturn:.
  • Cascade r m1; m2; m3 = (let ((tmp r)) (st-send tmp 'm1 ()) (st-send tmp 'm2 ()) (st-send tmp 'm3 ())).
  • ifTrue:ifFalse: / whileTrue: = ordinary block sends; the runtime intrinsifies them in the JIT path so they compile to native branches (Tier 1 of bytecode expansion already covers this pattern).
  • become: = swap two object identities everywhere. In SX this is a heap walk, so by default we restrict to the one-way becomeForward: (cheap: rewrite the class field).
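
The send and cascade entries above can be modeled in a few lines. This is only a sketch in Python, not the SX runtime: st_send mirrors the (st-send receiver selector args) form, while define_class, lookup, cascade and the Counter class are hypothetical names introduced for illustration. Class-side methods, metaclasses and unboxed primitives are left out.

```python
# Hypothetical Python model of the core mapping; the real runtime is SX.
class_table = {}  # flat dict keyed by class name


def define_class(name, superclass):
    class_table[name] = {"name": name, "superclass": superclass, "methods": {}}


def lookup(class_name, selector):
    """Walk the superclass chain; return the method or None."""
    while class_name is not None:
        cls = class_table[class_name]
        if selector in cls["methods"]:
            return cls["methods"][selector]
        class_name = cls["superclass"]
    return None


def st_send(receiver, selector, args=()):
    method = lookup(receiver["class"], selector)
    if method is not None:
        return method(receiver, *args)
    # Fallback: reify the failed send as a Message-like dict.
    message = {"selector": selector, "arguments": list(args)}
    dnu = lookup(receiver["class"], "doesNotUnderstand:")
    if dnu is None:
        raise LookupError(selector)
    return dnu(receiver, message)


def cascade(receiver, *sends):
    """r m1; m2; m3: receiver is evaluated once, last result is kept."""
    result = None
    for selector, args in sends:
        result = st_send(receiver, selector, args)
    return result


define_class("Object", None)
define_class("Counter", "Object")
class_table["Object"]["methods"]["doesNotUnderstand:"] = (
    lambda self, message: ("dnu", message["selector"]))


def counter_increment(self):
    self["ivars"]["n"] += 1
    return self


class_table["Counter"]["methods"]["increment"] = counter_increment
class_table["Counter"]["methods"]["n"] = lambda self: self["ivars"]["n"]

counter = {"class": "Counter", "ivars": {"n": 0}}
total = cascade(counter, ("increment", ()), ("increment", ()), ("n", ()))
missing = st_send(counter, "frobnicate:", (7,))
```

Here total is the result of the last cascaded send, and missing shows an unknown selector being routed to doesNotUnderstand: together with the reified message.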

Roadmap

Phase 1 — tokenizer + parser

  • Tokenizer: identifiers, keywords (foo:), binary selectors (+, ==, ,, ->, ~= etc.), numbers (radix 16r1F; scaled 1.5s2 deferred), strings '…' (an embedded quote is doubled as ''), characters $c, symbols #foo #'foo bar' #+, byte arrays #[1 2 3] (open token), literal arrays #(1 #foo 'x') (open token), comments "…"
  • Parser (expression level): blocks [:a :b | | t1 t2 | …], cascades, message precedence (unary > binary > keyword), assignment, return, statement sequences, literal arrays, byte arrays, paren grouping, method headers (+ other, at:put:, unary, with temps and body). Class-definition keyword messages parse as ordinary keyword sends — no special-case needed.
  • Parser (chunk-stream level): st-read-chunks splits source on ! (with !! doubling) and st-parse-chunks runs the Pharo file-in state machine — methodsFor: / class methodsFor: opens a method batch, an empty chunk closes it. Pragmas <primitive: …> (incl. multiple keyword pairs, before or after temps, multiple per method) parsed into the method AST.
  • Unit tests in lib/smalltalk/tests/parse.sx
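
The chunk-splitting rule (split on !, with !! standing for a literal bang) can be sketched as below. This is a Python illustration, not the SX st-read-chunks; read_chunks is a hypothetical name, and the choice to strip whitespace and keep empty chunks is an assumption. In particular, the plan does not say how the real reader treats the ! that precedes a methodsFor: header, so this naive version simply reports it as an empty chunk.

```python
def read_chunks(source):
    """Split chunk-format source on '!', treating '!!' as an escaped '!'.

    Empty chunks are kept because the file-in state machine gives them
    meaning (an empty chunk closes a method batch).
    """
    chunks = []
    current = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == "!":
            if i + 1 < n and source[i + 1] == "!":
                current.append("!")  # doubled bang: literal '!'
                i += 2
                continue
            chunks.append("".join(current).strip())  # single bang: terminator
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        chunks.append(tail)  # unterminated trailing text
    return chunks


example = (
    "Object subclass: #Foo!\n"
    "!Foo methodsFor: 'demo'!\n"
    "shout\n"
    "    ^ 'hi!!'! !\n"
)
chunks = read_chunks(example)
```

A state machine in the style of st-parse-chunks would then walk this list, opening a method batch at the methodsFor: chunk and closing it at the next empty chunk.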

Phase 2 — object model + sequential eval

  • Class table + bootstrap: Object, Behavior, Class, Metaclass, UndefinedObject, Boolean/True/False, Number/Integer/Float, String, Symbol, Array, Block
  • smalltalk-eval-ast: literals, variable reference, assignment, message send, cascade, sequence, return
  • Method lookup: walk class → superclass; cache the class where the method was found, keyed on (class, selector)
  • doesNotUnderstand: fallback constructing Message object
  • super send (lookup starts at superclass of defining class, not receiver class)
  • 30+ tests in lib/smalltalk/tests/eval.sx
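
The difference between an ordinary send and a super send is easiest to see with a three-level hierarchy. The following is a Python sketch under stated assumptions: the classes, the lookup_cache layout, and the convention of passing the defining class to each method are all hypothetical, chosen only to show where the lookup starts.

```python
# Hypothetical model: each method receives (self, home), where home is the
# class the method was found in. A super send needs that, not the receiver class.
classes = {
    "Object": {"superclass": None, "methods": {}},
    "Animal": {"superclass": "Object", "methods": {}},
    "Dog": {"superclass": "Animal", "methods": {}},
    "Puppy": {"superclass": "Dog", "methods": {}},
}
lookup_cache = {}  # (class, selector) -> (defining class, method)


def lookup(class_name, selector):
    key = (class_name, selector)
    if key in lookup_cache:
        return lookup_cache[key]
    current = class_name
    while current is not None:
        method = classes[current]["methods"].get(selector)
        if method is not None:
            lookup_cache[key] = (current, method)
            return lookup_cache[key]
        current = classes[current]["superclass"]
    return None


def send(receiver, selector, *args):
    found = lookup(receiver["class"], selector)
    if found is None:
        raise LookupError(selector)  # doesNotUnderstand: omitted here
    home, method = found
    return method(receiver, home, *args)


def super_send(receiver, defining_class, selector, *args):
    # Start above the class that defines the running method.
    found = lookup(classes[defining_class]["superclass"], selector)
    if found is None:
        raise LookupError(selector)
    home, method = found
    return method(receiver, home, *args)


classes["Animal"]["methods"]["speak"] = lambda self, home: "generic noise"
classes["Dog"]["methods"]["speak"] = (
    lambda self, home: "Woof, then " + super_send(self, home, "speak"))

puppy = {"class": "Puppy", "ivars": {}}
spoken = send(puppy, "speak")
```

If super_send started from the superclass of the receiver's class (Dog, for a Puppy), it would find Dog's speak again and recurse forever; starting from the superclass of the defining class reaches Animal. Any cache of this kind would also need invalidating when methods are added at runtime.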

Phase 3 — blocks + non-local return (THE SHOWCASE)

  • Method invocation captures a ^k (the return continuation) and binds it as the block's escape
  • ^expr from inside a block invokes that captured ^k
  • BlockContext>>value, value:, value:value:, …, valueWithArguments:
  • whileTrue: / whileTrue / whileFalse: / whileFalse as ordinary block sends — runtime intrinsifies the loop in the bytecode JIT
  • ifTrue: / ifFalse: / ifTrue:ifFalse: as block sends, similarly intrinsified
  • Escape past returned-from method raises BlockContext>>cannotReturn:
  • Classic programs in lib/smalltalk/tests/programs/:
    • eight-queens.st
    • quicksort.st
    • mandelbrot.st
    • life.st (Conway's Life, glider gun)
    • fibonacci.st (recursive + memoised)
  • lib/smalltalk/conformance.sh + runner, scoreboard.json + scoreboard.md
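
The escape behaviour in this phase can be modeled without continuations. The sketch below is Python and uses an exception as a stand-in for the one-shot escape that the captured ^k provides; it is not how SX implements it. call_method, return_k, MethodActivation and the two exception classes are hypothetical names.

```python
class NonLocalReturn(Exception):
    """Carries a value back to one specific method activation."""

    def __init__(self, home, value):
        super().__init__()
        self.home = home
        self.value = value


class CannotReturn(Exception):
    """Models BlockContext>>cannotReturn: (home method already returned)."""


class MethodActivation:
    def __init__(self):
        self.live = True


def call_method(body, *args):
    """Run body(return_k, *args); return_k plays the role of ^ inside blocks."""
    home = MethodActivation()

    def return_k(value):
        if not home.live:
            raise CannotReturn("home method has already returned")
        raise NonLocalReturn(home, value)

    try:
        return body(return_k, *args)
    except NonLocalReturn as nlr:
        if nlr.home is not home:
            raise  # belongs to an outer method; keep unwinding
        return nlr.value
    finally:
        home.live = False


def do(collection, block):
    for item in collection:
        block(item)


def first_even(return_k, numbers):
    # numbers do: [:n | n even ifTrue: [^n]]. ^nil
    do(numbers, lambda n: return_k(n) if n % 2 == 0 else None)
    return None


def leak_block(return_k):
    # Returns a block that still holds ^ for this (soon dead) activation.
    return lambda: return_k(42)


found = call_method(first_even, [1, 3, 4, 5, 6])
stale_block = call_method(leak_block)
try:
    stale_block()
    escaped = "returned"
except CannotReturn:
    escaped = "cannotReturn:"
```

The identity check on home is what lets a block's ^ pass through intermediate method activations (here, do) and land only in the method that created the block.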

Phase 4 — reflection + MOP

  • Object>>class, Class>>name, Class>>superclass, Class>>methodDict, Class>>selectors
  • Object>>perform: / perform:with: / perform:withArguments:
  • Object>>respondsTo:, Object>>isKindOf:, Object>>isMemberOf:
  • Behavior>>compile: — runtime method addition
  • Object>>becomeForward: (one-way become; rewrites the class field of the receiver)
  • Exceptions: Exception, Error, signal, signal:, on:do:, ensure:, ifCurtailed: — built on top of SX handler-bind/raise
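
The intended difference between on:do:, ensure: and ifCurtailed: can be pinned down with a small model. This Python sketch covers only the case where the handler's value becomes the value of on:do:. Python unwinds the stack before running an except clause, so resumable behaviour (resume:, retry), which a handler-bind style mechanism can support, is not modeled. StError, on_do, ensure and if_curtailed are hypothetical names.

```python
class StError(Exception):
    """Stand-in for Smalltalk's Error, with a messageText."""

    def __init__(self, message_text=None):
        super().__init__(message_text)
        self.message_text = message_text


def on_do(protected, exception_class, handler):
    """[protected] on: exception_class do: [:e | handler]"""
    try:
        return protected()
    except exception_class as exc:
        return handler(exc)


def ensure(protected, cleanup):
    """Cleanup runs whether the protected block finishes or is unwound."""
    try:
        return protected()
    finally:
        cleanup()


def if_curtailed(protected, cleanup):
    """Cleanup runs only if the protected block does not finish normally."""
    completed = False
    try:
        result = protected()
        completed = True
        return result
    finally:
        if not completed:
            cleanup()


log = []


def fail():
    raise StError("boom")


handled = on_do(
    lambda: ensure(fail, lambda: log.append("ensure")),
    StError,
    lambda e: e.message_text,
)
normal = if_curtailed(lambda: 1, lambda: log.append("not reached"))
on_do(
    lambda: if_curtailed(fail, lambda: log.append("curtailed")),
    StError,
    lambda e: None,
)
```

In a full implementation a non-local ^ out of the protected block would also count as curtailing it; this model only exercises the exception path.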

Phase 5 — collections + numeric tower

  • SequenceableCollection/OrderedCollection/Array/String/Symbol
  • HashedCollection/Set/Dictionary/IdentityDictionary
  • Stream hierarchy: ReadStream/WriteStream/ReadWriteStream
  • Number tower: SmallInteger/LargePositiveInteger/Float/Fraction
  • String>>format:, printOn: for everything

Phase 6 — SUnit + corpus to 200+

  • Port SUnit (TestCase, TestSuite, TestResult) — written in SX-Smalltalk, runs in itself
  • Vendor a slice of Pharo Kernel-Tests and Collections-Tests
  • Drive the scoreboard up: aim for 200+ green tests
  • Stretch: ANSI Smalltalk validator subset

Phase 7 — speed (optional)

  • Method-dictionary inline caching (already in CEK as a primitive; just wire selector cache)
  • Block intrinsification beyond whileTrue: / ifTrue:
  • Compare against GNU Smalltalk on the corpus

Progress log

Newest first. Agent appends on every commit.

  • 2026-04-25: chunk-stream parser + pragmas + 21 chunk/pragma tests (lib/smalltalk/tests/parse_chunks.sx). st-read-chunks (with !! doubling), st-parse-chunks state machine for methodsFor: batches incl. class-side. Pragmas with multiple keyword pairs, signed numeric / string / symbol args, in either pragma-then-temps or temps-then-pragma order. 131/131 tests pass.
  • 2026-04-25: expression-level parser + 47 parse tests (lib/smalltalk/parser.sx, lib/smalltalk/tests/parse.sx). Full message precedence (unary > binary > keyword), cascades, blocks with params/temps, literal/byte arrays, assignment chain, method headers (unary/binary/keyword). Chunk-format ! ! driver deferred to a follow-up box. 110/110 tests pass.
  • 2026-04-25: tokenizer + 63 tests (lib/smalltalk/tokenizer.sx, lib/smalltalk/tests/tokenize.sx, lib/smalltalk/test.sh). All token types covered except scaled decimals 1.5s2 (deferred). #( and #[ emit open tokens; literal-array contents lexed as ordinary tokens for the parser to interpret.

Blockers

Shared-file issues that need someone else to fix. Minimal repro only.

  • (none yet)