Files
rose-ash/lib/guest/lex.sx
giles 559b0df900 GUEST: step 3 — lib/guest/lex.sx character-class + token primitives
Extracted shared tokeniser primitives:
- Char-class predicates: lex-digit?, lex-hex-digit?, lex-alpha?
  (alias lex-letter?), lex-alnum?, lex-ident-start?, lex-ident-char?,
  lex-space? (no newline), lex-whitespace? (incl newline). All nil-safe.
- Token record: lex-make-token, lex-make-token-spanning, accessors.

Ported lib/lua/tokenizer.sx and lib/tcl/tokenizer.sx — 7 lua and 5 tcl
predicate definitions collapsed into prefix-rename calls that alias
lua-/tcl- names to lex- primitives. Test scripts (lua/test.sh,
tcl/test.sh, tcl/conformance.sh) load lib/guest/lex.sx and prefix.sx
before the per-language tokenizer.

Verification:
- lua/test.sh: 185/185 = baseline
- tcl/test.sh: 342/342 (parse 67 + eval 169 + error 39 + namespace 22
                       + coro 20 + idiom 25)
- tcl/conformance.sh: 3/4 = baseline (event-loop failure is pre-existing)

Two consumers verified — step complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:06:12 +00:00

68 lines
2.1 KiB
Plaintext

;; lib/guest/lex.sx — character-class predicates and token primitives shared
;; across guest tokenisers.
;;
;; All predicates are nil-safe — they accept nil (end-of-input) and return
;; false. This matches the convention used by the existing per-language
;; tokenisers (cur returns nil at EOF).
;;
;; Char classes
;; ------------
;; lex-digit? — 0-9
;; lex-hex-digit? — 0-9, a-f, A-F
;; lex-alpha? — a-z, A-Z (alias: lex-letter?)
;; lex-alnum? — alpha or digit
;; lex-ident-start? — alpha or underscore
;; lex-ident-char? — ident-start or digit
;; lex-space? — " ", "\t", "\r" (no newline)
;; lex-whitespace? — " ", "\t", "\r", "\n" (includes newline)
;;
;; Token record
;; ------------
;; (lex-make-token TYPE VALUE POS) — {:type :value :pos}
;; (lex-make-token-spanning TYPE VALUE POS END)
;; — {:type :value :pos :end}
;; (lex-token-type TOK)
;; (lex-token-value TOK)
;; (lex-token-pos TOK)
(define lex-digit? (fn (c) (and (not (= c nil)) (>= c "0") (<= c "9"))))
(define
lex-hex-digit?
(fn
(c)
(and
(not (= c nil))
(or
(lex-digit? c)
(and (>= c "a") (<= c "f"))
(and (>= c "A") (<= c "F"))))))
(define
lex-alpha?
(fn
(c)
(and
(not (= c nil))
(or (and (>= c "a") (<= c "z")) (and (>= c "A") (<= c "Z"))))))
(define lex-letter? lex-alpha?)
(define lex-alnum? (fn (c) (or (lex-alpha? c) (lex-digit? c))))
(define lex-ident-start? (fn (c) (or (lex-alpha? c) (= c "_"))))
(define lex-ident-char? (fn (c) (or (lex-ident-start? c) (lex-digit? c))))
(define lex-space? (fn (c) (or (= c " ") (= c "\t") (= c "\r"))))
(define lex-whitespace? (fn (c) (or (lex-space? c) (= c "\n"))))
(define lex-make-token (fn (type value pos) {:pos pos :value value :type type}))
(define lex-make-token-spanning (fn (type value pos end) {:pos pos :end end :value value :type type}))
(define lex-token-type (fn (tok) (get tok :type)))
(define lex-token-value (fn (tok) (get tok :value)))
(define lex-token-pos (fn (tok) (get tok :pos)))