datalog: quoted 'atoms' tokenize as strings

Quoted atoms with uppercase- or underscore-leading names were
misclassified as variables. `p('Hello World').` flowed through the
tokenizer's "atom" branch and through the parser's string->symbol,
producing a symbol named "Hello World". dl-var? inspects the first
character — "H" is uppercase, so the fact was rejected as non-ground
("expected ground literal").
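
The failure can be modeled in a few lines. This is a Python sketch, not the project's code: `is_var` is a hypothetical stand-in for dl-var?, assumed (from the description above) to look only at the first character of a symbol's name, and `is_ground` is a hypothetical helper for the groundness check.

```python
def is_var(name):
    """Hypothetical stand-in for dl-var?: a name counts as a variable
    when it starts with an uppercase letter or an underscore."""
    return bool(name) and (name[0].isupper() or name[0] == "_")


def is_ground(args):
    """A fact is ground when none of its arguments is a variable."""
    return not any(is_var(a) for a in args)


# Modeled old behavior: the quoted atom 'Hello World' became a symbol
# named "Hello World", so the fact that it was quoted was lost.
print(is_ground(["Hello World"]))  # False: "H" is uppercase
print(is_ground(["hello"]))        # True
```

Once the quotes are dropped, a first-character test cannot tell `'Hello'` apart from the variable `Hello`.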

Tokenizer now emits "string" for any '...' quoted form. Quoted atoms
become opaque string constants — matching how Datalog idiomatically
treats them, and avoiding a per-symbol "quoted" marker that would
have rippled through unification and dl-var?. The trade-off is that
'a' and a are no longer the same value (string vs symbol); for
Datalog this is the safer default.
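
The new classification and its trade-off can be sketched the same way. This is an illustrative Python model under stated assumptions, not the tokenizer itself: `tokenize_quoted` and `tokenize_bare` are hypothetical names, tokens are reduced to `(type, value)` pairs, and escape sequences are ignored.

```python
def tokenize_quoted(src, start=0):
    """Hypothetical model: read a '...' form beginning at src[start]
    and return a (type, value) pair. No escape handling."""
    if src[start] != "'":
        raise ValueError("expected opening quote")
    end = src.index("'", start + 1)
    # Modeled new behavior: emit "string" where "atom" was emitted before.
    return ("string", src[start + 1:end])


def tokenize_bare(word):
    """Hypothetical model: classify a bare identifier by its first character."""
    kind = "var" if (word[0].isupper() or word[0] == "_") else "atom"
    return (kind, word)


print(tokenize_quoted("'Hello World'"))              # ('string', 'Hello World')
print(tokenize_quoted("'a'") == tokenize_bare("a"))  # False
```

The second comparison shows the trade-off: the type tag differs, so `'a'` and `a` no longer compare equal in this model.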

Updated the existing "quoted atom" tokenize test and added two tests:
a regression case for an uppercase-named quoted atom and a parse-level
test that verifies the AST. Conformance 269/269.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 08:39:24 +00:00
parent 917ffe5ccc
commit 7a94a47e26
6 changed files with 49 additions and 14 deletions


@@ -2,10 +2,13 @@
;;
;; Tokens: {:type T :value V :pos P}
;; Types:
-;; "atom" — lowercase-start ident or quoted 'atom'
+;; "atom" — lowercase-start bare identifier
;; "var" — uppercase-start or _-start ident (value is the name)
;; "number" — numeric literal (decoded to number)
-;; "string" — "..." string literal
+;; "string" — "..." string literal OR quoted 'atom' (treated as a
+;; string value to avoid the var-vs-atom ambiguity that
+;; would arise from a quoted atom whose name starts with
+;; an uppercase letter or underscore)
;; "punct" — ( ) , .
;; "op" — :- ?- <= >= != < > = + - * /
;; "eof"
@@ -192,7 +195,11 @@
(dl-emit! "number" (read-number start) start)
(scan!)))
((= ch "'")
-(do (dl-emit! "atom" (read-quoted "'") start) (scan!)))
+;; Quoted 'atoms' tokenize as strings so a name
+;; like 'Hello World' doesn't get misclassified
+;; as a variable by dl-var? (which inspects the
+;; symbol's first character).
+(do (dl-emit! "string" (read-quoted "'") start) (scan!)))
((= ch "\"")
(do (dl-emit! "string" (read-quoted "\"") start) (scan!)))
((dl-lower? ch)