datalog: quoted 'atoms' tokenize as strings
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 23s
Some checks failed
Test, Build, and Deploy / test-build-deploy (push) Failing after 23s
Quoted atoms with uppercase- or underscore-leading names were
misclassified as variables. `p('Hello World').` flowed through the
tokenizer's "atom" branch and through the parser's string->symbol,
producing a symbol named "Hello World". dl-var? inspects the first
character — "H" is uppercase, so the fact was rejected as non-ground
("expected ground literal").
Tokenizer now emits "string" for any '...' quoted form. Quoted atoms
become opaque string constants — matching how Datalog idiomatically
treats them, and avoiding a per-symbol "quoted" marker that would
have rippled through unification and dl-var?. The trade-off is that
'a' and a are no longer the same value (string vs symbol); for
Datalog this is the safer default.
Updated the existing "quoted atom" tokenize test, added a regression
case for an uppercase-named quoted atom, and a parse-level test that
verifies the AST. Conformance 269/269.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -2,10 +2,13 @@
|
||||
;;
|
||||
;; Tokens: {:type T :value V :pos P}
|
||||
;; Types:
|
||||
;; "atom" — lowercase-start ident or quoted 'atom'
|
||||
;; "atom" — lowercase-start bare identifier
|
||||
;; "var" — uppercase-start or _-start ident (value is the name)
|
||||
;; "number" — numeric literal (decoded to number)
|
||||
;; "string" — "..." string literal
|
||||
;; "string" — "..." string literal OR quoted 'atom' (treated as a
|
||||
;; string value to avoid the var-vs-atom ambiguity that
|
||||
;; would arise from a quoted atom whose name starts with
|
||||
;; an uppercase letter or underscore)
|
||||
;; "punct" — ( ) , .
|
||||
;; "op" — :- ?- <= >= != < > = + - * /
|
||||
;; "eof"
|
||||
@@ -192,7 +195,11 @@
|
||||
(dl-emit! "number" (read-number start) start)
|
||||
(scan!)))
|
||||
((= ch "'")
|
||||
(do (dl-emit! "atom" (read-quoted "'") start) (scan!)))
|
||||
;; Quoted 'atoms' tokenize as strings so a name
|
||||
;; like 'Hello World' doesn't get misclassified
|
||||
;; as a variable by dl-var? (which inspects the
|
||||
;; symbol's first character).
|
||||
(do (dl-emit! "string" (read-quoted "'") start) (scan!)))
|
||||
((= ch "\"")
|
||||
(do (dl-emit! "string" (read-quoted "\"") start) (scan!)))
|
||||
((dl-lower? ch)
|
||||
|
||||
Reference in New Issue
Block a user