Lexical structure

Source

UTF-8. Source files have extension .ar. Non-ASCII text is admitted in identifiers (Identifiers, UAX #31), comments (Comments), and string/char literals (Literals). Operators and keywords have a canonical ASCII spelling and an equivalent Unicode form — Lean-style dual notation (Operators): ∀ is forall, ⊑ is <:, → is ->, ⊞ is box_plus. The two are interchangeable surface spellings of the same token, so a program may be written in either (or a mix), and editors enter the Unicode forms through a \abbrev input method (\forall → ∀). Delimiters and the remaining punctuation are ASCII.

Comments

// — line comment, not extracted by ox doc.
/// — outer doc comment, attaches to the next item.
//! — inner doc comment, attaches to the enclosing module.
/* … */ — block comment, may be nested. Not extracted by ox doc. Comments out arbitrary tokens including //-prefixed lines, useful when temporarily disabling sub-expressions.

Identifiers

Identifiers follow UAX #31: an identifier starts with a XID_Start character or _, and continues with XID_Continue characters. The ASCII subset [A-Za-z_][A-Za-z0-9_]* is the common case and the recommended style. An identifier’s text is its exact source bytes. NFC canonical equivalence (so that a precomposed é U+00E9 and the decomposed e + combining acute U+0065 U+0301 would denote the same name) and a mixed-script confusable warning (UAX #39, Latin A U+0041 vs Cyrillic А U+0410) are the RFD 0034 name-resolution design (Out of scope). Conventions: snake_case for value bindings, fields, and rules (fn, derive, query, mutate, check); CamelCase for types, metatype-introduced concepts, and metarel-introduced relations; lowercase (typically snake_case) for the metatype- and metarel-introducing keywords themselves (kind, role, rel, mediation, …); SCREAMING_SNAKE_CASE for stdlib constants. Conventions are not enforced.

Operators

Token	Meaning
`:`	typing / instantiation (`x: T`, `Dog : AnimalSpecies`); metatype-body axis binding (`rigidity: rigid`)
`<:`	specialization, trait inheritance, trait bound (`⊑` alias)
`::`	qualification — path or `axis::value`
`:-`	Datalog rule-body separator
`=>`	rule head, match arm, check-rule diagnostic emission
`=`	value assignment, single-expression body, kind-level value
`?`	optional sugar (`T?` ≡ `Option<T>`); error propagation on `Result`
`..`, `..=`	range, cardinality
`->`	rule return type
`+`, `*`	arithmetic; transitive / reflexive-transitive closure on role paths
`-`, `/`	arithmetic
`==`, `!=`, `<`, `<=`, `>`, `>=`	comparison
`&&`, `\|\|`, `!`	boolean (expression context)
`,`	conjunction in Datalog body; tuple/list element separator
`;`	statement terminator
`\|`	variant separator
`#[…]`	attribute (compiler-known or procedural macro), outer form
`#![…]`	inner attribute, applies to enclosing item
`name!(…)`	declarative or procedural macro at expression position
`..` (prefix)	struct-update spread
`_`	wildcard / anonymous binding

Top / ⊤ and Bot / ⊥ are type-level constants.

Unicode notation

Each operator/keyword below has an equivalent Unicode glyph; the glyph and its ASCII spelling lex to the same token, so they are fully interchangeable and nothing past the lexer distinguishes them. Glyphs and the editor \abbrevs follow the Lean lean4-input conventions; the metric-modal glyphs (⊞/⊟) follow the DatalogMTL literature. The authoritative table is [[notation]] in compiler/crates/oxc-syntax/grammar.toml, the single source from which the lexer’s aliases and the editor input methods are generated.

Glyph	ASCII	`\abbrev`	Glyph	ASCII	`\abbrev`
`∀`	`forall`	`\forall`	`≤`	`<=`	`\le`
`∃`	`exists`	`\exists`	`≥`	`>=`	`\ge`
`↔`	`iff`	`\iff`	`≠`	`!=`	`\ne`
`¬`	`not`	`\not`	`→`	`->`	`\to`
`∧`	`&&`	`\and`	`←`	`:-`	`\l`
`∨`	`\|\|`	`\or`	`∈`	`in`	`\in`
`⊑`	`<:`	`\sqsubseteq`	`∪`	`union`	`\cup`
`⊤`	`Top`	`\top`	`∩`	`intersect`	`\cap`
`⊥`	`Bot`	`\bot`	`∖`	`except`	`\setminus`
`□`	`box`	`\box`	`⊞`	`box_plus`	`\boxplus`
`◇`	`diamond`	`\diamond`	`⊟`	`box_minus`	`\boxminus`

=> is deliberately not given a Unicode form: it is match-arm / head syntax, not implication, so ⇒ would mislead. diamond_plus / diamond_minus have no single conventional glyph (the notation is ◇ with a directional marker) and stay ASCII-only for now.

Boolean operator placement. Argon distinguishes two syntactic positions:

Rule-body position — after :- in derive/check, inside unsafe logic blocks, the from/where clauses of query bodies, subquery bodies (collect/set/one/count/exists), refinement where { … } / iff { … } clauses, and require { … } precondition blocks. Conjunction is ,; NAF is not atom. Disjunction is not inline — write multiple rules with the same head, or combine subquery variants via union / intersect / except.
Expression position — inside fn bodies, if / match conditions, let initializers, the RHS of comparison atoms inside rule bodies, and atom arguments. Boolean operators are &&, ||, !.

The two are non-overlapping: && is not accepted between rule-body atoms; , is not a Boolean operator in expressions.

Literals

INT      ::= [0-9]+ ( '_' [0-9]+ )*
REAL     ::= INT '.' INT
STRING   ::= '"' chars '"'
DATE     ::= '#' YYYY '-' MM '-' DD '#'
DATETIME ::= '#' YYYY '-' MM '-' DD 'T' HH ':' MM ':' SS 'Z' '#'
BOOL     ::= 'true' | 'false'
LIST     ::= '[' (expr (',' expr)*)? ','? ']'
RECORD   ::= '{' (Ident ':' expr (',' …)*)? ','? '}'

Set/map literals are constructor calls (Set::new(a, b, c), Map::new((k, v), …)); there is no {1, 2, 3} form (Out of scope).

Keyboard shortcuts

Argon Reference