Notation

This document uses a modified BNF notation to describe grammar and lexical rules. Here are a few examples of grammar rules:

digit :: "0".."9"
num :: digit+ | digit* "." digit+

This means that the digit rule is defined as a numeric character and that the num rule is defined as either a string of at least one digit, or any number of digits followed by a dot and one or more digits.
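
As a rough illustration, the two rules above can be approximated with a Python regular expression. This is only a sketch: the names DIGIT, NUM and is_num are chosen here for illustration and are not part of the notation, and a regular expression engine is merely a convenient stand-in for a matcher of these rules.

```python
import re

# Assumed regex translations of the example rules (illustrative only).
DIGIT = r"[0-9]"                               # digit :: "0".."9"
NUM = rf"(?:{DIGIT}+|{DIGIT}*\.{DIGIT}+)"      # num :: digit+ | digit* "." digit+


def is_num(text: str) -> bool:
    """Return True if the whole string matches the num rule."""
    return re.fullmatch(NUM, text) is not None
```

With this sketch, is_num accepts "42", ".5" and "3.14", and rejects "3." because the second alternative requires at least one digit after the dot.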

Each rule starts with the name of the rule (digit and num in the previous example), followed by two colons (::) and the rule definition. In the rule definition, the name of any rule refers to that rule (the definition of the num rule in the above example refers to the digit rule). Literal characters and strings are entered within double quotes, and two dots between two literal characters (as in "0".."9") denote any single character in that range. The | symbol separates alternatives. The + operator specifies that the previous item can be repeated one or more times, whereas the * operator means that the previous item can be repeated zero or more times. Brackets [ and ] denote optional items, and parentheses are used for grouping. Items can be put one after another to form sequences.

Sequences are matched from left to right. Each repeated item is matched greedily (as many times as possible) before the next item is matched. For example, consider this rule:

x :: y+ [ y ]

The first item (y+) in the definition always matches as many y's as possible, so the second item ([ y ]) never has a y left to match.
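
The greedy, left-to-right behavior can be sketched with a small hand-written matcher. To keep the example self-contained, it assumes that y stands for the literal character "y"; the function name match_x and its return value are hypothetical and serve only to show which item consumed what.

```python
from typing import Optional, Tuple


def match_x(text: str) -> Optional[Tuple[int, bool]]:
    """Match x :: y+ [ y ] at the start of text, greedily and without backtracking.

    Returns (characters consumed by y+, whether [ y ] matched a y),
    or None if y+ cannot match at all.
    """
    pos = 0
    # y+ : consume as many y's as possible.
    while pos < len(text) and text[pos] == "y":
        pos += 1
    if pos == 0:
        return None  # y+ needs at least one y
    # [ y ] : try to match one more y; the loop above has already taken them all.
    optional_matched = pos < len(text) and text[pos] == "y"
    return pos, optional_matched
```

For any input, the second element of the result is False: once the loop stops, the next character is either missing or not a y.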

Some special characters or sets of characters are entered inside angle brackets, for example:

comment :: "--" <any character except CR or LF>*

This means that a comment contains two dashes followed by all the characters before the end of the current line.
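
A minimal sketch of the comment rule in Python follows, assuming that the angle-bracket description maps to the regular expression character class [^\r\n]. The names COMMENT and match_comment are illustrative and do not come from the notation itself.

```python
import re
from typing import Optional

# comment :: "--" <any character except CR or LF>*
COMMENT = re.compile(r"--[^\r\n]*")


def match_comment(source: str, pos: int = 0) -> Optional[str]:
    """Return the comment starting exactly at pos, or None if there is none."""
    m = COMMENT.match(source, pos)
    return m.group(0) if m else None
```

The match stops just before the first CR or LF, so the line terminator itself is not part of the comment.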

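Repetition (+ and *) has the highest precedence, followed by sequencing. Alternation has the lowest precedence. For example, in the num rule above, digit* "." digit+ is read as (digit*) "." (digit+), and the | symbol separates that whole sequence from the first alternative. Parentheses can be used to override operator precedence.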
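
The effect of alternation binding most loosely, and of parentheses overriding it, can be illustrated with Python regular expressions, where alternation likewise has the lowest precedence. The two rules in the comments are hypothetical examples written in this document's notation, and the helper name matches is an assumption made for the sketch.

```python
import re

# r1 :: "a" "b" | "c"        is read as  ( "a" "b" ) | "c"
UNGROUPED = re.compile(r"ab|c")

# r2 :: "a" ( "b" | "c" )    parentheses make the alternation bind first
GROUPED = re.compile(r"a(?:b|c)")


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None
```

Without parentheses, "c" alone is a valid match and "ac" is not; with parentheses, the opposite holds.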