Lexical structure

Tokens

Each source file is divided into tokens, starting from the beginning of the file.

Identifiers (id) are case sensitive. Some of them are reserved; see section Reserved words below.

id :: alpha alphanum*
alpha :: "a".."z" | "A".."Z" | "_"
alphanum :: alpha | digit
digit :: "0".."9"
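The identifier rule can be sketched as a regular expression. This is a minimal illustration, not the Alore implementation; it assumes an identifier is one alpha character followed by zero or more alphanum characters, and the names ID_RE and is_identifier_shape are hypothetical. It checks only the lexical shape and does not reject reserved words.

```python
import re

# Assumed reading of the id rule: one alpha character followed by
# zero or more alphanum characters (ASCII letters, digits, underscore).
ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier_shape(text):
    """Return True if text has the lexical shape of an identifier."""
    return ID_RE.fullmatch(text) is not None
```

Because identifiers are case sensitive, count and Count have the same shape but name different things.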

Numeric literals (int and float) are entered in base 10. A floating-point literal has a fractional part, introduced by a dot, an integer exponent, introduced by the letter e or E, or both. If the exponent is present, the numeric value before the exponent is multiplied by 10**e, where e is the numeric value of the exponent. For example, 1.5e3 denotes 1500.

int :: digit+
float :: digit+ exponent | digit* "." digit+ [ exponent ]
exponent :: ("e" | "E") ["+" | "-"] digit+
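The int and float rules translate directly into regular expressions. The sketch below is illustrative only; classify_number and literal_value are hypothetical helper names, and the value computation relies on Python's own int and float conversions, which are assumed to agree with Alore for literals of this shape.

```python
import re

EXPONENT = r"[eE][+-]?[0-9]+"
INT_RE = re.compile(r"[0-9]+")
# float :: digit+ exponent | digit* "." digit+ [ exponent ]
FLOAT_RE = re.compile(
    r"[0-9]+" + EXPONENT + r"|[0-9]*\.[0-9]+(?:" + EXPONENT + r")?"
)


def classify_number(text):
    """Return "float", "int" or None according to the grammar rules."""
    if FLOAT_RE.fullmatch(text):
        return "float"
    if INT_RE.fullmatch(text):
        return "int"
    return None


def literal_value(text):
    """Return the numeric value of a literal, or raise ValueError."""
    kind = classify_number(text)
    if kind == "int":
        return int(text)
    if kind == "float":
        # The part before the exponent is multiplied by 10**exponent.
        return float(text)
    raise ValueError("not a numeric literal: " + repr(text))
```

Under these rules .5 and 1e10 are float literals, while 1. is not a numeric literal because a dot must be followed by at least one digit.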

String literals (str) are entered within double quotes. The surrounding quotes are not part of the string value. Literal double quotes can be entered in string literals by duplicating them.

str :: <"> (<any character except ", CR or LF> | <"> <">)* <">
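The following sketch shows how the value of a string literal could be recovered from its source text: the surrounding quotes are dropped and each doubled quote becomes a single quote. The names STR_RE and string_value are hypothetical and the code is an illustration rather than the actual lexer.

```python
import re

# str :: <"> (<any character except ", CR or LF> | <"> <">)* <">
STR_RE = re.compile(r'"(?:[^"\r\n]|"")*"')


def string_value(literal):
    """Return the string value denoted by a string literal."""
    if STR_RE.fullmatch(literal) is None:
        raise ValueError("not a string literal: " + repr(literal))
    # Drop the surrounding quotes, then undouble embedded quotes.
    return literal[1:-1].replace('""', '"')
```

For instance, the literal "say ""hi""" denotes the eight-character string say "hi".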

Various non-alphanumeric operator and punctuator tokens are defined:

opsym :: "+" | "-" | "*" | "/" | "**" | ":" | "==" | "!=" | "<" | ">" | ">=" | "<="
punct :: "(" | ")" | "[" | "]" | "," | "=" | "+=" | "-=" | "*=" | "/=" | "**=" | "::"

Newlines and semicolons can be used as statement separators (br). They are interchangeable. Repeated statement separators behave identically to a single statement separator.

br :: (newline | ";")+
newline :: <CR> <LF> | <LF> | <CR>
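Since repeated separators behave like a single one, a lexer can collapse each run of newline and semicolon tokens into one br token. The sketch below assumes the input has already been split into tokens with whitespace and comments removed; collapse_separators and the "<br>" marker are hypothetical.

```python
SEPARATORS = {"\r\n", "\n", "\r", ";"}
BR = "<br>"  # hypothetical marker for a single statement separator


def collapse_separators(tokens):
    """Replace each run of newline and ";" tokens with one BR marker."""
    out = []
    for tok in tokens:
        if tok in SEPARATORS:
            if out and out[-1] == BR:
                continue  # already inside a run of separators
            out.append(BR)
        else:
            out.append(tok)
    return out
```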

Whitespace and comments are ignored before and after tokens. Whitespace characters are optional, except between a token ending with an alphanumeric character and another token starting with an alphanumeric character, in which case they are required. Finally, there must be no whitespace characters before the initial-comment and utf8-bom tokens.

whitespace :: " " | <TAB>
comment :: "--" <any character except CR or LF>*
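A comment starts at the first -- that is not inside a string literal. The sketch below removes a comment from a single line; strip_comment is a hypothetical helper and it assumes the line contains no unterminated string literal. Doubled quotes inside a string need no special handling here, because they toggle the in-string state twice.

```python
def strip_comment(line):
    """Return line without its trailing "--" comment, if any."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif not in_string and line.startswith("--", i):
            return line[:i]
    return line
```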

An initial source line starting with #! is interpreted as a comment:

initial-comment :: "#!" <any character except CR or LF>*

The special utf8-bom token may be present at the start of UTF-8 encoded files:

utf8-bom :: <EF> <BB> <BF>
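A reader of source files might skip these two optional leading tokens as sketched below. The function name skip_prologue is hypothetical, and the sketch assumes that when both tokens are present the utf8-bom comes first, immediately followed by the initial-comment.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def skip_prologue(data):
    """Return the offset just past an optional utf8-bom and initial-comment.

    The line terminator that ends the initial comment is not skipped.
    """
    pos = 0
    if data.startswith(UTF8_BOM):
        pos = len(UTF8_BOM)
    if data.startswith(b"#!", pos):
        # The comment runs up to, but not including, the next CR or LF.
        while pos < len(data) and data[pos] not in (0x0A, 0x0D):
            pos += 1
    return pos
```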

Joining lines

Newlines after the following tokens are interpreted as whitespace, not as statement separators:

+ - * / ** div mod and or : to is == != < <= > >= ( [ = += -= *= /= **= ,

This can be used to divide long lines into multiple shorter lines.

Reserved words

The following words are reserved and cannot be used as identifiers (i.e. as names of global or local definitions, as member names or as module name components):

and      for      repeat
break    if       return 
case     import   self
class    in       sub  
const    is       super 
div      mod      switch 
elif     module   to  
else     nil      try  
encoding not      until  
end      or       var  
except   private  while
finally  raise         
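A tool that processes Alore source could test for reserved words with a simple set lookup, as in the sketch below. The names RESERVED_WORDS and is_reserved are hypothetical, and the exact, case-sensitive comparison is an assumption based on identifiers being case sensitive.

```python
RESERVED_WORDS = frozenset(
    """
    and for repeat break if return case import self class in sub
    const is super div mod switch elif module to else nil try
    encoding not until end or var except private while finally raise
    """.split()
)


def is_reserved(word):
    """Return True if word is a reserved word (exact match assumed)."""
    return word in RESERVED_WORDS
```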

Restricted names

Module name components and names of global definitions starting with two underscores (__) are reserved for internal use by the implementation. The implementation may freely define such names for any purpose, but user programs that are to remain portable across different Alore implementations should not depend on their presence or absence.

Additionally, it is recommended that the following names not be used as the first component of a module name, since they are reserved for use in future releases of Alore:

compiler
crypt
email
ftp
ftpserver
httpserver
locale
postgres
process
queue
serialize
smtp
sqlite
ssl
stack
timezone
udp
unicode
xml
xmltree

It is likely that some of these names will never be used in any future Alore release. Future Alore releases may remove some names from this list; these changes are retroactively applied to all earlier Alore versions as well.
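The two restrictions on module names could be checked as sketched below. The function check_module_name is hypothetical, and the sketch assumes that module name components are separated by the :: token.

```python
FUTURE_RESERVED = frozenset(
    """
    compiler crypt email ftp ftpserver httpserver locale postgres
    process queue serialize smtp sqlite ssl stack timezone udp
    unicode xml xmltree
    """.split()
)


def check_module_name(name):
    """Return a list of warnings for a module name such as "a::b"."""
    parts = name.split("::")  # assumed component separator
    problems = []
    if any(part.startswith("__") for part in parts):
        problems.append("component starting with __ is reserved")
    if parts[0] in FUTURE_RESERVED:
        problems.append("first component is reserved for future use")
    return problems
```

Only the first component is checked against the list, so a name such as myapp::xml produces no warning.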

Encoding

Alore source files may be encoded in ASCII, UTF-8 or ISO-8859-1 (Latin 1). See section Encoding declaration for information on specifying the encoding of a source file.

All 7-bit character codes except CR and LF (13 and 10, respectively) can be used in comments and string literals, including null characters, independent of the source file encoding. Double quotes, however, must be doubled within string literals.

In an ISO-8859-1 encoded source file, all character codes in the range 128 to 255, inclusive, can be used in comments and string literals. Similarly, in a UTF-8 encoded source file, all valid UTF-8 sequences for code points between 128 and 65535, inclusive, can be used in comments and string literals.
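The UTF-8 limit can be illustrated with a coarse check on a whole file. The sketch below, with the hypothetical name check_utf8_source, verifies only that the data is valid UTF-8 and that no code point exceeds 65535; it does not verify that non-ASCII characters appear solely in comments and string literals.

```python
def check_utf8_source(data):
    """Return True if data is valid UTF-8 with no code point above 65535.

    Raises UnicodeDecodeError if data is not valid UTF-8.
    """
    text = data.decode("utf-8")
    return all(ord(ch) <= 0xFFFF for ch in text)
```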