@head
@module re
@title re: Regular expressions
This module provides operations for matching and manipulating strings using regular expressions. Regular expressions may be represented as strings or instances of the @ref{RegExp} class.
Examples:
@example
if Match("foo|bar", "bar ") != nil   -- Succeeds
  WriteLn("Match!")
end

var r = RegExp("a*b", IgnoreCase)
var m = Search(r, "... AaaB ...")
m.group(0)   -- "AaaB"
@end
@note Use the @ref{RegExp} class for case insensitive matching.
@end
If new is a string, it acts as a template for the replacement string. \0 in the new string is replaced with the string matched by the regular expression, and \n, where n is a positive integer, is replaced with the string matched by group n of the regular expression. A backslash not followed by a digit is replaced with the character following the backslash. All other characters are included literally in the replacement string. Example:
@example
Subst("foo fox", "fo+", "<\0>")   -- Result: "<foo> <fo>x"
@end
The new argument can also be a callable object
(of type def (MatchResult) as Str). In this case, the
object is called once for each occurrence of the regular
expression in the string, with the corresponding match object
as the argument. The object
should return the replacement string when called. Example:
@example
Subst("cat sits on a table", "cat|table", def (m)
return m.group(0).upper()
end)
-- Result: "CAT sits on a TABLE"
@end
@end
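The template rules above also cover numbered groups. The following is a minimal sketch, not taken from the original text, that swaps two words by referring to groups 1 and 2 in the replacement template. The input string and pattern are illustrative only:
@example
-- Group 1 matches the first word and group 2 the second word.
Subst("john smith", "([a-z]+) ([a-z]+)", "\2 \1")
-- Result: "smith john"
@end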
@fun Split(str as Str, regexp as Str) as Array
Constructing a RegExp instance or passing a Str object
as a regular expression may be costly, since the regular expression must be
compiled into an internal form before matching.
Using a single RegExp instance for multiple operations is often
faster, since this allows reusing the compiled form.
@end
@end-class
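As an illustration of reusing a compiled form, the sketch below constructs a single RegExp instance and uses it for several searches instead of passing the pattern string each time. The sample strings are made up for this example, and any actual speed-up depends on the implementation:
@example
-- Compile the pattern once, outside the loop.
var r = RegExp("a*b", IgnoreCase)
for s in ["AaaB", "xyz", "b"]
  -- Each call reuses the compiled form of r.
  if Search(r, s) != nil
    WriteLn(s)
  end
end
@end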
Match result objects are instances of the MatchResult class.
MatchResult should not be used to construct match result objects
directly. Match result objects have these methods:
@fun group(n as Int) as Str
@desc Return the substring matched by a specific group. The group 0 is the
substring matched by the entire regular expression. Return nil
if the group exists, but it is within a part of the regular expression
that was not matched.
@end
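A short sketch of the nil case described for group, using a pattern chosen only for illustration. Both groups exist in the regular expression, but only one alternative takes part in the match:
@example
var m = Match("(a)|(b)", "b")
m.group(0)   -- "b" (the entire match)
m.group(1)   -- nil (group 1 exists, but its alternative did not match)
m.group(2)   -- "b"
@end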
@fun start(n as Int) as Int
@desc Return the start index of the substring matched by a specific group.
Group 0 refers to the
substring matched by the entire regular expression. Return
-1 if the group exists in the regular expression, but it is
within a part of the regular expression that was not matched.
If a group matched an empty string, the start index is equal to the
stop index.
@end
@fun stop(n as Int) as Int
@desc Return the stop index of the substring matched by a specific group.
Group 0 refers to the
substring matched by the entire regular expression. Return
-1 if the group exists in the regular expression, but it is
within a part of the regular expression that was not matched.
If a group matched an empty string, the start index is equal to the
stop index.
@end
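The sketch below shows one way start and stop might be used together. It assumes that the stop index is exclusive (one past the last matched character), which is suggested by the rule that an empty match has equal start and stop indices, and that strings can be sliced with the s[a:b] notation. Neither assumption is stated in this section:
@example
var s = "aabbbcc"
var m = Search("b+", s)
if m != nil
  -- Under the assumptions above, this slice is expected to be the
  -- same text as m.group(0).
  WriteLn(s[m.start(0):m.stop(0)])
end
@end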
@fun span(n as Int) as Pair
@end
Any character that does not have any other significance is a regular
expression that matches itself: for example, "x" matches the letter "x".
Additionally, regular expressions can be constructed by following the rules
below (a and b may refer to any regular expression).
Finally, combining a backslash ("\") with another character or characters forms
special regular expressions. If the backslash sequence is not special, a
backslash followed by any character matches that character. The
special backslash sequences are listed below.
Class RegExp
@class RegExp(regexp as Str, ... as Constant)
@desc Construct a regular expression object. The first parameter
must be a regular expression string. Optionally, the IgnoreCase
constant can be given as an additional parameter to enable case insensitive
matching. RegExp instances can be used instead of Str
objects whenever regular expressions are expected.
Constants
@var IgnoreCase as Constant
@desc Flag for case insensitive matching, passed to @ref{RegExp}.
@end
@class-hidden MatchResult
MatchResult objects
Exceptions
@class RegExpError
@desc Raised when one of the operations in this module is passed an invalid
regular expression string. Inherits from @ref{std::ValueError}.
@end
Regular expression syntax overview
.
Match any single character.
^
Anchor match at the beginning of a string.
$
Anchor match at the end of a string.
ab
Match a followed by b.
a*
Match a repeated zero or more times.
a+
Match a repeated one or more times.
a?
Optionally match a.
a|b
Match a or b.
a{n}
Match a exactly n times.
a{n,}
Match a at least n times.
a{n,m}
Match a at least n and at most m times.
(a)
Match a. Use parentheses to group expressions. Each regular
expression within parentheses is a group. Groups within a
regular expression are numbered so that the leftmost parenthesized
expression (the one where the index of the "(" character is smallest)
is the group 1, the next is the group 2, etc.
[...]
Match any character inside the brackets. If the brackets contain
character ranges of the form x-y, each such range matches any
character in the range from x to y.
Backslash sequences have the same behavior as described below, unless
otherwise noted.
[^...]
Match the inverse of the corresponding [...] expression.
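The group numbering rule and the bracket expressions above can be illustrated with a small sketch. The patterns and inputs are invented for this example:
@example
-- Groups are numbered by the position of their "(" character.
var m = Match("((a)(b))c", "abc")
m.group(0)   -- "abc"
m.group(1)   -- "ab" (the outermost group opens first)
m.group(2)   -- "a"
m.group(3)   -- "b"

-- [0-9] matches any single digit; [^0-9] matches any other character.
Subst("a1b22", "[0-9]", "#")    -- Result: "a#b##"
Subst("a1b22", "[^0-9]", "#")   -- Result: "#1#22"
@end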
@see
@link re-details.html
contains additional information about regular expression matching.
@end
\n
Match the string matched by parenthesized group n (back reference) or
the character with octal code n, where n is an integer.
\xnn
Match the character with hexadecimal code nn. nn must be an integer in
hexadecimal.
\<
Match the empty string at the beginning of a word.
\>
Match the empty string at the end of a word.
\a
Match ASCII bell.
\b
Match ASCII backspace.
\d
Match any decimal digit [0-9].
\D
Match any character except decimal digit [^0-9].
\f
Match ASCII form feed.
\n
Match ASCII linefeed.
\r
Match ASCII carriage return.
\s
Match any whitespace character.
\S
Match any non-whitespace character.
\t
Match ASCII horizontal tab.
\v
Match ASCII vertical tab.
\w
Match any alphanumeric character or the underscore ("_").
\W
Match any non-alphanumeric and non-underscore character.
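A brief sketch of some of the backslash sequences listed above, using Subst so that the effect of each pattern is visible in the result. The inputs are illustrative only:
@example
-- \d matches a single decimal digit.
Subst("a1b22", "\d", "#")              -- Result: "a#b##"

-- \< and \> restrict the match to a whole word, so the "cat" inside
-- "concat" is left alone.
Subst("concat cat", "\<cat\>", "dog")  -- Result: "concat dog"
@end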