1 Introduction
A regular expression (regexp or regex) is a sequence of characters that defines a search pattern. The concept comes from formal language theory, and the syntax is standardized by IEEE POSIX in two flavors: BRE (Basic Regular Expressions) and ERE (Extended Regular Expressions).
2 POSIX metacharacters
2.1 Basic regular expression
| MetaCh_BRE | Description |
|---|---|
| `^S` | matches S at the start of the string (or of any line) |
| `S$` | matches S at the end of the string (or of any line) |
| `[S]` | bracket expression: matches a single character in the set S; a range is written S1-S2, and a `-` placed first or last is literal |
| `[^S]` | complement of [S]: matches a single character not in S |
| `.` | matches any single character except the '\n' line break |
| `S*` | matches S zero or more times |
| `S\{m,n\}` | matches S at least m and at most n times |
| `\(exp\)` | sub-expression (group) |
| `\n` | back-reference, n $\in$ [1, 9]: matches the same text as the n-th sub-expression |
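
The rows above can be tried out with a short sketch. Python's `re` module is used here only as a convenient stand-in: it is not a POSIX engine, and it writes groups and intervals unescaped, where BRE escapes them with a backslash. The sample strings and variable names are illustrative assumptions.

```python
import re

# Note: Python spells groups and intervals as ( ) and { };
# POSIX BRE would put a backslash in front of each of these characters.
lines = "cat\nconcat\ncatalog"

# ^S and S$ : with re.MULTILINE they anchor at the start/end of every line.
starts = re.findall(r"^cat", lines, flags=re.MULTILINE)   # "cat", "catalog"
ends = re.findall(r"cat$", lines, flags=re.MULTILINE)     # "cat", "concat"

# [S] and [^S] : one character inside / outside the set.
vowels = re.findall(r"[aeiou]", "regex")
non_digits = re.findall(r"[^0-9]", "a1b2")

# . : any single character except the line break.
dots = re.findall(r"c.t", "cat cut c\nt")

# S* : zero or more repetitions; S{m,n} : between m and n repetitions.
star_ok = re.fullmatch(r"ab*", "a") is not None
interval_ok = re.fullmatch(r"a{2,3}", "aaaa") is not None

# (exp) plus back-reference \1 : the grouped text must occur twice in a row.
doubled = re.search(r"(ab)\1", "xxababyy").group(0)

print(starts, ends, vowels, non_digits, dots, star_ok, interval_ok, doubled)
```

With a real BRE tool such as `grep` in its default mode, the last three patterns would be spelled with backslash-escaped braces and parentheses instead.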
2.2 Extended regular expression
| MetaCh_ERE | Description |
|---|---|
| `(exp)` | same as BRE `\(exp\)` |
| `S{m,n}` | same as BRE `S\{m,n\}` |
| `S?` | matches S zero or one time |
| `S+` | matches S one or more times |
| `S1\|S2` | alternation: matches either S1 or S2 |
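
A minimal sketch of the ERE additions, again using Python's `re` module as a stand-in (its spelling of `?`, `+`, `|`, and unescaped groups happens to agree with ERE for these cases). The sample words are assumptions chosen for illustration.

```python
import re

# S? : zero or one occurrence of S.
optional = re.findall(r"colou?r", "color colour colouur")

# S+ : one or more occurrences of S.
one_or_more = re.findall(r"go+d", "gd god good")

# S1|S2 : either alternative.
either = re.findall(r"cat|dog", "cat, dog, bird")

# (exp) groups an alternation so that an interval applies to the whole group.
grouped_ok = re.fullmatch(r"(ab|cd){2}", "abcd") is not None

print(optional, one_or_more, either, grouped_ok)
```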
Tips : Because of the expressive power and (relative) ease of reading of Perl's regex syntax, many other utilities and programming languages have adopted a similar syntax, for example Java, JavaScript, Julia, Python, Ruby, Qt, Microsoft's .NET Framework, XML Schema, and PHP. Traditional tools such as sed and awk use the POSIX syntax instead (BRE and ERE respectively).
3 Character classes
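A character class is a shorthand name for a commonly used set of characters. In POSIX, the named classes are valid only inside a bracket expression, for example [[:digit:]].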
| POSIX | Perl | Vim | ASCII | Description |
|---|---|---|---|---|
| [:alnum:] | - | - | [A-Za-z0-9] | Alphanumeric characters |
| [:alpha:] | - | \a | [A-Za-z] | Alphabetic characters |
| [:blank:] | - | \s | [ \t] | Space and tab |
| [:cntrl:] | - | - | [\x00-\x1F\x7F] | Control characters |
| [:digit:] | \d | \d | [0-9] | Digits |
| [:lower:] | - | \l | [a-z] | Lowercase letters |
| [:upper:] | - | \u | [A-Z] | Uppercase letters |
| [:print:] | - | \p | [\x20-\x7E] | Visible characters and the space character |
| [:space:] | \s | `\_s` | [ \t\r\n\v\f] | Whitespace characters |
| [:xdigit:] | - | \x | [A-Fa-f0-9] | Hexadecimal digits |
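
The ASCII column can be checked with a small sketch. Python's `re` module does not understand POSIX class names such as `[:alnum:]`, so the example below spells each class out as the bracket expression from the table. The dictionary `POSIX_ASCII`, the helper `count_class`, and the sample string are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical lookup table: POSIX class name -> ASCII bracket expression
# copied from the table above (Python's re has no [:name:] syntax).
POSIX_ASCII = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "blank": r"[ \t]",
    "digit": r"[0-9]",
    "lower": r"[a-z]",
    "upper": r"[A-Z]",
    "space": r"[ \t\r\n\v\f]",
    "xdigit": r"[A-Fa-f0-9]",
}

def count_class(name, text):
    """Count the characters of `text` that belong to the named class."""
    return len(re.findall(POSIX_ASCII[name], text))

sample = "Room 42B,\tfloor 3\n"

digits = count_class("digit", sample)    # 4, 2, 3
uppers = count_class("upper", sample)    # R, B
blanks = count_class("blank", sample)    # two spaces and one tab
spaces = count_class("space", sample)    # the blanks plus the line break
hexes = count_class("xdigit", sample)    # 4, 2, B, f, 3

# The Perl-style shorthands \d and \s agree with the ASCII sets
# when the re.ASCII flag is set.
perl_digits = len(re.findall(r"\d", sample, flags=re.ASCII))
perl_spaces = len(re.findall(r"\s", sample, flags=re.ASCII))

print(digits, uppers, blanks, spaces, hexes, perl_digits, perl_spaces)
```

Without `re.ASCII`, Python's `\d` and `\s` also match non-ASCII digits and whitespace, so the equivalence with the ASCII column holds only under that flag.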
4 Lazy and possessive matching
5 Unicode
6 Wildcard Vs RegExp
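Wildcards are placeholders, whereas regular expressions are search patterns. In a regexp, only '.' acts as a wildcard by itself: the wildcard ? corresponds to the regexp '.', and the wildcard * corresponds to the regexp '.*'.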
| MetaCh | Description |
|---|---|
| * | placeholder for any sequence of characters, zero or more |
| ? | placeholder for any single character |
Tips : some implementations also support bracket expressions similar to those of regexp, such as [S] and [!S] (the latter is equivalent to the regexp [^S]).
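
As a rough illustration of the difference, the sketch below uses Python's standard `fnmatch` module, which is one implementation that supports `*`, `?`, `[S]`, and `[!S]`. The file names are made up for the example.

```python
import fnmatch
import re

names = ["a.txt", "ab.txt", "b.log", "c.txt"]

# * : any sequence of characters, zero or more.
star = fnmatch.filter(names, "*.txt")

# ? : exactly one character.
single = fnmatch.filter(names, "?.txt")

# [S] and [!S] : one character inside / outside the set.
in_set = fnmatch.filter(names, "[ab]*")
not_in_set = fnmatch.filter(names, "[!ab]*")

# fnmatch.translate converts a wildcard pattern into an equivalent regexp,
# which shows that wildcards are a restricted subset of what regexps express.
as_regex = re.compile(fnmatch.translate("?.txt"))
regex_ok = as_regex.match("a.txt") is not None
regex_rejects = as_regex.match("ab.txt") is None

print(star, single, in_set, not_in_set, regex_ok, regex_rejects)
```

The exact regexp text returned by `fnmatch.translate` differs between Python versions, so the sketch only compiles and applies it rather than printing it.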