Vim Regex and Pattern Matching¶
See :help vim.re for regex through Nvim's Lua API.
Table of Contents¶
- Very Magic
- Vim Regex and Perl Regex
- Metacharacters (Escaped Characters) and Character Classes
- Tricks
- Including End-of-Line (EOL) and Start-of-Line (SOL) in Pattern Matches
- Matching Start-of-Line after Another Atom
- Overview of Multi Items
- Matching Literal Key Characters
- Dealing with Accents in Unicode
- Range of Operation
- Capture Groups and Backreferences with Substitutions and Other Pattern Commands
- Flags
- Substitutions with Expressions
- Quantifiers, Greedy and Non-Greedy
- Optionally Match Atoms
- Match Inside the Visual Area
- Match with the Cursor Position
- Using Marks for Matching
- Line Number Matching
- Matching with Start and End of the File
- Matching with Columns
- Matching After a Pattern
- Matching After a NON-matching pattern
- Match Excluding the Preceding Atom
- Zero-Width
- Setting the Start of a Match with \zs
- Setting the End of a Match with \ze
- Less Useful Patterns
- Match if Previous Pattern Doesn't Match at the CURRENT Position
- Match at the Current Position like a Single Pattern
- Matching Different Number Systems
- Matching Decimal, Octal, and Hexadecimal Number Systems
- Collections / Sets
- Good Ones to Remember
Very Magic¶
Using \v means that after it, all ASCII characters except 0-9, a-z,
A-Z and _ have special meaning: "very magic".
This means that none of the special characters need to be escaped.
Using \V ("very nomagic") means that they ALL need to be escaped.
| \v | \m | \M | \V | Matches |
|---|---|---|---|---|
| 'v.Magic' | 'Magic' | 'NoMagic' | 'v.NoMagic' | |
| a | a | a | a | Literal a |
| \a | \a | \a | \a | Any alphabetic character |
| . | . | \. | \. | Any character |
| \. | \. | . | . | Literal dot |
| $ | $ | $ | \$ | End-of-line |
| * | * | \* | \* | Any number of the previous atom |
| ~ | ~ | \~ | \~ | Latest substitute string |
| () | \(\) | \(\) | \(\) | Group as an atom |
| \| | \\| | \\| | \\| | Nothing: separates alternatives (logical OR) |
| \\ | \\ | \\ | \\ | Literal backslash |
| \{ | { | { | { | Literal curly brace |
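
As a rough illustration of the table above, the sketch below tests made-up sample strings against equivalent patterns written at different magic levels. Each `=~#` expression evaluates to 1 on a match and 0 otherwise.

```vim
" Very magic: ( | ) and + are special without a backslash.
" echoes: 1
echo 'foo123' =~# '\v^(foo|bar)\d+$'

" Magic (the default): the same pattern needs backslashes.
" echoes: 1
echo 'foo123' =~# '^\(foo\|bar\)\d\+$'

" Very nomagic: the dot is literal, so only "a.b" matches.
" echoes: 1
echo 'a.b' =~# '\Va.b'
" echoes: 0
echo 'axb' =~# '\Va.b'
```
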
Vim Regex and Perl Regex¶
| Capability | Vim-speak | Perl-speak |
|---|---|---|
| force case insensitivity | \c | (?i) |
| force case sensitivity | \C | (?-i) |
| Non-capturing grouping | \%(atom\) | (?:atom) |
| 0-width match | atom\@= | (?=atom) |
| 0-width non-match | atom\@! | (?!atom) |
| 0-width preceding match | atom\@<= | (?<=atom) |
| 0-width preceding non-match | atom\@<! | (?<!atom) |
| match without retry | atom\@> | (?>atom) |
| conservative quantifiers | \{-n,m} | *?, +?, ??, {}? |

- Vim beginnings and ends:
  - Vim's ^ and $ always match at embedded newlines, and you get two separate atoms for the boundaries of the text.
  - With \%^ and \%$, you only match at the very start and end of the text.
- Perl beginnings and ends:
  - In Perl, ^ and $ only match at the beginning and end of the text by default.
    - But, you can set the m flag, which lets them match at embedded newlines as well.
Unique to Vim¶
- Changing the magic-ness of a pattern: \v \V \m \M (very useful for avoiding backslashitis)
- Sequence of optionally matching atoms: \%[atoms]
- \& (which is to \| what "and" is to "or"; it forces several branches to match at one spot)
- Matching lines/columns by number: \%5l \%5c \%5v
- Setting the start and end of the match: \zs \ze
Unique to Perl¶
- Execution of arbitrary code in the regex: (?{perl code})
- Conditional expressions: (?(condition)true-expr|false-expr)
Important Help Files¶
- :h pattern-overview
- :h ordinary-atom
- :h character-classes
- :h :syn-ext-match
Metacharacters (Escaped Characters) and Character Classes¶
:h character-classes¶
Whitespace:¶
| Character Class | Matches |
|---|---|
| . | Any character except new line |
| \s | Whitespace character: <Space> or <Tab> |
| \S | Non-whitespace character |
Digits:¶
| Character Class | Matches |
|---|---|
| \d | digit |
| \D | non-digit |
| \x | hex digit |
| \X | non-hex digit |
| \o | octal digit |
| \O | non-octal digit |
| \%d | Character given by a decimal (base 10) code |
| \%o | Character given by an octal (base 8) code |
| \%x | Character given by a hexadecimal (base 16) code of up to 2 hexadecimal characters |
| \%u | Character given by a hexadecimal (base 16) code of up to 4 hexadecimal characters |
| \%U | Character given by a hexadecimal (base 16) code of up to 8 hexadecimal characters |

NOTE: With \%o, octal numbers below 0o40 must be followed by a non-octal digit or a non-digit.¶
Letters:¶
| Character Class | Matches |
|---|---|
| \h | head of word character (a-z, A-Z and _) |
| \H | non-head of word character |
| \p | printable character |
| \P | like \p, but excluding digits |
| \w | word character |
| \W | non-word character |
| \a | alphabetic character |
| \A | non-alphabetic character |
| \l | lowercase character |
| \L | non-lowercase character |
| \u | uppercase character |
| \U | non-uppercase character |
Special Characters:¶
| Character Class | Matches |
|---|---|
| \e | matches <Esc> |
| \t | matches <Tab> |
| \r | matches <CR> |
| \b | matches <BS> |
| \n | matches an EOL (end-of-line) |
Substitution Special Characters¶
See :h :s\= (or :h sub-replace-special)
Related: :h s/\= (or :h sub-replace-expression)
| magic | nomagic | action |
|---|---|---|
| & | \& | Replaced with the whole matched pattern |
| \& | & | Replaced with & |
| \0 | \0 | Replaced with the whole matched pattern |
| \1 | \1 | Replaced with the matched pattern in the first capture group () |
| \2 | \2 | Replaced with the matched pattern in the second capture group () |
| ... | ... | ... (\3, \4,...) |
| \9 | \9 | Replaced with the matched pattern in the ninth capture group () |
| ~ | \~ | Replaced with the {string} of the previous substitute |
| \~ | ~ | Replaced with ~ |
| \u | \u | Next character made uppercase |
| \U | \U | Following characters made uppercase, until \E |
| \l | \l | Next character made lowercase |
| \L | \L | Following characters made lowercase, until \E |
| \e | \e | End of \u, \U, \l and \L (NOTE: not <Esc>!) |
| \E | \E | End of \u, \U, \l and \L |
| <CR> | <CR> | Split line in two at this point (Type the <CR> as CTRL-V <Enter>) |
| \r | \r | Same as <CR>. Inserts a newline. |
| \<CR> | \<CR> | Insert a carriage-return (CTRL-M) (Type the <CR> as CTRL-V <Enter>) |
| \n | \n | Insert a <NL> (<NUL> in the file) (does NOT break the line) |
| \b | \b | Insert a <BS> |
| \t | \t | Insert a <Tab> |
| \\ | \\ | Insert a single backslash |
| \x | \x | Where x is any character not mentioned above: Reserved for future expansion |
Tricks:¶
To avoid needing to escape forward slashes / in a substitution,
you can use a different separator.
" Syntax:
:s:pattern:replacement:flags
" To replace all occurrences of "vi" with "vim"
:%s:\<vi\>:vim:g
\%[]: A sequence of optionally matched atoms (not to be confused with a [ ] collection).
Note: inside the [ ] (collection), most metacharacters behave like ordinary characters.
- If you want to include - (dash) in your collection, put it first: /[-0-9]/
- A literal [ can be included the same way: /[[0-9]
To avoid the need for escaping a lot of things (like capture groups), start the
pattern with \v (very magic).
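
A minimal sketch of how \v cuts down on backslashes, using the built-in substitute() function on a made-up string. Both calls swap the two words and should give the same result.

```vim
" Magic (default): every group and multi needs a backslash.
" echoes: world hello
echo substitute('hello world', '\(\w\+\)\s\+\(\w\+\)', '\2 \1', '')

" Very magic: the same pattern with far fewer backslashes.
" echoes: world hello
echo substitute('hello world', '\v(\w+)\s+(\w+)', '\2 \1', '')
```
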
Ignoring Case in a Pattern¶
- \c: forces the entire pattern to ignore case
- \C: enforces case-sensitive matching for the whole pattern
Including End-of-Line (EOL) and Start-of-Line (SOL) in Pattern Matches¶
Matching a Character Class and End of Line¶
Adding an underscore _ between the backslash and character
for a character class will make it also include end-of-line.
For example:
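
The searches below are an illustrative sketch of the underscore form; the words foo and bar are placeholders for whatever text you are looking for.

```vim
" \s matches only spaces and tabs, so both words must be on one line.
/foo\s\+bar

" \_s also matches end-of-line, so this additionally finds "foo" at the
" end of one line followed by "bar" on a later line.
/foo\_s\+bar

" \_a matches alphabetic characters and end-of-line, so this can span
" several lines that contain only letters.
/\_a\+
```
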
Matching Start-of-Line after Another Atom¶
\_^: Matches start-of-line. Unlike ^, it can be used anywhere in the pattern.
Example: \_s*\_^foo
This matches white space, end-of-lines, and blank lines, then "foo" at start-of-line.
Word Boundaries in Vim Regex¶
Word boundaries can be denoted by escaped angle brackets: \<word\>
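
As a small sketch of the effect (the sample sentence is made up), compare a substitution with and without the word boundaries:

```vim
" With \< and \>, only the standalone word "vi" is replaced.
" echoes: VI is not vim
echo substitute('vi is not vim', '\<vi\>', 'VI', 'g')

" Without them, the "vi" inside "vim" is replaced as well.
" echoes: VI is not VIm
echo substitute('vi is not vim', 'vi', 'VI', 'g')
```
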
Overview of Multi Items¶
pattern-overview
| \m | \M | Matches |
|---|---|---|
| Magic | No Magic | |
| \_^ | \_^ | start-of-line (used anywhere), zero-width |
| \_$ | \_$ | end-of-line (used anywhere), zero-width |
| \< | \< | beginning of a word, zero-width |
| \> | \> | end of a word, zero-width |
| \zs | \zs | anything, sets start of match |
| \ze | \ze | anything, sets end of match |
| \%^ | \%^ | beginning of file, zero-width |
| \%$ | \%$ | end of file, zero-width |
| \%V | \%V | inside Visual area, zero-width |
| \%# | \%# | cursor position, zero-width |
| \%'m | \%'m | mark m position, zero-width |
| \%23l | \%23l | in line 23, zero-width |
| \%23c | \%23c | in column 23, zero-width |
| \%23v | \%23v | in virtual column 23, zero-width |
Greedy Multis¶
| \m | \M | Matches of the Preceding Atom |
|---|---|---|
| Magic | No Magic | Greedy |
| * | \* | 0 or more, as many as possible |
| \+ | \+ | 1 or more, as many as possible |
| \= | \= | 0 or 1, as many as possible |
| \? | \? | 0 or 1, as many as possible |
| \{n,m} | \{n,m} | n to m, as many as possible |
| \{n} | \{n} | n, exactly |
| \{n,} | \{n,} | at least n, as many as possible |
| \{,m} | \{,m} | 0 to m, as many as possible |
| \{} | \{} | 0 or more, as many as possible (same as *) |
Non-Greedy Multis¶
| \m | \M | Matches of the Preceding Atom |
|---|---|---|
| Magic | No Magic | Non-Greedy |
| \{-n,m} | \{-n,m} | n to m, as few as possible |
| \{-n} | \{-n} | n, exactly |
| \{-n,} | \{-n,} | at least n, as few as possible |
| \{-,m} | \{-,m} | 0 to m, as few as possible |
| \{-} | \{-} | 0 or more, as few as possible |

- Remember:
  - If a dash (-) appears immediately after the opening brace, {, then the shortest match first algorithm is used.
  - i.e., \{-...} = Non-Greedy
Zero-width and atomic pattern-multi-items:¶
| \m | \M | Matches of the Preceding Atom |
|---|---|---|
| \@> | \@> | 1, like matching a whole pattern |
| \@= | \@= | nothing, requires a match, zero-width |
| \@! | \@! | nothing, requires NO match, zero-width |
| \@<= | \@<= | nothing, requires a match behind, zero-width |
| \@<! | \@<! | nothing, requires NO match behind, zero-width |

It's recommended to use \zs instead of \@<= with the new regex engine.
Matching Newlines / End-of-Line Inside a Collection¶
Since $ doesn't match newline/end-of-line in a collection,
you'll need to use one of these:
- \_[] or \n, when used with a 'collection' ([ ]):
  - With \_ prepended, a collection also includes the end-of-line.
  - The same can be done by including \n in a collection.
Matching Literal Key Characters¶
- To include a literal ], ^, - or \ in the collection, put a backslash before it: [xyz\]], [\^xyz], [xy\-z] and [xyz\\].
  - (Note: POSIX does not support the use of a backslash this way).
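
A short sketch of the escaping rule above, replacing each of the four special characters in a made-up string with an underscore:

```vim
" The collection contains the literal characters ] ^ - and \.
" echoes: a_b_c_d_e
echo substitute('a-b]c^d\e', '[\]\^\-\\]', '_', 'g')
```
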
Dealing with Accents in Unicode¶
- If there are Unicode characters with accents, check \Z and \%C.
- [==] (:h /[[=): An equivalence class.
  - This means that characters are matched that have almost the same meaning, e.g., when ignoring accents.
  - This only works for Unicode, latin1 and latin9.
Unicode Accents Example¶
- [=a=], written inside a collection as [[=a=]], will match characters like a, à, á, â, etc., because they are all variations of the base character a with different accents.
Range of Operation¶
- <number>: an absolute line number
- .: the current line
- $: the last line in the file
- %: the whole file. The same as 1,$
- 't: position of mark "t"
- /pattern[/]: the next line where text "pattern" matches
- ?pattern[?]: the previous line where text "pattern" matches
- \/: the next line where the previously used search pattern matches
- \?: the previous line where the previously used search pattern matches
- \&: the next line where the previously used substitute pattern matches
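
The Ex commands below sketch how these range items combine with :s. The patterns foo, bar, BEGIN and END and the marks a and b are placeholders chosen for the example.

```vim
" Lines 10 through 20 only.
:10,20s/foo/bar/g

" From the current line to the last line of the file.
:.,$s/foo/bar/g

" The whole file.
:%s/foo/bar/g

" From mark a to mark b.
:'a,'bs/foo/bar/g

" From the next line matching BEGIN to the next line matching END
" (with a comma, both searches start from the cursor line).
:/BEGIN/,/END/s/foo/bar/g
```
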
Capture Groups and Backreferences with Substitutions and Other Pattern Commands¶
Commands: :s, :g, :v¶
You can group parts of the pattern expression by enclosing them
with \( and \) (escaped parentheses, unless very magic is set).
Using \| you can combine several alternatives
into one pattern that matches any of them.
The first alternative that matches is used.
The match and its groups can then be referenced in the substitute string with:
- &: The whole matched pattern
- \0: The whole matched pattern
- \1, ..., \9: The matched pattern in the n'th capture group (\(...\))
  - The numbering is done based on which \( comes first in the pattern (left to right).
- ~: The previous substitute string
- \L: The following characters are made lowercase
- \U: The following characters are made uppercase
- \E: End of \U and \L
- \e: End of \U and \L
- \r: Split line in two at this point
- \b: Insert a <BS>
- \l: Next character made lowercase
- \u: Next character made uppercase
- <CR>: Split line in two at this point (Type the <CR> as CTRL-Q <Enter>*)
- \<CR>: Insert a carriage-return (CTRL-M) (Type the <CR> as CTRL-Q <Enter>*)
- \n: Insert a <NL> (<NUL> in the file) (does NOT break the line)
- \t: Insert a <Tab>
- \\: Insert a single backslash
- \x: Where x is any character not mentioned above: Reserved for future expansion

* Some systems support CTRL-V <Enter> to insert the literals
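
A minimal sketch of a few of these replacement items, using substitute() on made-up strings so the result can be checked without a buffer:

```vim
" \1 and \2 refer to the first and second capture group.
" echoes: smith, john
echo substitute('john smith', '\(\w\+\) \(\w\+\)', '\2, \1', '')

" & is the whole match.
" echoes: [vim]
echo substitute('vim', '.*', '[&]', '')

" \u uppercases the next character of the replacement.
" echoes: Hello World
echo substitute('hello world', '\<\w', '\u&', 'g')

" \U uppercases everything that follows, until \E.
" echoes: SHOUT quietly
echo substitute('shout quietly', '^\w\+', '\U&\E', '')
```
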
Flags¶
- g: Global, replaces all occurrences on each line.
- i: Case insensitive.
- I: Case sensitive.
- c: Confirm each substitution.
- e: Suppress "no match" error.
- n: Report the number of matches, and don't actually substitute.
- p: Print the line containing the last substitute.
- #: Like p and prepend the line number.
- l: Like p but print the text like :list.
- &: Must be the first one. Keep the flags from the previous substitute command.
  - Examples: :&& and :s/this/that/&
- r: Only useful in combination with :& or :s without arguments.
  - :&r works the same way as :~:
    - When the search pattern is empty, use the previously used search pattern instead of the search pattern from the last :s or :global.
    - If the last command that did a search was a :s or :global, there is no effect.
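
The commands below sketch some common flag combinations; foo and bar are placeholder patterns.

```vim
" Replace every occurrence on every line, confirming each one.
:%s/foo/bar/gc

" Count the matches in the file without changing anything.
:%s/foo//gn

" Ignore case, and do not report an error when nothing matches.
:%s/foo/bar/gie

" Repeat the last substitute on the current line, keeping its flags.
:&&
```
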
Two and Three Letter :substitute Commands¶
You can append flags directly to :s to repeat the previous substitute with those flags (for example, :sce works like :s///ce), so you don't need to specify them at the end:
| | c | e | g | i | I | n | p | l | r |
|---|---|---|---|---|---|---|---|---|---|
| g | :sgc | :sge | :sg | :sgi | :sgI | :sgn | :sgp | :sgl | :sgr |
| I | :sIc | :sIe | :sIg | :sIi | :sI | :sIn | :sIp | :sIl | :sIr |
| c | :sc | :sce | :scg | :sci | :scI | :scn | :scp | :scl | |
| r | :src | | :srg | :sri | :srI | :srn | :srp | :srl | :sr |
| i | :sic | :sie | | :si | :siI | :sin | :sip | | :sir |
| e | | | | | | | | | |
| n | | | | | | | | | |
| p | | | | | | | | | |
| l | | | | | | | | | |

Exceptions:
- :scr is :scriptnames
- :se is :set
- :sig is :sign
- :sil is :silent
- :sn is :snext
- :sp is :split
- :sl is :sleep
- :sre is :srewind
Repeating Substitutions¶
- &: Synonym for :s (repeat last substitute).
- :~: Repeat last substitute with same substitute string but with last used search pattern. This is like :&r.
- g&: Synonym for :%s//~/& (repeat last substitute with last search pattern on all lines with the same flags).
Substitutions with Expressions¶
When the substitute string starts with \=,
the remainder is interpreted as an expression.
The separation char cannot be in the expression!
| Substitution | Effect |
|---|---|
| :s@\n@\="\r" .. expand("$HOME") .. "\r"@ | This replaces an end-of-line with a new line containing the value of $HOME |
| :s/E/\="\<Char-0x20ac>"/g | This replaces each E character with a euro sign. |

:h <Char->¶
Examples:
- :s@\n@\="\r" .. expand("$HOME") .. "\r"@
  - This replaces an end-of-line with a new line containing the value of $HOME.
- :s/E/\="\<Char-0x20ac>"/g
  - This replaces each E character with a euro sign. Read more in <Char->.
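
As a further sketch, the same \= mechanism works in the substitute() function, where submatch() gives access to the match. The sample strings are made up for the example.

```vim
" submatch(0) is the whole match; each number is doubled.
" echoes: a2 b4
echo substitute('a1 b2', '\d\+', '\=submatch(0) * 2', 'g')

" submatch(1) is the first capture group; append its length.
" echoes: vim:3
echo substitute('vim', '\(\w\+\)', '\=submatch(1) .. ":" .. strlen(submatch(1))', '')
```
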
Quantifiers, Greedy and Non-Greedy¶
Greedy¶
- *: matches 0 or more of the preceding atom (a character, range, or metacharacter).
  - .* matches everything, including an empty line.
- \+: matches 1 or more of the preceding atom.
- \=: matches 0 or 1 of the preceding atom.
- \{n}: matches the preceding atom exactly n times.
- \{n,m}: matches from n to m of the preceding atom.
- \{,m}: matches at most m (from 0 to m) of the preceding atom.
- \{n,}: matches at least n of the preceding atom.

where n and m are non-negative integers (0 or greater)
Non-Greedy¶
:h atom
Parentheses can be used to make a pattern into an atom.
- \{-}: matches 0 or more of the preceding atom, as few as possible
- \{-n,m}: matches n to m of the preceding atom, as few as possible
- \{-n,}: matches at least n of the preceding atom, as few as possible
- \{-,m}: matches 0 to m of the preceding atom, as few as possible

where n and m are non-negative integers (0 or greater)
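
A small sketch of the difference between greedy and non-greedy multis, using matchstr() on made-up strings:

```vim
" Greedy: .* runs on to the last ">".
" echoes: <a><b>
echo matchstr('<a><b>', '<.*>')

" Non-greedy: .\{-} stops at the first ">".
" echoes: <a>
echo matchstr('<a><b>', '<.\{-}>')

" Counted, greedy: exactly three digits.
" echoes: 123
echo matchstr('12345', '\d\{3}')

" Counted, non-greedy: between 2 and 4 digits, as few as possible.
" echoes: 12
echo matchstr('12345', '\d\{-2,4}')
```
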
Optionally Match Atoms¶
- \%[]: A sequence of optionally matched atoms. This always matches.
  - The longest that matches is used.
  - There can be no \(\), \%(\) or \z(\) items inside the []
  - \%[] does not nest.
Optional Matching Example¶
| Pattern | Matches |
|---|---|
| /r\%[ead] | matches r, re, rea or read. |
| /\<fu\%[nction]\> | matches the Ex command function, where fu is required and nction is optional |
| /\<r\%[[eo]ad]\> | matches the words r, re, ro, rea, roa, read and road. |
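
The patterns in the table can be tried out with matchstr() and `=~#`; the input strings here are made up for the sketch.

```vim
" The longest matching prefix of "ead" is used.
" echoes: rea
echo matchstr('rea', 'r\%[ead]')
" echoes: re
echo matchstr('red', 'r\%[ead]')

" Any abbreviation from "fu" up to "function" matches.
" echoes: 1
echo 'func' =~# '^fu\%[nction]$'
" echoes: 0
echo 'fx' =~# '^fu\%[nction]$'
```
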
Match Inside the Visual Area¶
- \%V: Match inside the Visual area.
  - When Visual mode has already been stopped, match in the area that gv would reselect.
  - To make sure the whole pattern is inside the Visual area:
    - Put it at the start and just before the end of the pattern.
    - i.e., /\%Vfoo.*ba\%Vr
  - Only works for the current buffer.
Visual Area Matching Example¶
String: foo bar
| Pattern | Matches |
|---|---|
| /\%Vfoo.*ba\%Vr | This works if only foo bar was Visually selected. |
| /\%Vfoo.*bar\%V | Would match foo bar if the Visual selection continues after the r. |
Match with the Cursor Position¶
- \%#: Matches with the cursor position.
  - Only works when matching in a buffer displayed in a window.
Using Marks for Matching¶
- \%'m: Matches with the position of mark m.
- \%<'m: Matches before the position of mark m.
- \%>'m: Matches after the position of mark m.
Line Number Matching¶
Using Line Numbers for Matching¶
- \%23l: Matches in a specific line.
- \%<23l: Matches above a specific line (lower line number).
- \%>23l: Matches below a specific line (higher line number).

The "23" can be any line number. The first line is 1.
Using the Current Line for Matching¶
- \%.l: Matches at the cursor line.
- \%<.l: Matches above the cursor line.
- \%>.l: Matches below the cursor line.

These six can be used to match specific lines in a buffer.
Matching with Start and End of the File¶
- \%^: Matches start of the file. When matching with a string, matches the start of the string.
- \%$: Matches end of the file. When matching with a string, matches the end of the string.
Matching with Columns¶
- \%23c: Matches in a specific column.
- \%<23c: Matches before a specific column.
- \%>23c: Matches after a specific column.
- \%.c: Matches at the cursor column.
- \%<.c: Matches before the cursor column.
- \%>.c: Matches after the cursor column.
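
The commands below sketch how these position atoms are typically combined with a search or a substitute. The numbers and the words foo and bar are placeholders.

```vim
" Find text that extends past virtual column 80.
/\%>80v.\+

" Replace foo only on line 5, even though the range is the whole file.
:%s/\%5lfoo/bar/g

" Replace foo only on lines 11 through 19.
:%s/\%>10l\%<20lfoo/bar/g

" Replace foo only where it starts inside the last Visual selection.
:%s/\%Vfoo/bar/g
```
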
Matching After a Pattern¶
- \@<=: Matches with zero width if the preceding atom matches just before what follows. (:h /zero-width)
  - Like (?<=pattern) in Perl, but Vim allows non-fixed-width patterns.
Lookback (Matching after a pattern) Example¶
| Pattern | Matches |
|---|---|
| \(an\_s\+\)\@<=file | "file" after "an" and white space or an end-of-line |
Matching After a NON-matching pattern¶
- \@<!: Matches with zero width if the preceding atom does NOT match just before what follows.
  - Like (?<!pattern) in Perl, but Vim allows non-fixed-width patterns.
  - This can be a bit slow.
Lookback (Match after a non-matching pattern) Example¶
| Pattern | Matches |
|---|---|
| \(foo\)\@<!bar | any "bar" that's not in "foobar" |
| \(\/\/.*\)\@<!in | "in" which is not after "//" |

- \@123<!: Like \@<! but only looks back 123 bytes. This avoids trying lots of matches that are known to fail and that would make executing the pattern very slow.
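
A short sketch of both lookbehind forms (\@<= and \@<!) using substitute() on made-up strings:

```vim
" \@<=: replace "file" only when "an" and white space come just before it.
" echoes: an FILE, a file
echo substitute('an file, a file', '\(an\s\+\)\@<=file', 'FILE', 'g')

" \@<!: replace "bar" only when "foo" does NOT come just before it.
" echoes: foobar BAR
echo substitute('foobar bar', '\(foo\)\@<!bar', 'BAR', 'g')
```
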
Match Excluding the Preceding Atom¶
- \@=: Matches the preceding atom with zero width.
  - Like (?=pattern) in Perl.
- \&: Separates two concats in a branch. The branch matches the last concat, but only if all the preceding concats also match at the same position.
Lookahead (Matching the beginning of a pattern) Example¶
The string: foobar
| Pattern | Matches |
|---|---|
| foo\(bar\)\@= | foo in foobar |
| foobar\&foo | foo in foobar |
| foo\(bar\)\@=foo | nothing |

- Using \& works the same as using \@=:
  - foo\&.. is the same as \(foo\)\@=..
  - \& is easier, you don't need the parentheses.
Use Cases¶
- foo\(bar\)\@=: Find all foos that are followed by bar
- foo\(bar\|baz\| bar\)\@=: Find all foos that are followed by bar, baz, or " bar" (space bar)
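
A minimal sketch of the lookahead in action, using substitute() and `=~#` on made-up strings:

```vim
" Only the "foo" that is followed by "bar" is replaced.
" echoes: Xbar foobaz
echo substitute('foobar foobaz', 'foo\(bar\)\@=', 'X', 'g')

" With alternatives, a "foo" followed by either word is replaced.
" echoes: Xbar Xbaz
echo substitute('foobar foobaz', 'foo\(bar\|baz\)\@=', 'X', 'g')

" The lookahead consumes nothing, so "foo" would have to match where
" "bar" starts, which is impossible.
" echoes: 0
echo 'foobarfoo' =~# 'foo\(bar\)\@=foo'
```
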
Zero-Width¶
- :h /zero-width
  - When using \@= (or ^, $, \<, \>) no characters are included in the match.
  - These items are only used to check if a match can be made.
  - This can be tricky, because a match with following items will be done in the same position.
Zero-Width Matching Example¶
| Pattern | Matches |
|---|---|
| foo\(bar\)\@=foo | nothing |

The example above will not match foobarfoo,
because it tries to match foo in the same position where
bar matched.
Setting the Start of a Match with \zs¶
- \zs: Matches at any position, but not inside [], and sets the start of the match there:
  - The next char is the first char of the whole match.
  - This cannot be followed by a multi (:h multi).
Matching with \zs Example¶
| Pattern | Matches |
|---|---|
| /^\s*\zsif | matches an "if" at the start of a line, ignoring white space. |
| /\(.\{-}\zsFab\)\{3} | Finds the third occurrence of Fab. |
Setting the End of a Match with \ze¶
- \ze: Matches at any position, but not inside [], and sets the end of the match there:
  - The previous char is the last char of the whole match.
Ending match with \ze Example¶
| Pattern | Matches |
|---|---|
| end\ze\(if\|for\) | matches the end in endif and endfor. |
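
The sketch below uses matchstr(), which returns only the text between the start and end of the match, to show what \zs and \ze leave in the match. The sample strings are made up.

```vim
" \zs drops the leading white space from the match.
" echoes: if
echo matchstr('   if x', '^\s*\zsif')

" \ze drops the trailing "if" from the match.
" echoes: end
echo matchstr('endif', 'end\ze\(if\|for\)')

" Both together keep only the text between the parentheses.
" echoes: bar
echo matchstr('foo(bar)', '(\zs.*\ze)')
```
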
Less Useful Patterns¶
Match if Previous Pattern Doesn't Match at the CURRENT Position¶
- \@!: Matches with zero width if the preceding atom does NOT match at the current position.
  - Like (?!pattern) in Perl.
Zero-Width Match after Non-Match Example¶
| Pattern | Matches |
|---|---|
| if \(\(then\)\@!.\)*$ | if not followed by then |
| a.\{-}p\@! | a, ap, app, appp, etc. not immediately followed by a p |
| /^\%(.*bar\)\@!.*\zsfoo | foo in a line that does not contain bar |
| foo\(bar\)\@! | any foo not followed by bar |

You can't use \@! to look for a non-match before the matching position.
\(foo\)\@!bar will match bar in foobar, because
foo does not match at the position where bar matches.
Use \(foo\)\@<!bar (\@<!).
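
A brief sketch of the points above, again with made-up strings:

```vim
" Only the "foo" that is NOT followed by "bar" is replaced.
" echoes: foobar Xbaz
echo substitute('foobar foobaz', 'foo\(bar\)\@!', 'X', 'g')

" \@! looks at the current position, so this still matches "bar" in
" "foobar": "foo" does not match where "bar" starts.
" echoes: 1
echo 'foobar' =~# '\(foo\)\@!bar'

" \@<! looks behind the current position instead.
" echoes: 0
echo 'foobar' =~# '\(foo\)\@<!bar'
```
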
Match at the Current Position like a Single Pattern¶
- \@>: Matches the preceding atom like matching a whole pattern.
  - Like (?>pattern) in Perl.
Matching with \@> Example¶
- The string: aaab

| Pattern | Matches |
|---|---|
| \(a*\)\@>ab | will not match aaab, because the a* matches the aaa (as many "a"s as possible), thus the ab can't match. |
| \(a*\)\@>a | nothing (the a* takes all the a's, there can't be another one following) |
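
The difference from an ordinary group can be sketched with `=~#` on the same string:

```vim
" Ordinary multi: a* gives back one "a" so that "ab" can match.
" echoes: 1
echo 'aaab' =~# 'a*ab'

" With \@>, a* keeps every "a" it took and is never retried shorter.
" echoes: 0
echo 'aaab' =~# '\(a*\)\@>ab'

" This still matches, because only "b" has to follow.
" echoes: 1
echo 'aaab' =~# '\(a*\)\@>b'
```
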
Matching Different Number Systems¶
- \%d123: Matches the character specified with a decimal number.
  - Must be followed by a non-digit.
- \%o40: Matches the character specified with an octal number up to 0o377.
  - Numbers below 0o40 must be followed by a non-octal digit or a non-digit.
- \%x2a: Matches the character specified with up to two hexadecimal characters.
- \%u20AC: Matches the character specified with up to four hexadecimal characters.
- \%U1234abcd: Matches the character specified with up to eight hexadecimal characters, up to 0x7fffffff
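
As a quick sketch, each of the following tests a single character against its code. The last line assumes 'encoding' is utf-8.

```vim
" Decimal 123 is "{".
" echoes: 1
echo '{' =~# '\%d123'

" Octal 40 is a space.
" echoes: 1
echo ' ' =~# '\%o40'

" Hexadecimal 2a is "*".
" echoes: 1
echo '*' =~# '\%x2a'

" U+20AC is the euro sign.
" echoes: 1
echo '€' =~# '\%u20ac'
```
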
Matching Decimal, Octal, and Hexadecimal Number Systems¶
- \%d: Matching Decimal (base10)
- \%o: Matching Octal (base8)
- \%x: Matching Hexadecimal (base16)
  - Up to 2 hexadecimal characters
- \%u: Matching Hexadecimal (base16)
  - Up to 4 hexadecimal characters
- \%U: Matching Hexadecimal (base16)
  - Up to 8 hexadecimal characters
Examples¶
- [==] (:h /[[=): An equivalence class.
  - Written inside a collection as [[=a=]], it matches a and accented a characters (e.g., â, ã, å).
- [..]: A collation element.
  - This currently simply accepts a single character in the form: [.a.]
Collections / Sets¶
- []: A collection (sometimes called a 'set'). Matches any single character in the collection.
  - Think of this as a custom character class. A set will only match a single character.
- \%[]: A sequence of optionally matched atoms. This always matches.
  - The longest match is used with this.
- \_[]: A collection that also matches end-of-line.
  - [\n]: With \_ prepended to the collection OR \n in the collection, it also includes the end-of-line.

Starting a collection with ^ will make it match
everything BUT what is in the collection.
For example, ^[^0-9] matches at the start of a line whose first character is NOT
a digit character.
Collection Limitations / Caveats¶
There can be no \(\), \%(\) or \z(\) items inside the [] of \%[], and \%[] does not nest.
Collection Examples¶
To include a [ inside \%[] use [[], and for ] use []]. For example, /index\%[[[]0[]]] matches index, index[, index[0, and index[0].
Good Ones to Remember¶
- \%(\): A pattern enclosed by escaped parentheses.
  - Just like \(\), but without counting it as a capture (no backref).
  - This allows using more groups and it's a little bit faster.
- ~ / \~: Matches the last given substitute string.
- \<: Matches the beginning of a word: The next char is the first char of a word.
- \>: Matches the end of a word: The previous char is the last char of a word.
- \_.: Matches any single character or end-of-line.
- \_^: Matches start-of-line.
  - Example: \_s*\_^foo
  - This matches white space, end-of-lines, and blank lines, then "foo" at start-of-line.
- \@<=: Matches with zero width if the preceding atom matches just before what follows.
  - It's recommended to use \zs instead of \@<= with the new regex engine.
- \zs: Matches at any position, but not inside [], and sets the start of the match there.
Make it Non-Greedy¶
When using the brace notation (\{1,}), you can easily make it non-greedy.
- If a dash (-) appears immediately after the opening brace, {, then the shortest match first algorithm is used.
  - i.e., \{-...} = Non-Greedy

So:
- \{-} is a non-greedy version of *
- \{-1,} is a non-greedy version of \+