Regex-Fu Special Characters

The period .

The dot matches any single character (except line breaks).

To match a literal period (.), such as the one at the end of a sentence, we have to escape it with a backslash (\.) to specify that we want the character itself and not the regex special character.
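
A minimal sketch of the difference between the two forms, using Python's standard re module. The sample string is an assumption chosen for illustration.

```python
import re

text = "Hi. Ok!"

# Unescaped dot: matches every character (there are no line breaks in text).
any_char = re.findall(r".", text)

# Escaped dot: matches only the literal period.
literal_dot = re.findall(r"\.", text)

print(len(any_char))  # 7
print(literal_dot)    # ['.']
```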

\w and \W

\w matches any alphanumeric character. In most regex flavors it also matches the underscore (_).

In contrast, \W will match any character that is not matched by \w.
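
As a quick illustration in Python (the sample string is made up), the two classes split a string into complementary sets of characters. Note that Python's re module counts the underscore as a \w character.

```python
import re

text = "Hi, Bob_7!"

# \w: letters, digits and (in Python) the underscore.
word_chars = re.findall(r"\w", text)

# \W: everything else, here the comma, the space and the exclamation mark.
non_word_chars = re.findall(r"\W", text)

print(word_chars)      # ['H', 'i', 'B', 'o', 'b', '_', '7']
print(non_word_chars)  # [',', ' ', '!']
```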

The curly brackets {}

Curly brackets specify how many times the preceding element must be repeated.

For example, the following regex will match any set of 4 alphanumeric characters.

\w{4}

To match any set of 4 alphanumeric characters or more:

\w{4,}

To match any set of 4 or 5 alphanumeric characters:

\w{4,5}
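
The three quantifier forms can be compared side by side in Python. This is only a sketch: the word list and the helper name matching are hypothetical, and re.fullmatch is used so that each word must match the pattern in its entirety.

```python
import re

words = ["abc", "abcd", "abcde", "abcdef"]

def matching(pattern):
    """Hypothetical helper: return the words that the pattern matches entirely."""
    return [w for w in words if re.fullmatch(pattern, w)]

exactly_four = matching(r"\w{4}")    # exactly 4 characters
four_or_more = matching(r"\w{4,}")   # 4 characters or more
four_or_five = matching(r"\w{4,5}")  # 4 or 5 characters

print(exactly_four)  # ['abcd']
print(four_or_more)  # ['abcd', 'abcde', 'abcdef']
print(four_or_five)  # ['abcd', 'abcde']
```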

The square brackets []

Square brackets define a set of characters, any one of which can match at that position.

This matches any lowercase letter from a to z followed by a period (.):

[a-z]\.

We can also include capital letters. This will match any lowercase or uppercase letter followed by a period:

[a-zA-Z]\.
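
A small Python sketch of both character sets, run against an invented sample sentence:

```python
import re

text = "The end. STOP. 42."

# Only a lowercase letter directly before the period qualifies.
lower_then_dot = re.findall(r"[a-z]\.", text)

# Lowercase or uppercase letters qualify; the digit before the last period does not.
any_letter_then_dot = re.findall(r"[a-zA-Z]\.", text)

print(lower_then_dot)       # ['d.']
print(any_letter_then_dot)  # ['d.', 'P.']
```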

The parentheses ()

Parentheses are also used to group characters.

The regex below will match any t or T character. The | acts like a logical OR.

(t|T)

Match any run of two or three t, e or r letters followed by a period. Here the {} applies to everything contained within the parentheses.

(t|e|r){2,3}\.
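
Both patterns can be tried in Python as sketched below (the sample strings are assumptions). re.finditer with group(0) is used here because re.findall returns the content of the capturing group rather than the whole match.

```python
import re

# (t|T): either a lowercase or an uppercase t.
t_matches = [m.group(0) for m in re.finditer(r"(t|T)", "Tomato treat")]

# (t|e|r){2,3}\.: two or three letters among t, e, r, then a literal period.
# "at." is not matched because only one qualifying letter precedes the period.
runs = [m.group(0) for m in re.finditer(r"(t|e|r){2,3}\.", "better. tree. at.")]

print(t_matches)  # ['T', 't', 't', 't']
print(runs)       # ['ter.', 'ree.']
```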

The caret (^)

The caret means "starts with".

Match any line that starts with the letter B:

^B

With the multiline option enabled, the regex is applied to each line instead of to the text as a whole.
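
The effect of the multiline option can be sketched in Python, where it is spelled re.MULTILINE. The three-line sample text is an assumption.

```python
import re

text = "Bob\nAlice\nBill"

# Without the flag, ^ only matches at the very start of the text.
without_flag = re.findall(r"^B", text)

# With re.MULTILINE, ^ also matches right after each line break.
with_flag = re.findall(r"^B", text, flags=re.MULTILINE)

print(without_flag)  # ['B']
print(with_flag)     # ['B', 'B']
```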

The dollar sign ($)

The dollar sign means at the end of ...

This regex will match every period (.) at the end of the line. Note that the multiline option is required here to apply the regex to each single line instead of the text as a whole.

\.$
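
A Python sketch of the same idea, reporting the position of each match so the effect of the multiline option (re.MULTILINE in Python) is visible. The sample text is invented.

```python
import re

text = "One.\nTwo\nThree."

# Without the flag, $ only matches at the end of the whole text.
without_flag = [m.start() for m in re.finditer(r"\.$", text)]

# With re.MULTILINE, $ also matches just before each line break.
with_flag = [m.start() for m in re.finditer(r"\.$", text, flags=re.MULTILINE)]

print(without_flag)  # [14]
print(with_flag)     # [3, 14]
```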

The plus sign (+)

The plus sign means match 1 or more ...

This regular expression matches a single e or a run of several e characters in a row.

e+
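
For instance, in Python (with a made-up sample string), each run of e characters is returned as one match:

```python
import re

# Each match is as long as the run of consecutive e characters.
e_runs = re.findall(r"e+", "bed see wheeee")

print(e_runs)  # ['e', 'ee', 'eeee']
```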

The question mark (?)

The question mark makes the preceding element optional (0 or 1 occurrence).

This regex will match every run of one or more e characters, and will include the letter a in the match if one immediately follows.

e+a?
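
A short Python sketch with an assumed sample string shows the optional a being included only when it is present:

```python
import re

# "tea" -> 'ea' (a is present), "see" -> 'ee', "bed" -> 'e' (no a to include).
optional_a = re.findall(r"e+a?", "tea see bed")

print(optional_a)  # ['ea', 'ee', 'e']
```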

The star (*)

Meaning: Matching 0 or more

This regex will match the r character followed by the letter e 0 or more times.

re*
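
Assuming the pattern described here is written re*, a Python sketch on a made-up string looks like this:

```python
import re

# The r is required; the e after it may appear zero, one or many times.
star_matches = re.findall(r"re*", "r red tree")

print(star_matches)  # ['r', 're', 'ree']
```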

Look behind

Positive look behind

(?<=)

A look behind checks what comes before the match without including it in the match. For example, this regex will select every run of alphanumeric characters that comes right after the word The or the followed by a space, while leaving the [Tt]he[ ] pattern itself out of the match.

(?<=[Tt]he[ ])\w*
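
In Python this can be sketched as follows, with an invented sample sentence. Python's re module requires a look behind to have a fixed width, which [Tt]he[ ] satisfies (always 4 characters).

```python
import re

text = "The cat saw the dog"

# Only the word after "The " or "the " is returned; the prefix is not part of the match.
after_the = re.findall(r"(?<=[Tt]he[ ])\w*", text)

print(after_the)  # ['cat', 'dog']
```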

Negative look behind

(?<!)

Will match what is NOT preceded by the expression specified between the parentheses.

For example, this regular expression will match any alphanumeric character that is NOT preceded by the word The/the followed by a space.

(?<![Tt]he[ ])\w

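Note that only the character immediately following "The " or "the " is skipped. The remaining characters of that word still match, because they are preceded by a letter rather than by the pattern.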
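
A Python sketch on a short assumed string makes the character-by-character behavior of the negative look behind visible: only the c, which sits directly after "the ", is left out.

```python
import re

text = "the cat"

# Every alphanumeric character is kept except the one right after "the ".
kept = re.findall(r"(?<![Tt]he[ ])\w", text)

print(kept)  # ['t', 'h', 'e', 'a', 't']
```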

Look ahead

Positive look ahead

(?=)

Will match any set of characters that is followed by a specific pattern, without including that pattern in the match.

This expression will match any set of alphanumeric characters preceding the pattern at:

\w*(?=at)
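
A possible Python sketch, using a made-up sample string. Because \w* can also match zero characters, the pattern may produce empty matches directly in front of "at", so empty strings are filtered out here.

```python
import re

text = "cat flat dog"

# Keep only the non-empty matches: the characters that come right before "at".
before_at = [m for m in re.findall(r"\w*(?=at)", text) if m]

print(before_at)  # ['c', 'fl']
```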

Negative look ahead

(?!)

Match any set of characters that is NOT followed by a specific pattern.

This regex will match every word except those starting with the letter c, together with the non-alphanumeric characters around it:

\W*(\b)(?!(?:c))\w+\s*
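
The pattern can be tried in Python as sketched below, on an assumed sample sentence. re.finditer with group(0) is used because the pattern contains a capturing group, and strip() removes the surrounding whitespace that the pattern also consumes. The match is case sensitive, so only a lowercase c is excluded.

```python
import re

text = "a cat and dog"
pattern = r"\W*(\b)(?!(?:c))\w+\s*"

# Collect every full match, trimmed of the spaces matched by \W* and \s*.
not_c_words = [m.group(0).strip() for m in re.finditer(pattern, text)]

print(not_c_words)  # ['a', 'and', 'dog']
```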
