Building your own Regular Expression

Regular expressions are constructed by putting the various components of the expression between a pair of delimiters. In JavaScript, the delimiters are a pair of forward slash (/) characters, as shown in the following example.

/expression/
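
As a minimal sketch of the delimiter syntax in JavaScript, the snippet below writes the same pattern once as a literal between forward slashes and once with the RegExp constructor. The pattern dog and the sample terms are only illustrative.

```javascript
// A regular expression literal: the pattern sits between two forward slashes.
const literalPattern = /dog/;

// The same pattern built from a string with the RegExp constructor.
const constructedPattern = new RegExp("dog");

// test() returns true when the pattern is found anywhere in the input.
console.log(literalPattern.test("hotdog"));     // true
console.log(constructedPattern.test("hotdog")); // true
console.log(literalPattern.test("cat"));        // false
```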


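Two anchors fix where a match may start and end:

Beginning ^ (the match must start at the beginning of the text)

End $ (the match must finish at the end of the text)

^\d{2}$ (matches only when the whole text is exactly two digits, such as 42)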
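
The effect of the anchors is easiest to see by comparing the same pattern with and without them. This is a small sketch in JavaScript; the variable names are illustrative.

```javascript
// No anchors: two digits anywhere in the input are enough.
const twoDigitsAnywhere = /\d{2}/;

// Anchored with ^ and $: the whole input must be exactly two digits.
const onlyTwoDigits = /^\d{2}$/;

console.log(twoDigitsAnywhere.test("12345")); // true
console.log(onlyTwoDigits.test("12345"));     // false
console.log(onlyTwoDigits.test("42"));        // true
```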

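The following are some common regular expression metacharacters and examples of what they would match or not match. In the examples, = means the expression matches the entire term and ≠ means it does not, as if each expression were wrapped in ^ and $.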

Metacharacter
Description
Examples
\d
Any single digit, 0 through 9
\d\d\d = 327

\d\d = 81

\d = 4

-----------------------------------------

\d\d\d ≠ 24631

\d\d\d does not match 24631 as a whole because 24631 contains 5 digits. \d\d\d matches exactly 3 digits.
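
A short JavaScript sketch of the \d examples above. The anchors ^ and $ are added here as an assumption, to reproduce the whole-term matching the examples rely on; without them, \d\d\d also finds three digits inside a longer number.

```javascript
// Anchored: the entire term must be exactly three digits.
const threeDigitTerm = /^\d\d\d$/;

console.log(threeDigitTerm.test("327"));   // true
console.log(threeDigitTerm.test("24631")); // false

// Unanchored: the first three digits of a longer number still match.
const firstThreeDigits = "24631".match(/\d\d\d/)[0];
console.log(firstThreeDigits); // "246"
```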
\w
Any single alphanumeric character or underscore (a-z, A-Z, 0-9, _)
\w\w\w = "dog"

\w\w\w\w = "mule"

\w\w = "to"

-----------------------------------------

\w\w\w = 467

\w\w\w\w = 4673

-----------------------------------------

\w\w\w ≠ "boat"

\w\w\w does not match "boat" because "boat" contains 4 characters.

-----------------------------------------

\w ≠ !

\w does not match the exclamation point "!" because it is not an alphanumeric character.
\W
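Any single character that is not alphanumeric or an underscore, such as a symbol, punctuation mark, or space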
\W = %

\W = #

\W\W\W = @#%

-----------------------------------------

\W\W\W\W ≠ dog8

\W\W\W\W does not match "dog8" because d, o, g, and 8 are alphanumeric characters.
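
The \w and \W examples can be checked with a sketch like the following. The anchors are an assumption added to force whole-term matching, and the underscore case shows that JavaScript counts _ as a \w character.

```javascript
const threeWordChars = /^\w\w\w$/;
console.log(threeWordChars.test("dog"));  // true
console.log(threeWordChars.test("467"));  // true
console.log(threeWordChars.test("boat")); // false (4 characters)
console.log(threeWordChars.test("a_b"));  // true (underscore counts as \w)

const fourNonWordChars = /^\W\W\W\W$/;
console.log(fourNonWordChars.test("@#%!")); // true
console.log(fourNonWordChars.test("dog8")); // false (all alphanumeric)
```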
[a-z]
[0-9]
Character set. Matches exactly one character from the set unless a quantifier follows it. A hyphen defines a range, so [a-z] is any lowercase letter and [0-9] is any digit. The order of the characters in the set does not matter.
pand[ora] = panda

pand[ora] = pando

-----------------------------------------

pand[ora] ≠ pandora

pand[ora] does not match "pandora" because [ora] stands for only 1 character from the set.

(Quantifiers that allow pand[ora] to match "pandora" are discussed below.)
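
A brief sketch of character sets in JavaScript, again with anchors assumed so that the whole term must match. The second pattern is an illustrative use of the [a-z] and [0-9] ranges.

```javascript
// One character from the set o, r, a must follow "pand".
const oneFromSet = /^pand[ora]$/;
console.log(oneFromSet.test("panda"));   // true
console.log(oneFromSet.test("pando"));   // true
console.log(oneFromSet.test("pandora")); // false (three characters follow "pand")

// One lowercase letter followed by one digit.
const letterThenDigit = /^[a-z][0-9]$/;
console.log(letterThenDigit.test("x7")); // true
console.log(letterThenDigit.test("7x")); // false
```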
(abc)
(123)
Character group. Matches the characters inside the parentheses, abc or 123 here, in that exact order.
pand(ora) = pandora

pand(123) = pand123

-----------------------------------------

pand(oar) ≠ pandora

pand(oar) does not match for "pandora" because it is looking for the exact phrase "pandoar".
|
Alternation - allows for alternate matches. | operates like the Boolean OR.
pand(ora|123) = "pandora" OR "pand123"
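
Groups and alternation can be tried out with a sketch like this one. The anchors are an assumption to force whole-term matching. Note that an alternation only matches the alternatives it lists, so (abc|123) accepts "pandabc" and "pand123" but not "pandora".

```javascript
// A group matches its characters in the exact order given.
const exactGroup = /^pand(ora)$/;
console.log(exactGroup.test("pandora")); // true
console.log(exactGroup.test("pandoar")); // false

// Alternation: either "ora" or "123" may follow "pand".
const oraOr123 = /^pand(ora|123)$/;
console.log(oraOr123.test("pandora")); // true
console.log(oraOr123.test("pand123")); // true

// Only the listed alternatives are accepted.
const abcOr123 = /^pand(abc|123)$/;
console.log(abcOr123.test("pandabc")); // true
console.log(abcOr123.test("pandora")); // false
```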
{n}
Matches when the preceding character, or character group, occurs n times exactly.
\d{3} = 836

\d{3} = 139

\d{3} = 532

-----------------------------------------

pand[ora]{2} = "pandar"

pand[ora]{2} = "pandoo"

pand(ora){2} = "pandoraora"

-----------------------------------------

pand[ora]{2} ≠ pandora

pand[ora]{2} does not match "pandora" because the quantifier {2} requires exactly 2 characters from the character set [ora], and "pandora" has 3 characters after "pand".
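
The following sketch shows how {n} applies to a character set versus a group. Anchors are assumed so the whole term must match.

```javascript
// {2} after a set: any two characters from the set, in any order.
const twoFromSet = /^pand[ora]{2}$/;
console.log(twoFromSet.test("pandar"));  // true
console.log(twoFromSet.test("pandoo"));  // true
console.log(twoFromSet.test("pandora")); // false (three characters after "pand")

// {2} after a group: the whole group, repeated twice.
const groupTwice = /^pand(ora){2}$/;
console.log(groupTwice.test("pandoraora")); // true
console.log(groupTwice.test("pandora"));    // false
```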
{n,m}
Matches when the preceding character, or character group, occurs at least n times, and at most m times.
\d{2,5} = 97430

\d{2,5} = 9743

\d{2,5} = 97

-----------------------------------------

\d{2,5} ≠ 9

9 does not match because it is only 1 digit, fewer than the minimum of 2.
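
A quick sketch of the {n,m} range, with anchors assumed for whole-term matching. The last lines show that an unanchored pattern takes as many digits as the upper limit allows and then stops.

```javascript
const twoToFiveDigits = /^\d{2,5}$/;
console.log(twoToFiveDigits.test("97"));     // true
console.log(twoToFiveDigits.test("97430"));  // true
console.log(twoToFiveDigits.test("9"));      // false (below the minimum of 2)
console.log(twoToFiveDigits.test("974301")); // false (above the maximum of 5)

// Unanchored: the match stops after 5 digits.
const longestRun = "974301".match(/\d{2,5}/)[0];
console.log(longestRun); // "97430"
```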
?
Question mark matches when the character preceding the ? sign occurs 0 or 1 time only, making the character match optional.
colou?r = "colour" (u is found 1 time)

colou?r = "color" (u is found 0 times)
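
The optional u can be confirmed with a short sketch (anchors assumed for whole-term matching):

```javascript
const colorEitherSpelling = /^colou?r$/;
console.log(colorEitherSpelling.test("colour"));  // true
console.log(colorEitherSpelling.test("color"));   // true
console.log(colorEitherSpelling.test("colouur")); // false (u occurs twice)
```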
*
Asterisk matches when the character preceding * matches 0 or more times.

NOTE: * in RegEx is different from * in dtSearch. RegEx * is asking to find where the character (or grouping) preceding * is found ZERO or more times. dtSearch * is asking to find where the string of characters preceding * or following * is found 1 or more times.
tre* = "tree" (e is found 2 times)

tre* = "tre" (e is found 1 time)

tre* = "tr" (e is found 0 times)

-----------------------------------------

tre* ≠ "trees"

tre* will not match the term "trees" because, although "e" is found 2 times, it is followed by "s", which is not accounted for in the RegEx.
+
Plus sign matches when the character preceding + matches 1 or more times. The + sign makes the character match mandatory.
tre+ = "tree" (e is found 2 times)

tre+ = "tre" (e is found 1 time)

-----------------------------------------

tre+ ≠ "tr" (e is found 0 times)

tre+ will not match for "tr" because e is found zero times in "tr".
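
The difference between * and + shows up only when the preceding character is absent. A small side-by-side sketch, with anchors assumed for whole-term matching:

```javascript
const zeroOrMoreE = /^tre*$/; // e may be missing
const oneOrMoreE = /^tre+$/;  // at least one e is required

console.log(zeroOrMoreE.test("tree")); // true
console.log(oneOrMoreE.test("tree"));  // true

console.log(zeroOrMoreE.test("tr")); // true
console.log(oneOrMoreE.test("tr"));  // false

// Neither pattern accounts for a trailing "s".
console.log(zeroOrMoreE.test("trees")); // false
```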
. (period)
The period matches any single character (letter, digit, symbol, or space) except a line break.
ton. = "tone"

ton. = "ton#"

ton. = "ton4"

-----------------------------------------

ton. ≠ "tones"

ton. will not match for the term "tones" because . by itself will only match for a single character, here, in the 4th position of the term. In "tones", the s is the 5th character and is not accounted for in the RegEx.
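
A sketch of the period in JavaScript, with anchors assumed for whole-term matching. By default the period does not match a line break; the last line shows the s flag, which lifts that restriction in JavaScript.

```javascript
const tonPlusOne = /^ton.$/;
console.log(tonPlusOne.test("tone"));  // true
console.log(tonPlusOne.test("ton#"));  // true
console.log(tonPlusOne.test("tones")); // false (two characters after "ton")
console.log(tonPlusOne.test("ton"));   // false (the period needs one character)
console.log(tonPlusOne.test("ton\n")); // false (line break is excluded)

// With the s flag, the period also matches a line break.
const tonPlusAny = /^ton.$/s;
console.log(tonPlusAny.test("ton\n")); // true
```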
.*
Combine the metacharacters . and *, in that order (.*), to match any character 0 or more times.

NOTE: .* in RegEx is equivalent to dtSearch wildcard * operator.
tr.* = "tr"

tr.* = "tre"

tr.* = "tree"

tr.* = "trees"

tr.* = "trough"

tr.* = "treadmill"
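
The tr.* examples can be run through a loop like the one below. This is a sketch with anchors assumed; the term "star" is an added illustration of a word that contains tr but does not start with it.

```javascript
const startsWithTr = /^tr.*$/;

const terms = ["tr", "tre", "tree", "trees", "trough", "treadmill"];
// every() is true only if each term in the list matches.
const allTermsMatch = terms.every((term) => startsWithTr.test(term));
console.log(allTermsMatch); // true

// The anchored pattern rejects a term that does not begin with "tr".
console.log(startsWithTr.test("star")); // false
```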

Always test your regular expression before going live to make sure it collects the data you want. A free online RegEx tester is available at https://www.regextester.com/
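
If you prefer to test in code, one possible approach is to keep a list of terms that should match and a list that should not, and check both. The helper checkPattern below is hypothetical, written only for this illustration, and it assumes a pattern without the g or y flags.

```javascript
// Hypothetical helper: returns a description of every term that behaved unexpectedly.
// Assumes the pattern has no g or y flag, so test() does not keep state between calls.
function checkPattern(pattern, shouldMatch, shouldNotMatch) {
  const failures = [];
  for (const term of shouldMatch) {
    if (!pattern.test(term)) failures.push(`expected a match: ${term}`);
  }
  for (const term of shouldNotMatch) {
    if (pattern.test(term)) failures.push(`expected no match: ${term}`);
  }
  return failures;
}

const digitCheckFailures = checkPattern(
  /^\d{2,5}$/,
  ["97", "9743", "97430"],
  ["9", "974301", "97a"]
);
console.log(digitCheckFailures.length === 0 ? "all checks passed" : digitCheckFailures);
```

An empty result means the expression behaved as expected for every listed term; it says nothing about terms you did not list.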

