Regular expressions are constructed by putting the various components of the expression between a pair of delimiters. In JavaScript, the delimiters are a pair of forward slash (/) characters, as shown in the following example.
/expression/
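As a minimal sketch of the delimiter syntax in JavaScript, the snippet below writes one pattern as a slash-delimited literal and, for comparison, with the built-in RegExp constructor. The variable names and the sample pattern \d\d are illustrative assumptions, not part of the original example.

```javascript
// Regular expression literal: the pattern sits between two forward slashes.
const literal = /\d\d/;

// The built-in RegExp constructor takes the pattern as a string instead,
// so no slashes are used and each backslash must be doubled.
const constructed = new RegExp("\\d\\d");

// Both forms hold the same pattern text.
const samePattern = literal.source === constructed.source;

console.log(samePattern);
console.log(literal.test("81"));
```

The literal form is usually the more readable choice when the pattern is fixed; the constructor form is mainly useful when the pattern is assembled from strings at run time.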
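Use ^ to anchor a match to the beginning of the string and $ to anchor it to the end.
Beginning ^
End $
For example, ^\d{2}$ matches only a string made up of exactly two digits.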
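To illustrate why the anchors matter, here is a small sketch that compares the anchored pattern ^\d{2}$ with an unanchored \d{2}. The sample strings and variable names are assumptions chosen for illustration.

```javascript
// Anchored: the whole string must be exactly two digits.
const anchored = /^\d{2}$/;
// Unanchored: two consecutive digits anywhere in the string are enough.
const unanchored = /\d{2}/;

const samples = ["42", "123", "4", "a42"];
const anchorResults = samples.map((text) => ({
  text,
  anchored: anchored.test(text),
  unanchored: unanchored.test(text),
}));

console.table(anchorResults);
```

Only "42" satisfies the anchored pattern, while the unanchored pattern also accepts "123" and "a42" because two digits appear somewhere inside them.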
The following are some common regular expression metacharacters, with examples of what they would and would not match. In the examples, = means the pattern matches the whole term and ≠ means it does not.
Metacharacter |
Description |
Examples |
\d |
A single digit, 0-9 |
\d\d\d = 327 \d\d = 81 \d = 4 ----------------------------------------- \d\d\d ≠ 24631 \d\d\d will not return 24631 because 24631 contains 5 digits. \d\d\d will only match for a 3-digit string. |
\w |
Alphanumeric character (a letter or digit) or underscore |
\w\w\w = "dog" \w\w\w\w = "mule" \w\w = "to" ----------------------------------------- \w\w\w = 467 \w\w\w\w = 4673 ----------------------------------------- \w\w\w ≠ "boat" \w\w\w will not return "boat" because "boat" contains 4 characters. ----------------------------------------- \w ≠ ! \w will not return the exclamation point "!" because it is a non-alphanumeric character. |
\W |
Non-word character: any character that is not a letter, digit, or underscore, such as a symbol or space |
\W = % \W = # \W\W\W = @#% ----------------------------------------- \W\W\W\W ≠ dog8 \W\W\W\W will not return "dog8" because d, o, g, and 8 are alphanumeric characters. |
[a-z] [0-9] |
Character set: matches exactly one character from the set unless a quantifier says otherwise. The order of the characters inside the brackets does not matter. |
pand[ora] = panda pand[ora] = pando ----------------------------------------- pand[ora] ≠ pandora pand[ora] does not bring back "pandora" because it is implied in pand[ora] that only 1 character in [ora] can be returned. (Quantifiers that will allow pand[ora] to match for "pandora" will be discussed below). |
(abc) (123) |
Character group, matches the characters abc or 123 in that exact order. |
pand(ora) = pandora pand(123) = pand123 ----------------------------------------- pand(oar) ≠ pandora pand(oar) does not match for "pandora" because it is looking for the exact phrase "pandoar". |
| |
Alternation - allows for alternate matches. | operates like the Boolean OR. |
pand(ora|123) = "pandora" OR "pand123" |
{n} |
Matches when the preceding character, character set, or character group occurs exactly n times. |
\d{3} = 836 \d{3} = 139 \d{3} = 532 ----------------------------------------- pand[ora]{2} = "pandar" pand[ora]{2} = "pandoo" pand(ora){2} = "pandoraora" ----------------------------------------- pand[ora]{2} ≠ pandora pand[ora]{2} will not match for "pandora" because the quantifier {2} only allows for 2 letters from the character set [ora]. |
{n,m} |
Matches when the preceding character, or character group, occurs at least n times, and at most m times. |
\d{2,5} = 97430 \d{2,5} = 9743 \d{2,5} = 97 ----------------------------------------- \d{2,5} ≠ 9 9 does not match because it is only 1 digit, which is below the minimum of 2 repetitions. |
? |
Question mark matches when the character preceding the ? sign occurs 0 or 1 time only, making the character match optional. |
colou?r = "colour" (u is found 1 time) colou?r = "color" (u is found 0 times) |
* |
Asterisk matches when the character preceding * matches 0 or more times. NOTE: * in RegEx is different from * in dtSearch. RegEx * is asking to find where the character (or grouping) preceding * is found ZERO or more times. dtSearch * is asking to find where the string of characters preceding * or following * is found 1 or more times. |
tre* = "tree" (e is found 2 times) tre* = "tre" (e is found 1 time) tre* = "tr" (e is found 0 times) ----------------------------------------- tre* ≠ "trees" tre* will not match the term "trees" because although "e" is found 2 times, it is followed by "s", which is not accounted for in the RegEx. |
+ |
Plus sign matches when the character preceding + matches 1 or more times. The + sign makes the character match mandatory. |
tre+ = "tree" (e is found 2 times) tre+ = "tre" (e is found 1 time) ----------------------------------------- tre+ ≠ "tr" (e is found 0 times) tre+ will not match for "tr" because e is found zero times in "tr". |
. (period) |
The period matches any single character (letter, digit, symbol, or space) except a line break. |
ton. = "tone" ton. = "ton#" ton. = "ton4" ----------------------------------------- ton. ≠ "tones" ton. will not match for the term "tones" because . by itself will only match for a single character, here, in the 4th position of the term. In "tones", the s is the 5th character and is not accounted for in the RegEx. |
.* |
Combine the metacharacters . and *, in that order (.*), to match any character 0 or more times. NOTE: .* in RegEx is equivalent to the dtSearch wildcard * operator. |
tr.* = "tr" tr.* = "tre" tr.* = "tree" tr.* = "trees" tr.* = "trough" tr.* = "treadmill" |
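The match and no-match results in the table can be checked in JavaScript with a short sketch like the one below. It assumes that each pattern must cover the entire term, which is how the table's examples read, so the hypothetical helper matchesWholeTerm wraps the pattern in ^(?:...)$ before testing. The helper name and the selection of rows are illustrative.

```javascript
// Hypothetical helper: true only when the pattern matches the entire term.
// The non-capturing group keeps alternation (|) inside the anchors.
function matchesWholeTerm(pattern, term) {
  const wholeTerm = new RegExp("^(?:" + pattern.source + ")$");
  return wholeTerm.test(term);
}

// [pattern, term] pairs taken from the table above.
const tableExamples = [
  [/\d\d\d/, "327"],
  [/\d\d\d/, "24631"],
  [/\w\w\w/, "dog"],
  [/\w\w\w/, "boat"],
  [/\w/, "!"],
  [/\W\W\W/, "@#%"],
  [/\W\W\W\W/, "dog8"],
  [/pand[ora]/, "panda"],
  [/pand[ora]/, "pandora"],
  [/pand(ora|123)/, "pand123"],
  [/pand[ora]{2}/, "pandar"],
  [/pand(ora){2}/, "pandoraora"],
  [/\d{2,5}/, "97430"],
  [/\d{2,5}/, "9"],
  [/colou?r/, "color"],
  [/tre*/, "tr"],
  [/tre*/, "trees"],
  [/tre+/, "tr"],
  [/ton./, "ton#"],
  [/ton./, "tones"],
  [/tr.*/, "treadmill"],
];

const tableResults = tableExamples.map(([pattern, term]) => ({
  pattern: pattern.source,
  term,
  matched: matchesWholeTerm(pattern, term),
}));

// Print each result in the table's own notation (= for match, ≠ for no match).
for (const { pattern, term, matched } of tableResults) {
  console.log(`${pattern} ${matched ? "=" : "≠"} ${term}`);
}
```

Without the wrapping, /\d\d\d/.test("24631") returns true, because three consecutive digits occur inside the longer number. The whole-term assumption is what makes the table's "will not match" rows hold.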
Always test your regular expression before going live, so you can confirm that it collects the data you expect. A free online RegEx tester is available here: https://www.regextester.com/
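Alongside an online tester, a quick scripted check can help before going live. The following is only a sketch: findFailures is a hypothetical helper, and the pattern and expected results are example values based on the colou?r row of the table.

```javascript
// Hypothetical helper: returns the cases whose actual result differs
// from the expected one. An empty array means every case passed.
function findFailures(regex, cases) {
  return cases.filter(({ text, expected }) => regex.test(text) !== expected);
}

// Pattern under test: "color" with an optional "u", anchored to the whole term.
// It has no g flag, so test() does not carry state between calls.
const colourPattern = /^colou?r$/;

const colourCases = [
  { text: "color", expected: true },
  { text: "colour", expected: true },
  { text: "colouur", expected: false },
  { text: "colors", expected: false },
];

const failures = findFailures(colourPattern, colourCases);
console.log(failures.length === 0 ? "all cases passed" : failures);
```

Listing terms that should not match is as useful as listing those that should, since an overly broad pattern tends to collect unwanted data.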