MariaDB: REGEXP_SUBSTR Function
This MariaDB tutorial explains how to use the MariaDB REGEXP_SUBSTR function with syntax and examples.
Description
The MariaDB REGEXP_SUBSTR function is an extension of the SUBSTR function. This function, introduced in MariaDB 10.0.5, will allow you to extract a substring in a string using regular expression pattern matching.
Syntax
The syntax for the REGEXP_SUBSTR function in MariaDB is:
REGEXP_SUBSTR( string, pattern )
Parameters or Arguments
- string
- The string to search.
- pattern
The regular expression matching information. It can be a combination of the following:
Value Description ^ Matches the beginning of a string. If used with a match_parameter of 'm', it matches the start of a line anywhere within expression. $ Matches the end of a string. If used with a match_parameter of 'm', it matches the end of a line anywhere within expression. * Matches zero or more occurrences. + Matches one or more occurrences. ? Matches zero or one occurrence. . Matches any character except NULL. | Used like an "OR" to specify more than one alternative. [ ] Used to specify a matching list where you are trying to match any one of the characters in the list. [^ ] Used to specify a nonmatching list where you are trying to match any character except for the ones in the list. ( ) Used to group expressions as a subexpression. {m} Matches m times. {m,} Matches at least m times. {m,n} Matches at least m times, but no more than n times. \n n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \n. [..] Matches one collation element that can be more than one character. [::] Matches character classes. [==] Matches equivalence classes. \d Matches a digit character. \D Matches a nondigit character. \w Matches a word character. \W Matches a nonword character. \s Matches a whitespace character. \S matches a non-whitespace character. *? Matches the preceding pattern zero or more occurrences. +? Matches the preceding pattern one or more occurrences. ?? Matches the preceding pattern zero or one occurrence. {n}? Matches the preceding pattern n times. {n,}? Matches the preceding pattern at least n times. {n,m}? Matches the preceding pattern at least n times, but not more than m times.
Note
- The REGEXP_SUBSTR function performs a case-insensitive match, except when used with binary strings.
- If the REGEXP_SUBSTR function does not find any occurrence of pattern, it will return NULL.
- This page applies to MariaDB 10.0.5 and higher, which uses the PCRE regex library.
- See also the SUBSTR function.
Applies To
The REGEXP_SUBSTR function can be used in the following versions of MariaDB:
- MariaDB 10.0.5+
Example - Match on more than one alternative
Let's start by looking how to use the |
pattern with the REGEXP_SUBSTR function in MariaDB.
For example:
SELECT REGEXP_SUBSTR ('Ball Point Pen','P(a|e|i)n'); Result: 'Pen' SELECT REGEXP_SUBSTR ('Eggs and pancakes','P(a|e|i)n'); Result: 'pan'
These REGEXP_SUBSTR examples will return values such as Pan, Pen, or Pin. The |
pattern tells us to look for the letter "a", "e", or "i" between the letters "P" and "n". The REGEXP_SUBSTR function performs a case-insensitive search so it doesn't matter what case the letters are in the string.
Case Sensitive Search
If we wanted to perform a case-sensitive search, we need to modify our REGEXP_SUBSTR function to use a binary string. This can be done one of two ways.
SELECT REGEXP_SUBSTR (BINARY 'Eggs and pancakes', 'P(a|e|i)n'); Result: NULL
OR ...
SELECT REGEXP_SUBSTR ('Eggs and pancakes' COLLATE utf8_bin, 'P(a|e|i)n'); Result: NULL
In the first example, we used the keyword BINARY to convert our string to a binary string. In the second example, we used COLLATE to convert our string to a binary string.
Since we are performing a case-sensitive search, the pattern does not match on 'pancakes' because it wants a capital 'P', so the REGEXP_SUBSTR function will return NULL.
Table Column Search
Now let's show how to use the REGEXP_SUBSTR function with a table column:
So let's say we have a contact table with the following data:
contact_id | last_name |
---|---|
1000 | Anderson |
2000 | Smith |
3000 | Johnson |
Now, let's run the following query:
SELECT contact_id, last_name, REGEXP_SUBSTR (last_name, 'a|e|i|o|u') AS "First Vowel" FROM contacts;
These are the results that would be returned by the query:
contact_id | last_name | First vowel |
---|---|---|
1000 | Anderson | A |
2000 | Smith | i |
3000 | Johnson | o |
Example - Match on Words
Next, let's look at how to use the REGEXP_SUBSTR function to extract the first word from a string.
For example:
SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\\S*)'); Result: 'TechOnTheNet'
This example will return 'TechOnTheNet' because it will extract all non-whitespace characters as specified by (\\S*)
and stop when it hits the first whitespace character (ie: space character). That will result in the word 'TechOnTheNet' being returned.
If you wanted to return the first two words from a string, we could modify the example as follows:
SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\\S*)(\\s)(\\S*)'); Result: 'TechOnTheNet is'
This example would extract the first two words from the string. The (\\S*)
matches on the first word in the string, the (\\s)
matches on the first non-whitespace character, and the (\\S*)
matches on the second word in the string.
Example - Match on Digit Characters
Now, let's look next at how we would use the REGEXP_SUBSTR function to match on a single digit character pattern.
For example:
SELECT REGEXP_SUBSTR ('7, 8, and 15 are numbers in this example', '\\d'); Result: 7
This example will extract the first numeric digit from the string as specified by \\d
. In this case, it will match on the number 7.
We could change our pattern to search for a two-digit number.
For example:
SELECT REGEXP_SUBSTR ('7, 8, and 15 are numbers in this example', '(\\d)(\\d)'); Result: 15
This example will extract a number that has two digits side-by-side as specified by (\\d)(\\d)
. In this case, it will skip over the 7 and 8 numeric values and return 15.
Now, let's look how we would use the REGEXP_SUBSTR function with a table column and search for a two digit number.
For example:
SELECT REGEXP_SUBSTR (address, '(\\d)(\\d)') FROM contacts;
In this example, we are going to extract the first two-digit value from the address field in the contacts table.
Advertisements