If you work with messy data in Excel — mixed-up phone numbers, email addresses buried inside long text, inconsistent date formats — you already know how painful it is to clean that stuff up.
Until now, you had two options. Either build a monster formula using LEFT, MID, RIGHT, FIND, and SUBSTITUTE all nested together. Or write a VBA macro. Both work, but neither is fun.
Now, Excel 365 has three built-in REGEX functions — REGEXTEST, REGEXEXTRACT, and REGEXREPLACE — and they change everything. You write one pattern, and Excel does the rest. No VBA. No complicated nesting. Just a clean, readable formula.
In this guide, I want to walk you through all three functions from the ground up. We'll cover the syntax, real examples, and I'll also show you how to use REGEX inside XLOOKUP and XMATCH. By the end, you'll have a full pattern library you can copy and use right away.
What Are REGEX Patterns?
REGEX stands for Regular Expressions. It's a way to describe a text pattern using a special set of characters — so you can search, match, extract, or replace text based on what it looks like rather than where it is.
Regex isn't something Microsoft invented. It's been around since the 1950s, and developers, data analysts, and programmers have used it for decades in languages like Python, JavaScript, SQL, and tools like Google Sheets. It's one of the most widely used text-processing tools in the world.
In late 2024, Microsoft finally brought regex into Excel with three new functions. And that changed everything for people who work with messy text data in spreadsheets.
Each pattern is built from small pieces — character classes, quantifiers, and anchors. You'll learn all of these in the cheat sheet section below. For now, just understand the concept: you describe what you're looking for, and regex finds it.
A Simple Way to Think About It
That's why regex is so powerful for messy data. When your phone numbers are in different positions in every cell, or your dates are in three different formats, traditional functions struggle. Regex doesn't care — it just looks for the pattern.
The Three REGEX Functions in Excel
Excel gives you three functions that use regex patterns. Each one does a different job with the pattern you give it.
REGEXTEST — Checks if the text matches a pattern and returns TRUE or FALSE. Use it when you need to validate data or filter rows.
REGEXEXTRACT — Finds the matching text and pulls it out. Use it when you need to grab a phone number, email, or date from inside a longer text.
REGEXREPLACE — Finds the matching text and replaces it with something else. Use it when you need to clean, reformat, or mask data.
All three functions take the same kind of regex pattern. So once you learn how to build a pattern, you can use it in any of them.
Why REGEX Patterns Are a Big Deal in Excel
Let me be honest with you. For years, extracting or cleaning text in Excel felt like solving a puzzle with the wrong pieces. You needed four, five, sometimes six functions nested together just to pull out an email address or validate a phone number.
And the worst part? Those formulas broke the moment your data changed even slightly.
REGEX solves that. Instead of telling Excel where the data is (character 5 through character 12), you tell it what the data looks like. That's a completely different approach — and it's way more reliable.
The Real Problems REGEX Solves
Here are the everyday situations where REGEX turns something painful into something simple.
See the Difference — Before and After REGEX
The best way to understand why REGEX matters is to see it side by side. Let's take a simple task: extract a phone number (format: 415-555-1234) from a cell that has a bunch of text in it.
Now let's take another one: check if a cell contains a valid email address.
Now that you know why REGEX is worth learning, let's look at which function to use for what.
Which REGEX Function Do You Need?
Before we get into the details, here's a quick table to help you pick the right function for the job.
What You Want to Do | Function | Example Pattern | What It Returns |
|---|---|---|---|
Check if a cell contains a number | REGEXTEST | [0-9] | TRUE or FALSE |
Validate an email address format | REGEXTEST | ^[\w\.\-]+@[\w\.\-]+\.\w{2,}$ | TRUE or FALSE |
Extract a phone number from text | REGEXEXTRACT | \d{3}-\d{3}-\d{4} | The matched phone number |
Pull out all hashtags from a post | REGEXEXTRACT | #\w+ | All matching hashtags (spills) |
Mask the first 3 digits of a number | REGEXREPLACE | ^\d{3} | ***-456-7890 |
Swap "First Last" → "Last, First" | REGEXREPLACE | (\w+)\s(\w+) | Last, First |
Remove all non-numeric characters | REGEXREPLACE | [^0-9] | Only the digits remain |
Look up messy data like "USA" or "United States" | XLOOKUP (mode 3) | USA|United States | First matching row's value |
Regex Basics You Need to Know Before We Start
If you've never used regular expressions before, don't worry. You don't need to memorize the entire regex language. You just need a handful of building blocks, and you can combine them to do almost anything.
Think of it this way — regex is just a way to describe what text looks like, not where it is. Instead of saying "grab characters 5 through 10," you say "grab anything that looks like a phone number."
Here are the pieces you'll use the most in Excel. Click each tab to explore, and then try your own patterns in the live tester below.
A character class tells regex which characters are allowed at a specific position.
^ symbol inside square brackets means "NOT." So [^A-Z] matches anything that is not an uppercase letter. Outside square brackets, ^ means "start of text" — see the Anchors tab. Quantifiers go right after a character class and tell regex how many times that character should appear.
Anchors don't match a character. They match a position in the text. Use them when you need to control where the match happens.
^ at the start and $ at the end. Without them, REGEXTEST returns TRUE even if only part of the cell matches. Groups let you capture parts of a match. The OR operator lets you match one thing or another.
$1, $2 for backreferences in REGEXREPLACE — not \1, \2 like most other regex tools. This trips people up all the time. Some characters have special meaning in regex — like . (matches anything) and ( (starts a group). To match the actual character, put a backslash before it.
\ before it. The common ones to escape: . * + ? ^ $ ( ) [ ] { } | \ By default, all three REGEX functions in Excel are case-sensitive. That means [A-Z] will NOT match a lowercase "a". Here's how to change that.
(?i) flag is handy when using REGEX inside XLOOKUP (match mode 3), where there's no separate case_sensitivity argument. Just add it to the front of your pattern. That's all you need to get started. Now let's use these building blocks inside the actual Excel functions.
REGEXTEST — Check If Text Matches a Pattern
REGEXTEST is the simplest of the three REGEX functions. You give it some text and a pattern, and it tells you TRUE or FALSE. That's it.
This is the function you use when you need to answer one question: does this cell match a pattern or not?
- text The cell or text string you want to test. Can also be a range — if you pass a range, it spills TRUE/FALSE for every cell.
- pattern The regex pattern to match against. Must be enclosed in double quotes. Uses PCRE2 syntax.
- case_sensitivity Optional. 0 = case-sensitive (this is the default). 1 = ignore case. You can also add
(?i)at the start of your pattern instead of using this argument.
[0-9], it returns TRUE — because there are digits somewhere in there.Identify Product Codes That Contain Numbers
Let's say you have a list of product codes in column A. Some codes have numbers in them, some are only letters. You want to quickly flag which ones contain at least one digit.
This is where we use our first pattern from the crash course — [0-9] — which means "any single digit from 0 to 9."
Below are the steps you need to follow.
Click on cell B2 and enter the following formula.
In the end, hit Enter. It returns TRUE because cell A2 ("ABC123") has digits in it.
Now drag the formula down from B2 to B7. Excel checks each cell one by one.
Look at the results. "ABC123" has digits — TRUE. "HELLO" doesn't — FALSE. "XY-99-MN" has 99 in it — TRUE. "WIDGET" is only letters — FALSE. Simple.
0-9 inside means "any digit from 0 to 9." So this pattern says: is there at least one digit anywhere in this text? Now, I want to point out something important. This pattern [0-9] doesn't check how many digits there are. It doesn't check where the digit appears. It just asks: is there at least one?
That's because REGEXTEST checks if any part of the text matches. It found a digit somewhere in "ABC123" and that was enough to return TRUE.
\d instead of [0-9] — they mean the exact same thing. I used [0-9] here because it's easier to read when you're just getting started.If you wanted to check that the entire cell is only digits — nothing else — you would use anchors and a quantifier like this:
Here ^ means start of text, [0-9]+ means one or more digits, and $ means end of text. Together, they say: the cell must contain only digits from start to finish. "ABC123" would return FALSE with this pattern because it has letters too.
[0-9], an anchor ^...$, and a quantifier +? That's all regex is. You combine small building blocks into a pattern that describes what you're looking for.That's REGEXTEST. One function, one pattern, TRUE or FALSE. In the later sections, we'll use REGEXTEST inside IF, FILTER, and XLOOKUP to do more powerful things with it. But for now, let's move to the next function — REGEXEXTRACT.
REGEXEXTRACT — Pull Matching Text Out of a Cell
REGEXTEST tells you if a pattern exists. REGEXEXTRACT goes one step further — it actually pulls out the matching text and gives it back to you.
This is the function you use when you need to extract something specific from messy data. A phone number buried in a paragraph. A product ID mixed in with text. A date sitting inside a long string. REGEXEXTRACT finds the pattern and hands it to you.
- text The cell or text string you want to extract from. Works with a single cell or a range.
- pattern The regex pattern that describes what you want to find. Must be in double quotes.
- return_mode Optional. Controls what the function returns. This is the argument that makes REGEXEXTRACT really powerful — see the explanation below.
- case_sensitivity Optional. 0 = case-sensitive (default). 1 = ignore case.
Understanding return_mode
The return_mode argument is what sets REGEXEXTRACT apart. It controls how much the function gives back to you.
Extract the Numeric Part from Product Codes
Remember the product code example from REGEXTEST? We checked if a code had a number. Now let's take it further — let's actually pull out the numeric part.
Our data looks like this: "ABC123", "XY-99-MN", "PART-7A". We want just the numbers — "123", "99", "7".
Below are the steps.
Click on cell B2 and enter this formula.
Hit Enter. It returns "123" — the numeric part of "ABC123".
Drag the formula down to B7.
Look at the results. "ABC123" gave us "123". "XY-99-MN" gave us "99". "PART-7A" gave us "7". And the cells with no numbers — "HELLO", "WIDGET", "STEEL" — returned a #N/A error because there was nothing to extract.
[0-9]+ matches one digit, two digits, three digits… as many consecutive digits as it can find. Notice the difference from REGEXTEST. In the previous section, we used [0-9] without the +. That was fine for REGEXTEST because it only needed to find one digit to say TRUE.
But here, we need REGEXEXTRACT to grab all consecutive digits, not just the first one. That's why we add the + quantifier. Without it, "ABC123" would return just "1" instead of "123".
Handling the #N/A Error
You probably noticed that "HELLO", "WIDGET", and "STEEL" returned #N/A. That's because REGEXEXTRACT has nothing to extract — there are no digits in those cells.
To handle this cleanly, wrap the formula in IFERROR.
Now, instead of a #N/A error, those cells will just be blank. You can replace the "" with any text you want — like "None" or "No number found".
=VALUE(REGEXEXTRACT(A2,"[0-9]+"))That's REGEXEXTRACT. Find a pattern, pull it out. In the later sections, we'll use return_mode 1 to extract all matches and return_mode 2 to break a match into separate pieces using capture groups. But the foundation is always the same — describe the pattern, and let the function do the rest.
REGEXREPLACE — Find a Pattern and Replace It
REGEXTEST checks for a pattern. REGEXEXTRACT pulls it out. REGEXREPLACE does the third thing — it finds a pattern and swaps it with something else.
This is the function you use when you need to clean data, remove unwanted characters, reformat text, or mask sensitive information. You tell it what to find and what to put in its place.
- text The cell or text string you want to modify. Works with a single cell or a range.
- pattern The regex pattern that describes what you want to find and replace. Must be in double quotes.
- replacement What you want to put in place of the matched text. Use
""(empty quotes) to delete the match entirely. Use$1,$2to reference capture groups. - occurrence Optional. 0 = replace all occurrences (default). Any other number replaces only that specific occurrence (1 = first, 2 = second, etc.).
- case_sensitivity Optional. 0 = case-sensitive (default). 1 = ignore case.
Strip Everything Except Numbers from Product Codes
Let's continue with our product codes. This time, instead of extracting the numbers, we want to remove everything that's not a number — all the letters, dashes, and special characters — so we're left with just the digits.
Below are the steps.
Click on cell B2 and enter this formula.
Hit Enter. It returns "123" — all the non-digit characters have been removed from "ABC123".
Drag the formula down to B7.
Look at the difference from REGEXEXTRACT. "HELLO" doesn't throw a #N/A error this time — it just returns an empty cell. That's because REGEXREPLACE removes every non-digit character, and when there are no digits left, you get an empty string. No error handling needed.
^ inside the square brackets flips the meaning — this matches any character that is NOT a digit. Letters, dashes, spaces, symbols — all matched. Here's a visual of what's happening to each cell:
The letters and dashes (shown in red, strikethrough) get removed. Only the digits (shown in green) survive.
A Quick Preview: Capture Groups and $1, $2
REGEXREPLACE has one more trick that makes it really powerful — capture groups. You can use parentheses ( ) in your pattern to capture parts of the match, and then rearrange them in the replacement using $1, $2, etc.
For example, if you have names in "First Last" format and want "Last, First":
Here (\w+) captures the first name, \s matches the space, and the second (\w+) captures the last name. In the replacement, $2 puts the last name first, then a comma, then $1 puts the first name.
$1, $2 for backreferences — not \1, \2. This is different from most regex tools and programming languages. If your replacement isn't working, this is probably why.We'll use capture groups in more detail in the real-world examples section. For now, just know that this is possible — and it's one of the things that makes REGEXREPLACE incredibly useful for reformatting data.
Use REGEX Inside XLOOKUP and XMATCH
Here's something a lot of people don't know yet. Along with the three REGEX functions, Microsoft also added regex support to two functions you probably already use — XLOOKUP and XMATCH.
You don't need a separate REGEX function. You just change one argument — the match_mode — and suddenly your lookup can search using patterns instead of exact values.
The match_mode Argument
Both XLOOKUP and XMATCH have a match_mode argument that controls how they search. Here are the options:
Setting match_mode to 3 tells XLOOKUP or XMATCH: "The lookup value I'm giving you is a regex pattern. Use it to search." That's it — one number changes everything.
Look Up a Country That's Spelled Differently
This is a very common problem. You have a data table where the country column is messy — some rows say "USA", others say "United States", maybe one says "U.S.A." You need to look up a value, but you're not sure which spelling appears first.
With a regular XLOOKUP, you'd need an exact match. With regex match mode, you can use the OR operator | from the crash course to match any of them at once.
Below are the steps.
Let's say your data is in columns A and B — country names in A, sales figures in B.
In a blank cell, enter this formula.
Hit Enter. XLOOKUP finds the first row in column A that contains either "USA" or "United States" and returns the matching value from column B.
Look at what happened. The formula scanned column A from top to bottom. Row 5 has "United States" — that matches the second option in our USA|United States pattern. So XLOOKUP returns 82,500 from column B.
Notice that row 8 also has "USA" which would match too. But XLOOKUP returns the first match it finds, which is row 5.
| means OR. So it matches "USA" or "United States". "USA" as the lookup value does a partial match. XLOOKUP checks if the pattern exists anywhere in the cell. So "United States of America" would also match "States" if that was your pattern. If you want an exact cell match, use anchors: "^USA$".Same Thing with XMATCH
XMATCH works the same way — match_mode 3 enables regex. The difference is that XMATCH returns the position (row number) of the match instead of a value from another column.
This returns 4 — because "United States" is the 4th item in the range A2:A10. You can use this position number inside INDEX or other functions when you need more control over what gets returned.
Making It Case-Insensitive
One thing to know — regex match_mode in XLOOKUP and XMATCH is case-sensitive by default. So "usa" wouldn't match "USA".
Unlike the three REGEX functions, XLOOKUP doesn't have a separate case_sensitivity argument. So you use the inline flag from the crash course — (?i) — at the start of your pattern.
Now it matches "USA", "usa", "United States", "united states" — any combination of upper and lowercase.
A:A. Use specific ranges like A2:A1000 or, even better, convert your data to an Excel Table and use structured references — they expand automatically without scanning empty rows.| for OR matching, (?i) for case-insensitive, and anchors ^...$ when you need an exact cell match.Building Real-World Patterns — Step by Step
You now know all three REGEX functions and the basic building blocks. But when you look at a real-world pattern — like one that validates an email address — it can look like a mess of random characters.
It's not random. Every pattern is just small pieces put together in order. Let me walk you through the most common ones, piece by piece, so you can actually read them and even modify them yourself.
- ^ Start of text. Makes sure nothing comes before the phone number. Without this, "Call 415-555-1234 now" would also match.
- \d{3} Three digits.
\dmeans any digit (0–9).{3}means exactly three. This is the area code — like "415". - - A literal dash. The dash between the area code and the next group. Since
-has no special meaning outside square brackets, no escape needed. - \d{3} Three more digits. The middle group — like "555".
- - Another literal dash.
- \d{4} Four digits. The last group — like "1234".
- $ End of text. Makes sure nothing comes after the phone number. Together with
^, this ensures the entire cell is just a phone number.
^ and $ anchors. Without anchors, the pattern finds the phone number no matter where it sits: =REGEXEXTRACT(A2,"\d{3}-\d{3}-\d{4}")- \( \) Escaped parentheses. In regex,
(and)normally create a capture group. Adding\before them tells regex: "I mean the actual parenthesis character." So\(\d{3}\)matches "(415)". - \s One space. The space between the closing parenthesis and the next three digits.
\smatches any whitespace character.
See how we built on Pattern 1? We just added escaped parentheses and a space. That's the key idea — you're not memorizing patterns, you're combining pieces.
- ^ Start of text. We want to validate the whole cell, not find an email inside longer text.
- [\w\.\-]+ The username part (before the @). The square brackets define a set of allowed characters:
\w= any letter, digit, or underscore.\.= a literal dot.\-= a literal dash. The+means one or more of these. So "puneet", "jane.doe", and "first-last" all match. - @ The @ symbol. Just a literal character. Nothing special in regex — it matches itself.
- [\w\.\-]+ The domain name (after the @, before the last dot). Same set of characters as the username. Matches "excelchamps", "company.co", "my-domain".
- \. A literal dot. The dot before the extension. We escape it with
\because without the escape,.means "any character" in regex. - \w{2,} The extension — two or more word characters. Matches "com", "org", "co", "info". The
{2,}means "at least 2" so single-letter extensions are rejected. - $ End of text. Nothing allowed after the extension.
^ and $ anchors, then use it inside REGEXEXTRACT to pull email addresses out of longer text.- \d{4} Four digits. The year — like "2025".
- [-/] A dash or slash. The character class
[-/]matches either character. This lets the pattern handle both "2025-01-15" and "2025/01/15". - \d{1,2} One or two digits. The month — matches both "1" and "01". Same logic for the day.
Notice this pattern doesn't use anchors, so it can find dates inside longer text. If you want to validate that the entire cell is only a date, add ^ at the start and $ at the end.
=REGEXEXTRACT(A2,"(\d{4})[-/](\d{1,2})[-/](\d{1,2})",2) — this spills the year, month, and day into three separate cells.- [A-Z]{2,4} Two to four uppercase letters. The prefix — like "ABC", "XY", or "PROD".
- - A literal dash. The separator between the letters and numbers.
- \d{3,5} Three to five digits. The numeric part — like "1234" or "99001".
This pattern is case-sensitive by default. "abc-1234" doesn't match because [A-Z] only allows uppercase. To accept both cases, add (?i) at the start of the pattern, or change it to [A-Za-z].
- \$ A literal dollar sign. We escape it because
$normally means "end of text" in regex. The backslash says "I mean the actual $ character." - \d At least one digit. This ensures there's a number after the dollar sign — so "$" alone doesn't match.
- [\d,]* More digits or commas, zero or more times. This handles the optional thousands separator. "1,250" and "50" both work.
- \. A literal dot. The decimal point. Escaped because
.matches anything in regex. - \d{2} Exactly two digits. The cents — like "00" or "99".
How to Build Any Pattern — The 3-Step Process
Every pattern in this section was built the same way. Here's the process I use.
Look at the data and describe the format in plain English. For a phone number: "three digits, a dash, three digits, a dash, four digits." That sentence IS your pattern — you just need to translate each word.
Translate each piece into regex. "Three digits" =
\d{3}. "A dash" =-. "Four digits" =\d{4}. Connect them in order.Decide: validate or extract? If you're validating a whole cell, add
^and$anchors. If you're extracting from longer text, leave them out.
Quick Reference — Common Patterns
Here are patterns for the most common tasks. Copy them directly into your REGEX formulas.
What You're Looking For | Pattern | Notes |
|---|---|---|
Contains any digit | \d | Use with REGEXTEST |
Only digits (whole cell) | ^\d+$ | Anchors ensure full match |
Only letters (whole cell) | ^[A-Za-z]+$ | No digits, spaces, or symbols |
Phone: 415-555-1234 | \d{3}-\d{3}-\d{4} | Add anchors for validation |
Phone: (415) 555-1234 | \(\d{3}\)\s\d{3}-\d{4} | Escape the parentheses |
Email address | [\w\.\-]+@[\w\.\-]+\.\w{2,} | Add anchors for validation |
Date: YYYY-MM-DD | \d{4}[-/]\d{1,2}[-/]\d{1,2} | Handles both - and / |
Dollar amount: $1,250.00 | \$\d[\d,]*\.\d{2} | Escape the $ and . |
ZIP code (US, 5-digit) | ^\d{5}$ | Add (-\d{4})? for ZIP+4 |
Hashtag | #\w+ | Use return_mode 1 for all matches |
URL (http/https) | https?://[\w\.\-/]+ | The ? makes the "s" optional |
Remove extra spaces | \s{2,} | Replace with single space in REGEXREPLACE |
Non-numeric characters | [^0-9] | Replace with "" to keep only digits |
First and Last name | (\w+)\s(\w+) | Use $2, $1 in REGEXREPLACE to swap |
Real-World Scenarios — REGEX Combined with Other Functions
Now you know the patterns and the functions. In this section, we put them together to solve real data problems — the kind of stuff you actually deal with at work.
Each case shows you the scenario, the formula, and how it works step by step.
We'll combine REGEXTEST with IF. REGEXTEST gives us TRUE or FALSE, and IF converts that into a readable label.
Click on cell B2 and enter this formula.
Hit Enter and drag the formula down.
REGEXTEST checks the email pattern (we built this one in the pattern anatomy section). IF takes the TRUE/FALSE and converts it to "Valid" or "Invalid." That's all there is to it.
We'll combine REGEXTEST with FILTER. REGEXTEST tests each phone number, and FILTER uses the results to show only matching rows.
Click on a blank cell where you want the filtered results to appear and enter this formula.
Hit Enter. Excel spills only the rows where the phone number matches the pattern.
REGEXTEST runs against every cell in B2:B8 and creates an array of TRUE/FALSE values. FILTER uses that array and returns only the rows where the result is TRUE. Amit and Ravi got filtered out because their phone numbers don't match the ###-###-#### format.
* for AND and + for OR when combining multiple REGEXTEST conditions inside FILTER. For example: =FILTER(A2:C10, REGEXTEST(B2:B10,"\d{3}-\d{3}-\d{4}") * REGEXTEST(C2:C10,"@")) returns rows where column B has a valid phone AND column C has an @ symbol.This is a two-step job. First, strip everything that's not a digit. Second, insert the dashes in the right places. We'll use LET to keep it readable.
Click on cell B2 and enter this formula.
Hit Enter and drag it down.
Let me walk you through what's happening inside this formula.
LET creates a variable called
digits. It stores the result of the first REGEXREPLACE — which strips all non-digit characters. So "(415) 555-1234" becomes "4155551234".The second REGEXREPLACE uses capture groups. The pattern
(\d{3})(\d{3})(\d{4})splits the 10 digits into three groups: first 3, next 3, last 4.The replacement
"$1-$2-$3"puts dashes between the groups. So "4155551234" becomes "415-555-1234".
digits), so the formula reads like a recipe: first clean, then format.This is where REGEXEXTRACT's return_mode 1 comes in. We've been using the default (mode 0 — first match only). Mode 1 returns all matches.
Enter this formula in the cell next to your text.
Hit Enter. If the cell has multiple emails, they spill into separate cells below.
Notice we removed the ^ and $ anchors from the email pattern. That's because we're not validating — we're extracting from inside longer text. The pattern needs to find matches anywhere in the cell.
=TEXTJOIN(", ",TRUE,REGEXEXTRACT(A2,"[\w\.\-]+@[\w\.\-]+\.\w{2,}",1))This is where REGEXEXTRACT's return_mode 2 comes in. Mode 2 returns the capture groups separately. We put parentheses around each part of the date we want to capture.
Enter this formula in cell B2.
Hit Enter. The year, month, and day spill into B2, C2, and D2 automatically.
The three pairs of parentheses in the pattern create three capture groups. (\d{4}) captures the year. (\d{1,2}) captures the month. The second (\d{1,2}) captures the day. Return_mode 2 tells REGEXEXTRACT to return each captured group separately — and they spill across columns.
=VALUE(REGEXEXTRACT(...)).=CLEANPHONE(A2) and have it just work. LAMBDA lets you turn any formula into a reusable named function. Here's how to set it up.
Go to Formulas Tab ➜ Name Manager ➜ New.
Set the Name to
CLEANPHONE.In the Refers to field, paste this formula.
Click OK to save it.
Now you can use it anywhere in the workbook like this:
And that is the moment of joy. You've built your own custom function — one that cleans any phone number format into a consistent output. Share the workbook, and anyone can use =CLEANPHONE() without knowing how it works inside.
Common Mistakes to Avoid with REGEX in Excel
Regex is powerful, but it has a few traps that catch almost everyone — especially when you're starting out. I've listed the most common ones here. Click on each mistake to see what goes wrong and how to fix it.
This is the #1 beginner mistake. In regex, characters like . ( ) $ * + ? have special meaning. If you want to match the actual character, you need a backslash before it.
The most common one: using . to match a literal dot. Without escaping, . matches any character at all — so "a.b" matches "a1b", "a-b", "axb", everything.
The wrong pattern uses unescaped dots. It might seem to work on most data, but it's technically matching more than it should. The right pattern escapes every dot with \..
In most regex tools and programming languages, you reference capture groups in the replacement using \1, \2, etc. Excel does it differently. Excel uses $1, $2.
If your REGEXREPLACE replacement isn't working, this is almost always the reason. Check for \1 and replace it with $1.
REGEXEXTRACT throws a #N/A error when it can't find a match. If your data has even one cell without a match, you'll see an ugly error in your results.
Always wrap REGEXEXTRACT in IFERROR unless you're 100% sure every cell will have a match. Replace the "" with whatever fallback you want — "None", 0, or even another formula.
When you use REGEXEXTRACT with return_mode 1 (all matches), the results spill into multiple cells. If you drag this formula down for multiple rows, the spill areas overlap each other — and Excel gives you a #SPILL error.
The fix: Wrap the formula in TEXTJOIN to combine all matches into a single cell.
Now all matches go into one cell, separated by commas. No spill, no errors when dragging down.
REGEXEXTRACT always returns text — even when it extracts digits. So "123" comes back as the text string "123", not the number 123. If you try to do math with it, you'll get unexpected results or errors.
Wrap the result in VALUE() before doing any math.
All three REGEX functions and XLOOKUP/XMATCH with match_mode 3 are case-sensitive by default. [A-Z] won't match lowercase letters. "hello" won't match a pattern looking for "Hello".
Two ways to fix it:
Set case_sensitivity to 1
Add (?i) at the start of pattern
Use the (?i) flag for XLOOKUP/XMATCH since they don't have a separate case_sensitivity argument.
Regex matching is more computationally expensive than regular Excel operations. When you use a whole-column reference like A:A, Excel runs the regex engine against over a million cells — most of them empty.
Always use specific ranges. Even better — convert your data to an Excel Table and use structured references. They expand automatically as you add rows without scanning empty cells.
REGEXTEST checks if any part of the text matches. If you want to validate that the entire cell is a valid email (not just that it contains one somewhere), you need anchors.
"Some text [email protected] more text" → TRUE
"Some text [email protected] more text" → FALSE
Add ^ at the start and $ at the end when you're validating. Remove them when you're extracting from inside longer text.
⚠ The Golden Rule
- Escape your dots:
\.not. - Use
$1not\1in replacements - Wrap REGEXEXTRACT in IFERROR
- Use TEXTJOIN with return_mode 1
- VALUE() to convert extracted text to numbers
- Add
(?i)or set case_sensitivity to 1 for case-insensitive matching - Avoid whole-column references — use specific ranges or Tables
- Use
^...$anchors when validating, remove them when extracting
When to Use REGEX vs. Traditional Functions
REGEX doesn't replace everything. For simple, predictable tasks, the old functions — FIND, LEFT, MID, RIGHT, SUBSTITUTE — still work perfectly fine and are easier to read.
But the moment your data gets messy or the pattern gets complex, REGEX pulls ahead. Here's a side-by-side comparison so you can see exactly where each approach wins.
Task | Old Way | REGEX Way | Winner |
|---|---|---|---|
Get first 3 characters | =LEFT(A2,3) | Not needed — LEFT is perfect for this. | Old |
Replace one word with another | =SUBSTITUTE(A2,"old","new") | =REGEXREPLACE(A2,"old","new") | Either |
Check if cell contains a number | =SUMPRODUCT(--ISNUMBER( FIND({0,1,2,3,4,5,6,7,8,9},A2)))>0 | =REGEXTEST(A2,"\d") | Regex |
Extract a phone number from text | =MID(A2,FIND("-",A2)-3,12) Assumes position — breaks easily | =REGEXEXTRACT(A2, "\d{3}-\d{3}-\d{4}") | Regex |
Validate email format | =AND(ISERROR(FIND(" ",A2)), LEN(A2)-LEN(SUBSTITUTE(A2, "@",""))=1, FIND("@",A2)>1, FIND(".",A2,FIND("@",A2)) >FIND("@",A2)+1) 4 conditions, still misses edge cases | =REGEXTEST(A2, "^[\w\.\-]+@[\w\.\-]+\.\w{2,}$") | Regex |
Remove all non-numeric characters | Nested SUBSTITUTE for each character — (, ), -, space, dot… One SUBSTITUTE per character | =REGEXREPLACE(A2,"[^0-9]","") | Regex |
Swap "First Last" → "Last, First" | =MID(A2,FIND(" ",A2)+1,LEN(A2)) &", "&LEFT(A2,FIND(" ",A2)-1) Breaks if there's no space or multiple spaces | =REGEXREPLACE(A2, "(\w+)\s(\w+)","$2, $1") | Regex |
Find text between parentheses | =MID(A2,FIND("(",A2)+1, FIND(")",A2)-FIND("(",A2)-1) Errors if no parentheses exist | =REGEXEXTRACT(A2, "\(([^)]+)\)",2) | Regex |
Replace extra spaces with one space | =TRIM(A2) TRIM handles this well | =REGEXREPLACE(A2,"\s{2,}"," ") | Old |
Look up messy country names (USA / United States) | Multiple XLOOKUP calls combined with IFERROR One lookup per spelling variation | =XLOOKUP("USA|United States", A2:A10,B2:B10,,3) | Regex |
The Bottom Line
Here's how I think about it.
Use traditional functions when the task is simple and the data is predictable. Getting the first 5 characters? LEFT. Replacing one known word? SUBSTITUTE. Trimming spaces? TRIM. These are fast, readable, and everyone understands them.
Use REGEX when the pattern is complex, the data is inconsistent, or you'd need 3+ nested functions to do the same job. Validating formats, extracting from messy text, cleaning multiple character types at once, pattern-based lookups — that's REGEX territory.
Frequently Asked Questions
Microsoft VBScript Regular Expressions 5.5 library — but that's a completely different setup.RE2 as its regex flavor, which doesn't support some advanced PCRE2 features like lookaheads and lookbehinds.case_sensitivity argument to 1 in any of the three REGEX functions. For example, =REGEXTEST(A2,"hello",1) matches "hello", "Hello", and "HELLO". Option 2: Add (?i) at the start of your pattern. For example, =REGEXTEST(A2,"(?i)hello"). Use Option 2 when working with XLOOKUP or XMATCH (match_mode 3), since those functions don't have a separate case_sensitivity argument.VALUE() function: =VALUE(REGEXEXTRACT(A2,"\d+")). This applies to all return modes.=REGEXTEST(A2,"^\d+$") in column B, then use conditional formatting with the formula =B2=TRUE to highlight rows.=TEXTJOIN(", ",TRUE,REGEXEXTRACT(A2,"pattern",1)). You can also wrap the whole thing in IFERROR to handle cells with no matches.=XLOOKUP("USA|United States",A2:A10,B2:B10,"Not found",3) matches either "USA" or "United States" in the lookup range. This is available in Microsoft 365 only.A:A — always use specific ranges like A2:A5000. Convert your data to an Excel Table so structured references expand automatically. And avoid running the same complex regex pattern across dozens of cells if a helper column can do the job once.In the End
That's everything you need to get started with REGEX in Excel. Three functions, REGEXTEST, REGEXEXTRACT, and REGEXREPLACE, and a handful of building blocks that you can combine to handle almost any text problem.
The patterns might look intimidating at first. But once you understand that every pattern is just a description of what the text looks like, character classes, quantifiers, anchors, it clicks. And once it clicks, you'll wonder how you ever lived without it.
Start small. Try REGEXTEST with a simple [0-9] pattern on your own data. Then move to REGEXEXTRACT. Then try a REGEXREPLACE. Build from there. You don't need to memorize anything, that's what the cheat sheet and pattern library in this guide are for.
Now I want to hear from you, which REGEX function are you going to try first? And what kind of messy data are you dealing with? Let me know in the comments below. And please, don't forget to share this guide with your friends, I'm sure they'll appreciate it.