How to Extract Substring in Google Sheets

- Written by Puneet

As a warehouse manager, you’re responsible for a spreadsheet filled with product details in one column. Each entry, a unique combination of Product ID, Product Name, and Batch Number, is separated by dashes, like “101-Alpha Widget-B001”.

To enhance your inventory management, you need to organize this information better. For this, you need to separate each entry into three parts: the Product ID, the Product Name, and the Batch Number.

extract-substring-in-google-sheets

In Google Sheets, there are N number of ways that you can use to extract data from one string to multiple substrings. In this tutorial, we will learn all these ways in detail so that you can handle any situation where you need to extract data. So, let’s get started…

Using the SPLIT Function to Extract All the Substrings

SPLIT is like having a smart pair of scissors for your text data. It helps you cut a long string of text into smaller substrings based on a specific character or separator you specify. SPLIT has a simple syntax: SPLIT(text, delimiter, [split_by_each], [remove_empty_text]).

In our example, we need to use the first two arguments:

=SPLIT(A2,"-")

Here, it splits the text in cell A2 into separate substrings using the hyphen (“-“) as the delimiter.

split-function-to-extract-all-substrings

The best part about using SPLIT is that it automatically places the split text into separate columns and eliminates the need for manual cell references and multiple formulas. However, it is only useful when you have a consistent delimiter across the entire string.

Note – Before writing formulas, understand your data’s consistent patterns or delimiters. This will help you choose the right function and pattern for extraction.

Use the LEFT Function to Get the Substring from the Starting

The LEFT function is quick to use when extracting a specific number of characters from the beginning of a text string.

=LEFT(text, [num_chars])
  • text: This is the string (enter directly in the function or a cell reference) from which you want to extract the Substring.
  • num_chars: This argument is optional and specifies the number of characters you want to extract from the start of the text. If you omit this, the default is 1.

Now, let’s say you want to extract the product ID from the data; in this case, you can write a formula like the following:

=LEFT(A2,3)
left-function-to-get-substring-from-starting

In the formula we used, we extracted the first three characters, the Product ID, from the text in cell A2. In this case, A2 contains “101-Alpha Widget-B001”.

When you use this formula, you know many characters of the Substring you need to extract from the main string. But there might be a situation where you don’t know the number of characters in the Product ID. In this case, you need to use a formula like the following:

=LEFT(A2, SEARCH("-", A2) - 1)
characters-of-substring-to-extract

You can use a combination of LEFT and SEARCH to dynamically find the position of a specific character and extract the Substring from the text string.

In this formula, the first part finds the position of the first dash in the text. For “101-Alpha Widget-B001”, it returns 4 (the position of the dash).

Then, LEFT(A2, SEARCH(“-“, A2) – 1) takes the text in A2 and extracts characters up to the position of the dash minus one. So, it extracts the first three characters, “101”.

Read this – How to Separate Names in Google Sheets

Use the RIGHT Function to Get the Substring from the Ending

In the same way, let’s say you need to get the Substring from the right side of the test string. In our example, from the list of product details like “101-Alpha Widget-B001” in cell A2, you want to extract the batch number, which is the last part of the string after the final hyphen (-).

The right is just like the left, where you need to specify the characters you want to extract from the main string.

=RIGHT(text, [num_chars])
=RIGHT(A2, 4)
right-function-to-get-substring-from-ending

If the length of the batch number is consistent throughout all the values, you can directly use the RIGHT with the fixed num_chars. But if you don’t know the length of the batch number, then you need a more dynamic approach.

=RIGHT(A2, LEN(A2) - SEARCH("-", A2, SEARCH("-", A2) + 1))
right-function-with-dynamic-approach

To understand this formula, you need to split it into parts.

  • LEN(A2): This returns the total length of the text in cell A2.
  • SEARCH(“-“, A2, SEARCH(“-“, A2) + 1): This returns the position of the second dash in the text. For “101-Alpha Widget-B001”, it returns 16.
  • LEN(A2) – SEARCH(“-“, A2, SEARCH(“-“, A2) + 1): This calculates the number of characters from the second dash to the end of the string.

So, in the end, the formula extracts the batch number “B001” from “101-Alpha Widget-B001”.

Read – How to Capitalize the First Letter in Google Sheets

Extract the Substring from the Middle of the String

You can use the MID function to extract a substring from the middle of a text string in Google Sheets. This allows you to specify the starting point and the number of characters you want to extract from a given text string.

Let’s say the same example where you have product details like “101-Alpha Widget-B001” in cell A2 and want to extract the product name between the first and second dash.

extract-substring-from-the-middle

In our example, you can see we don’t have a consistent count and width of the product name, and that’s why you need to write a dynamic formula that can help you get the Substring from the middle of the string without defining the count.

And for this, you need to combine SEARCH with MID. Below is the formula that you can use:

=MID(A2, SEARCH("-", A2) + 1, SEARCH("-", A2, SEARCH("-", A2) + 1) - SEARCH("-", A2) - 1)
mid-function-in-dynamic-way

And to understand this formula, you need to split it into parts:

  • SEARCH(“-,” A2): Find the position of the first dash (“-“) in the text.
  • SEARCH(“-“, A2) + 1: Calculates the position right after the first dash.
  • SEARCH(“-“, A2, SEARCH(“-“, A2) + 1): Find the position of the second dash (“-“) in the text. It tells the function to start searching after the first dash.
  • SEARCH(“-“, A2, SEARCH(“-“, A2) + 1) – SEARCH(“-“, A2) – 1: Calculates the number of characters between the first and second dashes.

The MID extracts 11 characters starting from the 5th position in the string “101-Alpha Widget-B001”, which is Alpha Widget. As we are calculating all this dynamically, when you drag the formula down to the cells, it also gets your product name from it.

Note – You can use IFERROR to deal with errors while extracting substrings or write a formula to check if a cell contains the text you want to extract.

REGEXEXTRACT to Extract Substring from any Position

REGEXEXTRACT is like an intelligent search function that helps you find and extract specific parts of text strings on patterns. From the list of product details in the format “101-Alpha Widget-B001”, you want to get out just the product name, which is between the dashes. You use a formula like:

extract-substring-from-any-position
=REGEXEXTRACT(A2, "-(.*?)-")

In this formula, when you specify A2 to get the Substring from the cell using the -(.*?)- REGEX.

  • “-“: The first dash in the pattern tells the formula to start looking right after the first dash in the text.
  • “(.*?)-“: This part says the formula to “Find any characters that come after the first dash and stop when you hit the second dash”. The .*? means “any characters, as few as possible,” which helps it capture just “Alpha Widget” and not the rest of the string.

To extract the Product ID using REGEXEXTRACT, you must define a pattern that identifies the first portion of the text you want to extract. You have a string like “101-Alpha Widget-B001” in cell A2, and you want to extract the first part, “101,” which is the Product ID.

regexextract-to-extract-substring
=REGEXEXTRACT(A2, "^[^-]+")

In this formula, when you use ^[^-]+

  • ^: The caret symbol ^ means the start of the string. It tells the function to begin the search from the beginning of the text in the cell.
  • [^-]: The square brackets [] represent a character class. A character class matches any one character inside the brackets. The caret ^ inside the square brackets has a different meaning than when it’s outside. It means “not” or “none of these characters” in the brackets. That means no character between hyphens (-).
  • +: The plus + means “one or more” of the preceding elements. In this case, it means one or more characters that are not a hyphen.

When you use it in combination, it gets the Substring from the start of the string, and it will keep matching characters as long as they are not hyphens, stopping as soon as it encounters the first hyphen. Now, get the batch number from the cell. You can use REGEXEXTRACT for this as well.

understand-regexextract-formula
=REGEXEXTRACT(A2, "[^\\-]+$")

The formula looks at the text in cell A2 and extracts everything from the last dash to the end of the string. So, if you apply this to your data, it will quickly give you the later part of each product detail.

To understand the regular expression, you need to split it into parts:

  • [ and ]: These square brackets define a character class that matches any character inside the brackets.
  • ^: When used inside square brackets, the caret symbol ^ negates the character class. This means “not” or “none of these characters”.
  • \\-: The backslash \ is an escape character. Because the dash – is a special character inside character classes, it needs to be escaped to be treated as a literal dash. In Google Sheets, you need to escape the backslash itself so it becomes \\-.

So [^\\-] this part of the pattern matches any character that is not a dash.

  • +: The plus symbol + means “one or more” of the preceding elements. In this case, it means one or more characters that are not a dash.
  • $: The dollar symbol $ signifies the end of the string. It ensures that the pattern matches the last part of the text up to the end of the string.

So, this helps you extract the last part of a string after the final hyphen.

REGEXREPLACE to Extract Substring by Replacing Strings

To extract the product name from “101-Alpha Widget-B001”, use REGEXREPLACE twice. The first REGEXREPLACE removes everything up to the first dash, and the second REGEXREPLACE removes everything after the product name up to the end.

This way, you’re left with the “Alpha Widget”.

=REGEXREPLACE(REGEXREPLACE(A2, "^[^-]+-", ""), "-.*$", "")
extract-substring-by-replacing-strings

First, we have used =REGEXREPLACE(A2, “^[^-]+-“, “”) to remove everything up to and including the first dash. Here, ^[^-]+- matches all characters from the start of the string up to the first dash.

The ^ means the start, [^-]+ matches one or more characters that are not dashes, and – is the dash itself. This formula transforms “101-Alpha Widget-B001” into “Alpha Widget-B001”.

Next, we used =REGEXREPLACE(A2, “-.*$”, “”) to the result to remove everything after the product name, starting from the first dash after it. In this pattern, -.*$ matches the first dash and all characters that follow it up to the end of the string.

The – means the dash, .* matches zero or more of any character, and $ signifies the end of the string. This leaves you with “Alpha Widget” from “Alpha Widget-B001”.

You can also use it to get the Substring from the start of the text.

=REGEXREPLACE(A2, "-.*$", "")
regexreplace-to-get-substring-from-start-of-text

In this formula, the regular expression “-.*$” matches the first dash and everything that comes after it. Specifically, the – matches the first dash, .* matches zero or more of any character (everything after the dash), and $ indicates the end of the string. The “” part of the formula replaces the matched portion with an empty string, effectively removing it.

In the end, we can use it to get the batch number from the string from the A2.

=REGEXREPLACE(A2, "^.*-", "")
regexreplace-to-get-batch-number

The regular expression “^.*-” matches everything from the start of the string up to the last dash. The ^ symbol represents the start of the string, .* matches any character zero or more times, and – matches the dash. By replacing this matched portion with an empty string (“”), you remove everything before “B001”.