Extracting a Pattern from within Text

Tom has a worksheet that contains about 20,000 cells full of textual data. From within these cells he needs to extract a specific pattern of text. The pattern is ##-##### where each # is a digit. This pattern does not appear at a set place in each cell. Tom wonders if there is a way to extract the desired information.

There are several ways that you can approach this problem, and the correct solution for your needs will depend on the characteristics of the data with which you are working. If you know that the only place in your data that you will have a dash is within your pattern, then you can key off of the presence of the dash by using a formula such as the following:

=MID(A1,FIND("-",A1)-2,8)

This finds the dash and then grabs the eight characters beginning two characters to the left of the dash. This obviously will not work if there are dashes in other places in the text or if it is possible to have “patterns” that include non-digits (such as 12-34B32) and you want those excluded. In that case you’ll need a much more complex formula:

=IF(ISERROR(INT(MID(A1, FIND("-", A1, 1)-2, 2)) & INT(MID(
A1, FIND("-", A1, 1)+1, 5))), "", MID(A1, FIND("-", A1)-2, 8))

This includes an error checking component that finds out if the characters just before the dash and just after the dash contain anything other than digits. If they do, then nothing is returned.

The one thing that these formulaic approaches don’t do is handle those situations where there may be more than one occurrence of the pattern within the same cell. In that case, a macro is the best approach. The following will extract the valid patterns and place them in a new worksheet called “Results”.

Sub ExtractPattern()
    On Error Resume Next
    Set SourceSheet = ActiveSheet
    Set TargetSheet = ActiveWorkbook.Sheets("Results")
    If Err = 0 Then
        Worksheets("Results").Delete
    End If
    Worksheets.Add
    ActiveSheet.Name = "Results"
    Set TargetSheet = ActiveSheet
    Cells(1, 1).Value = "Found Codes"
    Cells(1, 1).Font.Bold = True
    iTargetRow = 2

    SourceSheet.Select
    Selection.SpecialCells(xlCellTypeLastCell).Select
    Range(Selection, Cells(1)).Select

    For Each c In Selection.Cells
        If c.Value Like "*##-#####*" Then
            sRaw = c.Value
            iPos = InStr(sRaw, "-")
            Do While iPos > 0
                If iPos < 3 Then
                    sRaw = "  " & sRaw
                    iPos = iPos + 2
                End If
                sTemp = Mid(sRaw, iPos - 2, 8)
                sRaw = Mid(sRaw, iPos + 6, Len(sRaw))
                If sTemp Like "##-#####" Then
                    TargetSheet.Cells(iTargetRow, 1) = sTemp
                    iTargetRow = iTargetRow + 1
                Else
                    sRaw = Mid(sTemp, 4, 5) & sRaw
                End If
                iPos = InStr(sRaw, "-")
            Loop
        End If
    Next c
End Sub

Note that the macro uses the Like function in two places. The first instance determines if the pattern occurs anywhere in the cell, and the second instance is used to determine if the extracted characters exactly match the desired pattern.