Pulling Apart a URL

Jordan has a need to break apart a URL into its components. If the full URL (http://www.xyz.com/business) is in column A, he would like a way to have the main domain (xyz or xyz.com) in column B and the path (business) in column C.

There are multiple ways you can go about solving this issue. If you only need to do this a time or two, it might be easiest to use the Text to Columns tool, available on the Data tab of the ribbon. When you specify how you want the tool to pull the strings apart, indicate that you want it to use the slash (/) as a delimiter. Depending on the way your original URLs are put together, this may require a bit of “cleanup” after the tool is done, but it is quick and easy.

If you do this more often, then you might benefit from using a formula to do the manipulation of the URL. The first task is to pull the domain from the URL, but this is a bit tricker than it sounds. For instance, the URL may start with any number of protocol specifiers (such as http://, https://, ftp://, etc.) or it may not begin with a protocol specifier at all. Plus, you may want to strip off the “www.” nomenclature, as some people consider it extraneous. To figure out where the “real” domain starts, you could use this formula in column B:

=IFERROR(FIND("www.",$A1)+4,IFERROR(FIND("://",$A1,1)+3,1))

What it does it to return the position of the first character after “www.”, if it is available. If it is not present, then it returns the position of the first character after “://”, if it is present. If that is not there, then the number 1 is returned, since the URL doesn’t contain any protocol prefix or www.

With that value in column B, you can then start to actually pull out the parts you want. The following, placed, in column C, will return the domain, as already described. It essentially returns everything from the character position shown in column B up to the next slash (/).

=MID(A1,B1,IFERROR(FIND("/",A1,B1)-B1,LEN(A1))

If the original url included “www.xyz.com,” then “xyz.com” is returned by the formula. If it contained a different subdomain other than “www” (such as “research.xyz.com”), then the full domain including the subdomain is returned. The IFERROR function is included in case there is no slash (/) after the domain name. (The FIND function returns a #VALUE error if it cannot find the item for which it is looking.)

To pick off the path after the domain, you can use the following formula in column D:

=IFERROR(MID(A1,FIND("/",A1,B1)+1,LEN(A1)),"")

The formula returns everything after the first slash (/) occurring after the domain name. If there is no slash after the domain name, then it returns nothing. (This is, again, thanks to the use of the IFERROR function.)

The key to using the above formulas, again, is the helper column in column B. If you wanted to do away with the need for column B, you would need to replace in the other two formulas all instances of B1 with the formula (without the equal sign) that belongs in B1. That would, of course, make the other two formulas quite long, especiall the one noted for column C.

You can, if desired, create a macro that essentially does the same thing, without the need for a helper column. The following example expect you to make a selection and then pulls the parts of the URL out of the cells and places them into the two columns to the right of the selection.

Sub GetURLParts()
    Dim c As Range
    Dim sRaw As String
    Dim J As Integer
    
    For Each c In Selection
        sRaw = c.Text
        J = InStr(sRaw, "://")
        If J > 0 Then sRaw = Mid(sRaw, J + 3)
        If LCase(Left(sRaw, 4)) = "www." Then
            sRaw = Mid(sRaw, 5)
        End If
        J = InStr(sRaw, "/")
        If J > 0 Then
            c.Offset(0, 1) = Left(sRaw, J - 1)
            c.Offset(0, 2) = Mid(sRaw, J + 1)
        Else
            c.Offset(0, 1) = sRaw
            c.Offset(0, 2) = ""
        End If
    Next c
End Sub