Comparing files and directories with the diff and comm Linux commands



Linux comm command


If you’ve used the diff command a lot, you probably know that it can also display file content side by side when given the -y option. In the example below, the one line that differs between the two files is marked with a vertical bar between the two columns.

$ diff -y whoison whoison-again
#!/bin/bash                                      #!/bin/bash
# show unique logins                             # show unique logins
echo hello, $USER                                echo hello, $USER
echo Look who is logged in!                      echo Look who is logged in!
echo ===========================                 echo ===========================
who | awk '{print $1}' | sort | uniq           | who | awk '{print $1}' | sort | wc -l
echo ===========================                 echo ===========================
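When the files are long, the side-by-side view can be narrowed down to just the lines that differ. The following is a minimal sketch, assuming GNU diff (the --suppress-common-lines and -W options come from GNU diffutils). The two small files it creates are hypothetical stand-ins, not the whoison scripts shown above.

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)

# Two hypothetical three-line files that differ only in the middle line.
printf '%s\n' 'echo start' 'sort | uniq' 'echo done' > "$tmp/a"
printf '%s\n' 'echo start' 'sort | wc -l' 'echo done' > "$tmp/b"

# -y                       side-by-side output
# --suppress-common-lines  hide the lines that are identical in both files
# -W 60                    limit the total output width to 60 columns
# diff exits with status 1 when the files differ, hence the "|| true".
changed=$(diff -y --suppress-common-lines -W 60 "$tmp/a" "$tmp/b" || true)
echo "$changed"

rm -r "$tmp"
```

Only the changed line should remain, with the vertical bar still sitting between the two versions.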

The comm command displays the differences in columns by default, but we’ve got a little problem here:

$ comm whoison whoison-again
                #!/bin/bash
                # show unique logins
                echo hello, $USER
                echo Look who is logged in!
                echo ===========================
who | awk '{print $1}' | sort | uniq
comm: file 1 is not in sorted order             <=== Oops! The comm command expects
echo ===========================                     sorted data
        who | awk '{print $1}' | sort | wc -l
comm: file 2 is not in sorted order
        echo ===========================

The errors in this output point to one important restriction of comm: it requires that the files being compared be in sorted order.

The output is very different from diff, but let’s review what we’re seeing. In the diff output, only the one line that differs between the two files is flagged. All the other lines in the two files are the same.

In the comm output, we also see the content of both files in columns, but the key is the indentation. The rightmost column displays the content that is the same in both files, up to a point. The other two columns show (leftmost) the content that is unique to the first file and (middle) the content that is unique to the second file. But we also see a line that the two files share (echo ===========================) reported separately in each of the first two columns, along with a couple of complaints that the data being compared is not in sorted order. This tells us something about the way that comm works: it expects to be working with files that are in sorted order. You’re better off using diff when you want to compare scripts or other non-sorted data.
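For data where line order does not matter, sorting both files first lets comm do its job. The sketch below uses two hypothetical login lists (the file names and user names are made up for illustration) and sticks to standard sort and comm options: sort -c to check whether a file is already sorted, and comm's -1, -2 and -3 options to suppress individual columns.

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d)

# Two small, hypothetical login lists in no particular order.
printf '%s\n' root shs nemo > "$tmp/list1"
printf '%s\n' shs lola root > "$tmp/list2"

# sort -c only checks the order; a non-zero exit status means "not sorted".
if ! sort -c "$tmp/list1" 2>/dev/null; then
    echo "list1 is not sorted yet"
fi

# comm expects sorted input, so sort each file first.
sort "$tmp/list1" > "$tmp/list1.sorted"
sort "$tmp/list2" > "$tmp/list2.sorted"

# Three columns: only in file 1, only in file 2, in both files.
comm "$tmp/list1.sorted" "$tmp/list2.sorted"

# Suppress columns 1 and 2 to keep only the lines common to both files.
common=$(comm -12 "$tmp/list1.sorted" "$tmp/list2.sorted")
echo "$common"        # prints root and shs, one per line

# Suppress columns 2 and 3 to keep only the lines unique to the first file.
only_first=$(comm -23 "$tmp/list1.sorted" "$tmp/list2.sorted")
echo "$only_first"    # prints nemo

rm -r "$tmp"
```

Sorting into separate files leaves the originals untouched, which matters when the original line order means something, as it does in a script.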