I needed to compare two files on a CentOS Linux system to find the lines in one file that didn't appear in the other. Specifically, I had a file, bounced.txt, with a list of email addresses that had experienced bounced messages. Some, but not all, of those email addresses were part of a mailing list stored at /etc/mail/mailinglist.txt. I wanted to see only those lines in bounced.txt that did not appear in mailinglist.txt. The comm utility, which is also present on Mac OS X systems, allows you to compare two files and determine which lines occur in one file but not the other.
I was able to find the lines that appeared in bounced.txt but not in mailinglist.txt with the following comm command:
# comm <(sort /etc/mail/mailinglist.txt) <(sort bounced.txt) -13
bounce-600404@bounce.getaresponse.com
jasmith@example.com
comm needs sorted input to do its matching, which is why I used the sort command on both files before passing their contents to comm. I included -13 because comm normally produces three columns of output; -1 and -3 suppress the first and third columns, leaving only the lines unique to the second file, bounced.txt. The columns and options are explained in the following excerpt from the comm man page:
NAME
       comm - compare two sorted files line by line

SYNOPSIS
       comm [OPTION]... FILE1 FILE2

DESCRIPTION
       Compare sorted files FILE1 and FILE2 line by line.

       With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

       -1     suppress column 1 (lines unique to FILE1)
       -2     suppress column 2 (lines unique to FILE2)
       -3     suppress column 3 (lines that appear in both files)
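
To see the column options at work on a small scale, here is a minimal sketch using throwaway files. The temporary directory, the sample file contents, and the variable name not_on_list are made up for illustration. It writes sorted copies to temporary files instead of using the <( ) syntax, on the assumption that the script may be run by a shell that lacks process substitution, and it puts -13 before the file names, which should be the safer ordering on systems where comm expects options first.

```shell
#!/bin/sh
# Hypothetical sample data; the directory, file contents, and addresses
# here are for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' 'carol@example.com' 'alice@example.com' 'bob@example.com' > "$tmpdir/mailinglist.txt"
printf '%s\n' 'jasmith@example.com' 'bob@example.com' > "$tmpdir/bounced.txt"

# comm requires sorted input; use the same locale for sort and comm
# so both agree on the ordering.
LC_ALL=C sort "$tmpdir/mailinglist.txt" > "$tmpdir/list.sorted"
LC_ALL=C sort "$tmpdir/bounced.txt" > "$tmpdir/bounced.sorted"

# -1 drops lines unique to the first file and -3 drops lines common to both,
# leaving only lines unique to the second file (bounced addresses not on the list).
not_on_list=$(LC_ALL=C comm -13 "$tmpdir/list.sorted" "$tmpdir/bounced.sorted")
echo "$not_on_list"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

Swapping -13 for -23 would instead show the lines unique to the first file, and -12 would show only the lines the two files have in common.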