Using xlrd to extract a column from an Excel spreadsheet

I receive an Excel spreadsheet every month containing a membership list for an association. I extract the column containing the email addresses, check all of the addresses to ensure that they are valid email addresses, i.e., they contain an "@" and a ".", then update a mailing list file that is used for sending the monthly newsletter by email to the association's members. I wanted to further automate the process by having a script extract the data in the column containing the email addresses from the worksheet that has names, email addresses, and physical addresses. So I decided to try xlrd, which is a package for reading data and formatting information from Excel files, for that purpose. I downloaded xlrd 0.9.3 and the documentation provided from Working with Excel Files in Python. I then extracted the files from the .tgz file I downloaded and intalled xlrd on my MacBook Pro laptop.

$ tar xzf xlrd-0.9.3.tar.gz 
$ cd xlrd-0.9.3
$ python setup.py install
running install
running build
running build_py
creating build
creating build/lib
creating build/lib/xlrd
copying xlrd/__init__.py -> build/lib/xlrd
copying xlrd/biffh.py -> build/lib/xlrd
copying xlrd/book.py -> build/lib/xlrd
copying xlrd/compdoc.py -> build/lib/xlrd
copying xlrd/formatting.py -> build/lib/xlrd
copying xlrd/formula.py -> build/lib/xlrd
copying xlrd/info.py -> build/lib/xlrd
copying xlrd/licences.py -> build/lib/xlrd
copying xlrd/sheet.py -> build/lib/xlrd
copying xlrd/timemachine.py -> build/lib/xlrd
copying xlrd/xldate.py -> build/lib/xlrd
copying xlrd/xlsx.py -> build/lib/xlrd
running build_scripts
creating build/scripts-2.7
copying and adjusting scripts/runxlrd.py -> build/scripts-2.7
changing mode of build/scripts-2.7/runxlrd.py from 644 to 755
running install_lib
creating /Library/Python/2.7/site-packages/xlrd
error: could not create '/Library/Python/2.7/site-packages/xlrd': Permission denied

When I saw the "Permission denied" message, I realized I needed to either use sudo python setup.py install to obtain root level access to put files beneath the /Library directory or install using python setup.py install --user, which would install it to a directory owned by the account under which I was running the command, i.e., a user account rather than the root account. I chose the latter option - see the Stack Overflow posting on error: could not create '/Library/Python/2.7/site-packages/xlrd': Permission denied.

When I reran the installation, I then saw the following:

$ python setup.py install --user
running install
running build
running build_py
running build_scripts
running install_lib
creating /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/__init__.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/biffh.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/book.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/compdoc.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/formatting.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/formula.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/info.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/licences.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/sheet.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/timemachine.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/xldate.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
copying build/lib/xlrd/xlsx.py -> /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/__init__.py to __init__.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/biffh.py to biffh.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/book.py to book.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/compdoc.py to compdoc.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/formatting.py to formatting.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/formula.py to formula.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/info.py to info.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/licences.py to licences.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/sheet.py to sheet.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/timemachine.py to timemachine.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/xldate.py to xldate.pyc
byte-compiling /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd/xlsx.py to xlsx.pyc
running install_scripts
creating /Users/jdoe/Library/Python/2.7/bin
copying build/scripts-2.7/runxlrd.py -> /Users/jdoe/Library/Python/2.7/bin
changing mode of /Users/jdoe/Library/Python/2.7/bin/runxlrd.py to 755
running install_egg_info
Writing /Users/jdoe/Library/Python/2.7/lib/python/site-packages/xlrd-0.9.3-py2.7.egg-info

The Excel workbook contains only one worksheet, which is named AddressData. Column H in the worksheet contains the email addresses. The other columns contain name and physical address information. So I created a python script extract-addresses.py to extract the email addresses from the Excel spreadsheet and write them to a text file with one email address per line in the text file. The script is shown below:

# Extract email addresses from column H of Excel spreadsheet

from xlrd import open_workbook,cellname

book = open_workbook('Directory.xls')
email_file = open('Email.txt', 'w')
sheet = book.sheet_by_index(0)
print "Worksheet name: ",sheet.name
print "Number of entries: ",sheet.nrows

# Email addresses are in the 8th column, column H, but 1st column is column 0
col_index = 7
for row_index in range(sheet.nrows):
     email_file.write(sheet.cell(row_index,col_index).value + "\n")
email_file.close()

When the script is run it produces a file named Email.txt and also writes the worksheet name and the number of current entries in the worksheet to the terminal window:

$ python extract-addresses.py
Worksheet name:  AddressData
Number of entries:  2519

I then use a list-update Bash script to check all of the lines in the file produced by the python script for any email addresses that are obviously invalid due to a missing "@" or "." and remove any duplicate email addresses. I then use the resulting file for the email distribution list.

Download Files

extract-addresses.py
xlrd-0.9.3.tar.gz
Working with Excel Files in Python (PDF) - Written by Chris Withers with help from John Machin

Learn Python, it's CAKE1x1 px

Valid HTML 4.01 Transitional

Created: Tuesday January 27, 2015