xlrd and hidden worksheets
I use the xlrd
module in Python scripts to extract data from Excel workbooks. You can
use the Python xlrd module to
list the worksheets in a workbook and you can use the
xlrd.sheet "visibility" value to determine whether a sheet is hidden and,
if it is hidden, whether a user can unhide the sheet. The value should be
either 0, 1, or 2 with the numbers having the following meaning:
Visibility of the sheet:
0 = visible
1 = hidden (can be unhidden by user -- Format -> Sheet -> Unhide)
2 = "very hidden" (can be unhidden only by VBA macro)
[ More Info ]
[/languages/python/excel]
permanent link
Extracting embedded Microsoft Office files from an Excel spreadsheet
I work with Excel workbooks on my
MacBook Pro
laptop that have embedded
PowerPoint
slides on some worksheets. The workbooks, which I need to review, are
created by others. When I review them, I
extract information from
the Excel workbooks to an SQLite database with Python and also have begun
extracting information embedded by
Object Linking and Embedding (OLE) into files as
noted in Extracting the
contents of a directory in a zipfile using Python. Some of the
embedded files are PowerPoint files, but when they are extracted they
have a .bin extension, which I can't open in
PowerPoint without changing the
filename extension from .bin to .ppt. To automate the renaming process,
I created a Python script, extract_embedded.py
that will extract
the embedded information to files in an "embedded" directory beneath the
current working directory and then rename any .bin files that are PowerPoint
files to have a .ppt extension. The script is shown below.
[ More Info ]
[/languages/python/excel]
permanent link
Extracting the contents of a directory in a zipfile using Python
A Microsoft Excel file with an .xlsx or .xlsm
filename extension is an
Office
Open XML (OpenXML) zipped, XML-based file. The OpenXML format was developed by Microsoft for
spreadsheets, charts, presentations and word processing documents. If you
change the file extension to .zip by renaming the file, you can
extract the contents of the zip file as you would with any other
zip file - see Zipping
and unzipping Excel xlsx files. Excel workbooks can contain other documents
embedded within them using
Object Linking and Embedding (OLE) technology - see
Using olefile to
obtain metadata from an OLE CDF V2 file. I often need to extract an
embedded PowerPoint slide or
Visio
diagram from Excel .xlsm files, so I've been renaming the files to
zip files and unzipping them as I would other zip files, but, since
I want to automate the process and extract just specific embedded
files for further processing within a Python script, I created the
script below to extract the embedded files, which are contained
within a xl/embeddings
subdirectory within the .xlsm
zip files. The script uses the
zipfile module
to deal with the zip files.
Python's
OS module is used to check for the existence of the destination
directory and create it, if it doesn't yet exist.
#!/usr/bin/python
import os, zipfile
dirToExtract = "xl/embeddings/"
destinationDir = "embedded"
infile = raw_input("Enter zipfile: ")
archive = zipfile.ZipFile(infile)
if not os.path.exists(destinationDir):
os.makedirs(destinationDir)
for file in archive.namelist():
if file.startswith(dirToExtract):
archive.extract(file, destinationDir)
The script prompts for the file to be unzipped and then extracts just
the "xl/embeddings" folder and the files contained within it to a new
directory it will create within the current working directory. The new
directory will be named "embedded". After extracting the contents of the
"xl/embeddings" directory to the newly created "embedded" folder, I had the
the files below in the case of the particular .xlsm file I used for this
example.
[ More Info ]
[/languages/python/excel]
permanent link
Using the Python xlrd module to list the worksheets in a workbook
To view the list of sheets in an
Excel
spreadsheet, I can use the
xlrd module within the
Python script below to obtain the list of
worksheets within the workbook.
#!/usr/bin/python
import xlrd as xl
file_name = raw_input("File: ")
workbook = xl.open_workbook(file_name)
print workbook.sheet_names()
If I use the script to display the list of worksheets in a workbook
named report.xlsx
that has three sheets named alpha,
beta, and gamma, I would see the following output:
$ ./sheetlist.py
File: report.xlsx
[u'alpha', u'beta', u'gamma']
$
[ More Info ]
[/languages/python/excel]
permanent link
Using Python to extract data from a spreadsheet and put it in a database
I need to review spreadsheets related to work requests on a daily basis. The
Excel
workbooks have multiple worksheets within them. I have been manually copying
data from specific cells in one of the worksheets and pasting it into a
SQLite
database. I use DB Browser for SQLite
to paste the information into the database. I wanted to automate the
process of extracting the data from the .xlsm or .xlsx workbook files and
inserting it into the SQLite database using a
Python script. So I initially created a script that I could
run from a Teminal window on my
MacBook Pro laptop,
as noted in
Extracting data from cells in an Excel workbook with Python, that
would just print the information to the Terminal window's command-line
interface. I had another script to
import data from
a text file into an SQLite database with Python, so I combined code from
that script with the one to read data from the spreadsheets to add the data
to a table in the database.
[ More Info ]
[/languages/python/excel]
permanent link
Extracting data from cells in an Excel workbook with Python
I've been
using xlrd to extract a column from an Excel spreadsheet with a
Python script on my MacBook Pro laptop. With xlrd, you can
read data from cells in a workbook by specifying a specific sheet and the cells
from which you wish to extract the data in
spreadsheet
programs, such as
Microsoft Excel,
Apache OpenOffice Calc, or
LibreOffice Cale.
With spreadsheet applications you normally have rows sequenced by numbers
and columns sequenced by alphabetical characters. With xlrd, though, both
rows and columns are specified by numbers with the first row starting at 0
and the first column starting at 0 as well.
[ More Info ]
[/languages/python/excel]
permanent link
Python script to extract a column from an Excel spreadsheet
I
use the Python xlrd module
to extract a column from an Excel spreadsheet. I've been using a
Python script,
extract-addresses.py to pull the data from a set column in an .xls
spreadsheet, Directory.xls, and store it in a text file. I decided to
make the script more generic, so that I can extract the data from a
spreadsheet I specify when I run the script rather than a set spreadsheet and
to have the specific column and output file name be variable as well. The
new script is
extract-column.py. The Python
xlrd module will need to be
installed for the script to work; xlrd can be downloaded from one of the links
below:
Python Software Foundation
MoonPoint
Support (version 1.0.0)
If no arguments are provided on the command line, the script will prompt
for the name of the spreadsheet file from which the data should be extracted,
the column number containing the data to be extracted (A is treated as
column 1, B as column 2, etc.), and the name of the output file to hold
the extracted data.
[ More Info ]
[/languages/python/excel]
permanent link