A Microsoft Excel file with an .xlsx or .xlsm filename extension is an Office Open XML (OpenXML) file: a zip archive containing XML-based parts. Microsoft developed the OpenXML format for spreadsheets, charts, presentations, and word processing documents. If you rename the file so that its extension is .zip, you can extract its contents as you would with any other zip file; see Zipping and unzipping Excel xlsx files. Excel workbooks can also contain other documents embedded within them using Object Linking and Embedding (OLE) technology; see Using olefile to obtain metadata from an OLE CDF V2 file.

I often need to extract an embedded PowerPoint slide or Visio diagram from Excel .xlsm files. I've been renaming the files to .zip files and unzipping them as I would any other zip file. However, I want to automate the process and extract just specific embedded files for further processing within a Python script, so I created the script below. It extracts the embedded files, which are stored in the xl/embeddings subdirectory of the .xlsm zip archive. The script uses the zipfile module to work with the zip files.
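Because the embedded objects are ordinary members of the zip archive, you can also inspect them without extracting anything. The sketch below (Python 3, standard library only) lists the members under xl/embeddings/ and makes a rough guess at each one's kind from its first bytes. Embedded objects may be stored either as OLE compound files (often named oleObjectN.bin), which start with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, or as Office Open XML packages, which start with the zip header PK\x03\x04. The function name list_embedded and the "ole", "ooxml-package", and "other" labels are my own hypothetical choices, and the signature check is only a heuristic, not a full identification of the embedded file type.

```python
import zipfile

# Magic numbers at the start of common embedded payloads.
OLE_CFB_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # OLE compound file (CFB)
ZIP_SIGNATURE = b"PK\x03\x04"  # zip local file header, as used by OOXML packages

EMBEDDINGS_PREFIX = "xl/embeddings/"


def list_embedded(xlsx_path):
    """Return (member name, kind) pairs for the files under xl/embeddings/.

    kind is a heuristic label based only on the member's first 8 bytes.
    """
    results = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for info in archive.infolist():
            # Skip members outside the embeddings folder and directory entries.
            if not info.filename.startswith(EMBEDDINGS_PREFIX) or info.is_dir():
                continue
            # Read just the header bytes; nothing is written to disk.
            with archive.open(info) as member:
                head = member.read(8)
            if head.startswith(OLE_CFB_SIGNATURE):
                kind = "ole"
            elif head.startswith(ZIP_SIGNATURE):
                kind = "ooxml-package"
            else:
                kind = "other"
            results.append((info.filename, kind))
    return results
```

For example, `for name, kind in list_embedded("example.xlsm"): print(kind, name)` would print one line per embedded file; "example.xlsm" is a placeholder for your own workbook.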
Python's os module is used to check whether the destination directory exists and to create it if it doesn't.
```python
#!/usr/bin/env python3
import os
import zipfile

dirToExtract = "xl/embeddings/"
destinationDir = "embedded"

infile = input("Enter zipfile: ")

# Create the destination directory if it doesn't already exist
if not os.path.exists(destinationDir):
    os.makedirs(destinationDir)

with zipfile.ZipFile(infile) as archive:
    for name in archive.namelist():
        # Extract only the members stored under xl/embeddings/
        if name.startswith(dirToExtract):
            archive.extract(name, destinationDir)
```
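Since the goal is to pull out just specific embedded files as part of a larger script, a non-interactive variant that takes the workbook path as an argument and optionally filters by file extension may fit better than prompting for input. This is a minimal sketch: the function name extract_embedded, its parameters, and the example suffixes .vsdx and .sldx are hypothetical, and which extensions actually appear in xl/embeddings depends on how the objects were embedded in your workbook.

```python
import os
import zipfile

EMBEDDINGS_PREFIX = "xl/embeddings/"


def extract_embedded(xlsx_path, dest_dir="embedded", suffixes=None):
    """Extract members under xl/embeddings/ into dest_dir.

    suffixes: optional iterable such as (".vsdx", ".sldx"); when given, only
    members whose names end with one of them (case-insensitive) are extracted.
    Returns the list of file paths written to disk.
    """
    wanted = tuple(s.lower() for s in suffixes) if suffixes else None
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Ignore members outside the embeddings folder and directory entries.
            if not name.startswith(EMBEDDINGS_PREFIX) or name.endswith("/"):
                continue
            if wanted and not name.lower().endswith(wanted):
                continue
            # extract() keeps the xl/embeddings/ path below dest_dir and
            # returns the normalized path of the file it created.
            written.append(archive.extract(name, dest_dir))
    return written
```

For example, `extract_embedded("example.xlsm", suffixes=(".vsdx",))` would extract only members ending in .vsdx and return their paths under embedded/xl/embeddings/.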
The script prompts for the name of the file to be unzipped and then extracts just the "xl/embeddings" folder and the files within it to a new directory named "embedded" in the current working directory, creating that directory if it doesn't already exist. After extracting the contents of the "xl/embeddings" directory to the "embedded" folder, I had the files below for the particular .xlsm file I used in this example.
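Because ZipFile.extract() recreates each member's full path, the extracted files land in embedded/xl/embeddings rather than directly in embedded. If you'd rather have them directly in the destination folder, one option is to read each member with ZipFile.open() and write it out under its base name, as in the sketch below. The function name extract_embedded_flat is hypothetical, and note that two members with the same base name in different subfolders would overwrite one another.

```python
import os
import shutil
import zipfile

EMBEDDINGS_PREFIX = "xl/embeddings/"


def extract_embedded_flat(xlsx_path, dest_dir="embedded"):
    """Copy each file under xl/embeddings/ directly into dest_dir (no subfolders).

    Returns the list of file paths written to disk.
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for info in archive.infolist():
            if not info.filename.startswith(EMBEDDINGS_PREFIX) or info.is_dir():
                continue
            # Keep only the base name, dropping the xl/embeddings/ prefix.
            base = os.path.basename(info.filename)
            if base in ("", ".", ".."):
                continue  # skip names that aren't usable as file names
            target = os.path.join(dest_dir, base)
            # Stream the member's bytes to the target file.
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```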