Sat, Feb 10, 2018 10:55 pm

Extracting the contents of a directory in a zipfile using Python

A Microsoft Excel file with an .xlsx or .xlsm filename extension is an Office Open XML (OpenXML) file: a zip archive of XML-based files. The OpenXML format was developed by Microsoft for spreadsheets, charts, presentations and word processing documents. If you rename the file so that its extension is .zip, you can extract its contents as you would with any other zip file - see Zipping and unzipping Excel xlsx files. Excel workbooks can also contain other documents embedded within them using Object Linking and Embedding (OLE) technology - see Using olefile to obtain metadata from an OLE CDF V2 file.

I often need to extract an embedded PowerPoint slide or Visio diagram from Excel .xlsm files, so I've been renaming the files to .zip and unzipping them as I would other zip files. Since I want to automate the process and extract just specific embedded files for further processing within a Python script, I created the script below. It extracts the embedded files, which are stored in the xl/embeddings subdirectory of the .xlsm zip file. The script uses the zipfile module to deal with the zip file, and Python's os module to check for the existence of the destination directory and create it if it doesn't yet exist.

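#!/usr/bin/python
# Written for Python 2; under Python 3, use input() in place of raw_input().

import os, zipfile

dirToExtract = "xl/embeddings/"   # directory inside the archive to extract
destinationDir = "embedded"       # created in the current working directory
infile = raw_input("Enter zipfile: ")
archive = zipfile.ZipFile(infile)

# Create the destination directory if it doesn't yet exist
if not os.path.exists(destinationDir):
    os.makedirs(destinationDir)

# Extract only the members whose path starts with the wanted directory
for name in archive.namelist():
    if name.startswith(dirToExtract):
        archive.extract(name, destinationDir)

archive.close()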

The script prompts for the file to be unzipped and then extracts just the "xl/embeddings" folder and the files within it to a new directory named "embedded", which it creates within the current working directory if it doesn't already exist. After extracting the contents of the "xl/embeddings" directory to the "embedded" folder, I had the files below for the particular .xlsm file I used for this example.

[ More Info ]
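
For use inside a larger Python script, the same idea can be wrapped in a function. The following is a minimal sketch for Python 3, not the script above: the function name extract_embedded and its parameters are made up for illustration. It opens the workbook directly, since the zipfile module does not care about the filename extension, and it relies on ZipFile.extract creating any missing directories under the destination.

```python
import zipfile


def extract_embedded(workbook_path, dest_dir="embedded", prefix="xl/embeddings/"):
    """Extract archive members stored under prefix; return their archive names.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    extracted = []
    # An .xlsx/.xlsm file is a zip archive, so it can be opened without renaming.
    with zipfile.ZipFile(workbook_path) as archive:
        for name in archive.namelist():
            # Skip a bare directory entry, if the archive happens to contain one.
            if name.startswith(prefix) and not name.endswith("/"):
                # extract() keeps the member's path, so files end up under
                # dest_dir/xl/embeddings/, and missing directories are created.
                archive.extract(name, dest_dir)
                extracted.append(name)
    return extracted
```

Calling extract_embedded("workbook.xlsm"), where workbook.xlsm is a placeholder filename, would return the list of extracted member names, which a calling script could then loop over for further processing.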
