Extracting images from an Excel spreadsheet


I needed to extract two diagrams from a worksheet in a Microsoft Excel workbook. The diagrams appeared to have been pasted into the worksheet as images. I could right-click on an image in the sheet and choose "Copy" or "Save as Picture", and for the latter I could choose PNG, JPEG, PDF, GIF, or BMP as the "Save as Type" value, but I wondered what type of file Excel was using for the embedded image. The file was a .xlsm file, which, like a .xlsx file, uses the Office Open XML (OpenXML) format. Such a file is a ZIP archive, so it can be "unzipped" to reveal its constituent files by renaming it to have a .zip filename extension, or by copying it to a new file with a .zip extension (see Zipping and unzipping Excel xlsx files). So I copied the file, giving the copy a .zip extension, and then extracted its contents by unzipping it. That left me with a file named "[Content_Types].xml" and the following directories in the directory where I had extracted the files:

_rels
customXml
docProps
xl

Beneath the "xl" directory was a "media" directory, which contained one .jpeg file, two .png files, and a number of .emf files. Enhanced Metafile (EMF) is an image file format created by Microsoft Corporation. I was using an Apple MacBook Pro laptop, and OS X doesn't provide a default viewing application for that type of file. But I have the free and open-source software (FOSS) application LibreOffice, which comes with a program, soffice, that can convert EMF files to PNG files. I converted the files with that utility, using command substitution to feed the output of an ls *.emf command to it as the list of files to convert.

$ /Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to png `ls *.emf`
convert /Users/jasmith1/DocumentsCRQ/xl/media/image3.emf -> /Users/jasmith1/Documents/www/CRQ/CRQ000000906773/CRQ-906773 SDP/xl/media/image3.png using filter : draw_png_Export
convert /Users/jasmith1/DocumentsCRQ/xl/media/image4.emf -> /Users/jasmith1/Documents/www/CRQ/CRQ000000906773/CRQ-906773 SDP/xl/media/image4.png using filter : draw_png_Export
convert /Users/jasmith1/DocumentsCRQ/xl/media/image5.emf -> /Users/jasmith1/Documents/www/CRQ/CRQ000000906773/CRQ-906773 SDP/xl/media/image5.png using filter : draw_png_Export
convert /Users/jasmith1/DocumentsCRQ/xl/media/image6.emf -> /Users/jasmith1/Documents/www/CRQ/CRQ000000906773/CRQ-906773 SDP/xl/media/image6.png using filter : draw_png_Export
convert /Users/jasmith1/DocumentsCRQ/xl/media/image7.emf -> /Users/jasmith1/Documents/www/CRQ/CRQ000000906773/CRQ-906773 SDP/xl/media/image7.png using filter : draw_png_Export
convert /Users/jasmith1/DocumentsCRQ/xl/media/image9.emf -> /Users/jasmith1/Documents/www/CRQ/CRQ000000906773/CRQ-906773 SDP/xl/media/image9.png using filter : draw_png_Export
$
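
Feeding ls output through command substitution worked here because none of the .emf file names contained spaces; the shell splits substituted output on whitespace, so a name like "image 1.emf" would arrive as two arguments. A hedged alternative is to let the shell expand the glob itself. The sketch below assumes the same soffice location as above (overridable through a SOFFICE variable, which is my own convention) and uses soffice's --outdir option to choose where the PNG files go; convert_emf_to_png is a made-up helper name.

```shell
# Path to LibreOffice's soffice binary. The default is the macOS location
# used above; the SOFFICE variable is an assumption of this sketch.
SOFFICE="${SOFFICE:-/Applications/LibreOffice.app/Contents/MacOS/soffice}"

# convert_emf_to_png DIR [OUTDIR]
# Convert every .emf file in DIR to PNG, writing the results to OUTDIR
# (defaults to DIR). Returns 1 if DIR contains no .emf files.
convert_emf_to_png() {
    dir="$1"
    outdir="${2:-$1}"
    # Let the shell expand the glob into the positional parameters, so
    # each file name stays one argument even if it contains spaces.
    set -- "$dir"/*.emf
    # An unmatched glob is left as the literal pattern; treat that as "no files".
    [ -e "$1" ] || return 1
    "$SOFFICE" --headless --convert-to png --outdir "$outdir" "$@"
}
```

Run from the directory holding the unzipped workbook, `convert_emf_to_png xl/media png` would put the converted images in a separate "png" directory rather than alongside the originals.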

Checking the resultant .png files, I found that I had the diagrams I needed, but they didn't look exactly as they did in the Excel workbook; e.g., some background colors were missing from sections of the images. So I resorted to right-clicking on the images inside the worksheet and choosing "Save as Picture" to save them as PNG files, which looked exactly as they appeared in the worksheet.

Related articles:

  1. EMF image embedded in a PowerPoint file on OS X
  2. Zipping and unzipping Excel xlsx files
  3. How to get cat to process a file name provided in the output of another command