Extracting embedded documents from an Excel .xlsm file

I often receive Microsoft Excel files that have documents created by other Microsoft applications embedded within them. E.g., at the top of a worksheet I may see something like =EMBED("Visio.Drawing.11","").

[Image: EMBED Visio.Drawing]

Sometimes I want to extract the embedded file. With a Microsoft Excel .xlsm file that is easy to do, because XLSM is a zipped, XML-based file format. To extract embedded documents, such as Visio drawings or PowerPoint presentations, I make a copy of the .xlsm file and then change the copy's extension from .xlsm to .zip. I can then extract the contents of the zip file. The directory that holds the extracted files will contain an xl directory, and within that directory there is an embeddings directory that holds the embedded files, such as the Visio drawings in the example below.

$ ls ~/Documents/Work/CRQ/843940/unzipped
[Content_Types].xml	customXml		xl
_rels			docProps
$ ls ~/Documents/Work/CRQ/843940/unzipped/xl
_rels			comments19.xml		comments9.xml
calcChain.xml		comments2.xml		ctrlProps
charts			comments20.xml		drawings
comments1.xml		comments21.xml		embeddings
comments10.xml		comments22.xml		media
comments11.xml		comments23.xml		printerSettings
comments12.xml		comments24.xml		sharedStrings.xml
comments13.xml		comments3.xml		styles.xml
comments14.xml		comments4.xml		theme
comments15.xml		comments5.xml		vbaProject.bin
comments16.xml		comments6.xml		workbook.xml
comments17.xml		comments7.xml		worksheets
comments18.xml		comments8.xml
$ ls ~/Documents/Work/CRQ/843940/unzipped/xl/embeddings
Microsoft_Visio_2003-2010_Drawing111.vsd
Microsoft_Visio_2003-2010_Drawing222.vsd
Microsoft_Visio_2003-2010_Drawing333.vsd
Microsoft_Visio_2003-2010_Drawing444.vsd
oleObject1.bin
oleObject2.bin
oleObject3.bin
oleObject4.bin
$
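
The copy, rename, and unzip steps can also be scripted. The following is a minimal sketch in Python that uses only the standard zipfile module; the function name extract_embeddings and the paths in the usage note are hypothetical and not part of the original procedure. Because an .xlsm file is already a zip archive, zipfile can open it directly, so the sketch skips the renamed copy.

```python
import zipfile
from pathlib import Path


def extract_embeddings(workbook_path, dest_dir):
    """Copy every file under xl/embeddings/ in the workbook into dest_dir.

    Returns the list of extracted file paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # An .xlsm file is a zip archive, so it can be opened without renaming it.
    with zipfile.ZipFile(workbook_path) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith("xl/embeddings/"):
                continue
            # Keep only the base name so the files land directly in dest_dir.
            target = dest / Path(info.filename).name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return extracted
```

A call such as extract_embeddings("workbook.xlsm", "unzipped_embeddings") would leave the original workbook untouched and write only the embedded files, not the rest of the archive.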

I can then open the extracted documents in the application used to create them. In the case of Visio drawings, since Microsoft doesn't provide a Visio viewer program for Mac OS X systems, I use VSD Viewer Pro to view the files when I extract them on my MacBook Pro laptop.

With PowerPoint slides embedded in the .xlsm file, the result can be different. I may see =EMBED("PowerPoint.Slide.8","") in the formula bar at the top of a worksheet, but when I rename the .xlsm file to a .zip file and extract its contents, the embeddings directory may contain only .bin files, such as the following:

$ ls ~/Documents/Work/CRQ/833131/unzipped/xl/embeddings
oleObject1.bin	oleObject2.bin	oleObject3.bin	oleObject4.bin
$

Since I know that there was an embedded PowerPoint slide in the Excel workbook, but can't open the .bin files in PowerPoint as they are, I can rename them to .ppt and then open them with PowerPoint. In the example above, I renamed all four .bin files to have a .ppt extension instead. I was able to open the first three in PowerPoint and copy the text I wanted from the slides into another application; I couldn't copy the text from the embedded slides inside Excel. For the fourth one, I saw the message "PowerPoint cannot open the type of file represented by oleObject4.ppt." But the other files contained the information I wanted. The "ole" in the file names stands for "Object Linking and Embedding", a technology created by Microsoft that allows embedding and linking to documents and other objects.
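
When there are several .bin files, the change of extension can be done in one pass. The sketch below is an illustration rather than the procedure I used by hand: it copies each file instead of renaming it, so the original .bin files stay in place, and the function name copy_bin_as_ppt is hypothetical.

```python
import shutil
from pathlib import Path


def copy_bin_as_ppt(embeddings_dir):
    """Make a .ppt copy of each oleObject*.bin file, leaving the originals alone."""
    copies = []
    for bin_file in sorted(Path(embeddings_dir).glob("oleObject*.bin")):
        ppt_file = bin_file.with_suffix(".ppt")
        shutil.copyfile(bin_file, ppt_file)
        copies.append(ppt_file)
    return copies
```

A copy made this way only opens in PowerPoint if the .bin file really holds PowerPoint data; a file like the fourth one above will still be rejected.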

You can use the file command on a Mac OS X system to identify the type of a file. E.g., in the example below, I made the "embeddings" directory holding the files extracted from another .xlsm file my current working directory and saw the following when I ran the command:

$ file *
oleObject1.bin: CDF V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Title: PowerPoint Presentation, Author: Tracy Wilhelm, Last Saved By: Bigelow, Andrew L. (ACCI-760.0)[ABCS], Revision Number: 7, Name of Creating Application: Microsoft Office PowerPoint, Total Editing Time: 01:31:38, Create Time/Date: Mon Jan  7 00:03:32 2013, Last Saved Time/Date: Wed Apr 12 18:42:09 2017, Number of Words: 35
oleObject2.bin: CDF V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Title: PowerPoint Presentation, Author: Tracy Wilhelm, Last Saved By: Bigelow, Andrew L. (ACCI-760.0)[ABCS], Revision Number: 4, Name of Creating Application: Microsoft Office PowerPoint, Total Editing Time: 01:11:34, Create Time/Date: Mon Jan  7 00:04:06 2013, Last Saved Time/Date: Wed Apr 12 18:42:09 2017, Number of Words: 13
oleObject3.bin: CDF V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Title: PowerPoint Presentation, Author: Tracy Wilhelm, Last Saved By: Clark, Jeff N. (NSFC-IS40)[ABCS], Revision Number: 8, Name of Creating Application: Microsoft Office PowerPoint, Total Editing Time: 01:15:21, Create Time/Date: Mon Jan  7 00:03:32 2013, Last Saved Time/Date: Wed Oct  7 22:48:15 2015, Number of Words: 162
oleObject4.bin: CDF V2 Document, No summary info
$ file --mime *
oleObject1.bin: application/vnd.ms-office; charset=binary
oleObject2.bin: application/vnd.ms-office; charset=binary
oleObject3.bin: application/vnd.ms-office; charset=binary
oleObject4.bin: CDF V2 Document, No summary info; charset=binary
$

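So I know I can rename the first three files to have a .ppt rather than a .bin filename extension and open them with PowerPoint, since file reports Microsoft Office PowerPoint as the creating application for each of them. It reports no summary information for oleObject4.bin.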
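
A similar check can be approximated without the file command. The sketch below is a heuristic, not a replacement for file: it tests for the 8-byte signature that begins an OLE compound file (what file reports as a "CDF V2 Document") and then searches the raw bytes for the stream name "PowerPoint Document", which binary PowerPoint files use for their main stream. The function name sniff_embedded and its three return labels are hypothetical, and a plain byte search could in principle match unrelated data.

```python
from pathlib import Path

# Signature that begins every OLE compound file ("CDF V2 Document" in file's output).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Stream names in a compound file's directory are stored as UTF-16LE text.
PPT_STREAM_NAME = "PowerPoint Document".encode("utf-16-le")


def sniff_embedded(path):
    """Return "powerpoint", "ole-other", or "not-ole" for an extracted file."""
    data = Path(path).read_bytes()
    if not data.startswith(OLE_MAGIC):
        return "not-ole"
    # Heuristic: look for the PowerPoint stream name anywhere in the file.
    if PPT_STREAM_NAME in data:
        return "powerpoint"
    return "ole-other"
```

Files that come back as "powerpoint" are the candidates for a .ppt extension; an "ole-other" result means the file is a compound document of some other kind.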
