Zipping and unzipping Excel xlsx files

If you have a file with an .xlsx extension on the file name that was last edited by Microsoft Excel, then the file is stored in an Office Open XML (OpenXML) zipped, XML-based file format developed by Microsoft for spreadsheets, charts, presentations and word processing documents. You can change the file extension to .zip by renaming the file. You should then be able to extract the contents of the zip file as you would with any other zip file.

There are different algorithms that can be used for storing files within a zip file that determine the level of compression and speed of production of a zip file. If you are using a Linux or Apple OS X system, you can see choices that are available for compressing files and directories into a zip file from the command line, e.g. from a Terminal window on an OS X system, by looking at the man page for the zip program by issuing the command man zip. The following options are available for file compression:

Generic Category (English)120x600
       -Z cm
       --compression-method cm
              Set  the default compression method.  Currently the main methods
              supported by zip are store and deflate.  Compression method  can
              be set to:

              store  -  Setting  the compression method to store forces zip to
              store entries with no compression.   This  is  generally  faster
              than compressing entries, but results in no space savings.  This
              is the same as using -0 (compression level zero).

              deflate - This is the default method for zip.  If zip determines
              that  storing is better than deflation, the entry will be stored
              instead.

              bzip2 - If bzip2 support is compiled in, this compression method
              also  becomes available.  Only some modern unzips currently sup-
              port the bzip2 compression method, so test the unzip you will be
              using  before relying on archives using this method (compression
              method 12).

              For example, to add bar.c to archive foo  using  bzip2  compres-
              sion:

                     zip -Z bzip2 foo bar.c

              The compression method can be abbreviated:

                     zip -Zb foo bar.c

       -#
       (-0, -1, -2, -3, -4, -5, -6, -7, -8, -9)
              Regulate  the  speed of compression using the specified digit #,
              where -0 indicates no compression (store all  files),  -1  indi-
              cates  the  fastest  compression speed (less compression) and -9
              indicates the slowest compression  speed  (optimal  compression,
              ignores the suffix list). The default compression level is -6.

              Though  still  being  worked, the intention is this setting will
              control compression speed for  all  compression  methods.   Cur-
              rently only deflation is controlled.

Excel uses the "deflate" storage method as can be seen for the following Example.xlsx file produced by Microsoft Excel for Mac 2011 on a MacBook Pro laptop by using the zipinfo command, which lists detailed information about zip archives. The Example workbook has two worksheets with the default names of sheet1 and sheet2.

$ zipinfo Example.xlsx
Archive:  Example.xlsx   32642 bytes   12 files
-rw----     4.5 fat     1356 b- defS  1-Jan-80 00:00 [Content_Types].xml
-rw----     4.5 fat      733 b- defS  1-Jan-80 00:00 _rels/.rels
-rw----     4.5 fat      839 b- defS  1-Jan-80 00:00 xl/_rels/workbook.xml.rels
-rw----     4.5 fat      805 b- defS  1-Jan-80 00:00 xl/workbook.xml
-rw----     4.5 fat     1055 b- defS  1-Jan-80 00:00 xl/styles.xml
-rw----     4.5 fat     1314 b- defS  1-Jan-80 00:00 xl/worksheets/sheet2.xml
-rw----     4.5 fat     7646 b- defS  1-Jan-80 00:00 xl/theme/theme1.xml
-rw----     4.5 fat     1206 b- defS  1-Jan-80 00:00 xl/worksheets/sheet1.xml
-rw----     4.5 fat    23468 b- stor  1-Jan-80 00:00 docProps/thumbnail.jpeg
-rw----     4.5 fat      454 b- defS  1-Jan-80 00:00 xl/sharedStrings.xml
-rw----     4.5 fat      872 b- defS  1-Jan-80 00:00 docProps/app.xml
-rw----     4.5 fat      755 b- defS  1-Jan-80 00:00 docProps/core.xml
12 files, 40503 bytes uncompressed, 29422 bytes compressed:  27.4%
$

The "defS" in the sixth column indicates the compression method used. There are six methods known at present: storing (no compression), reducing, shrinking, imploding, tokenizing (never publicly released), and deflating. In addition, there are four levels of reducing (1 through 4); four types of imploding (4K or 8K sliding dictionary, and 2 or 3 Shannon-Fano trees); and four levels of deflating (superfast, fast, normal, maximum compression). zipinfo represents these methods and their sub-methods as follows: stor; re:1, re:2, etc.; shrk; i4:2, i8:3, etc.; tokn; and defS, defF, defN, and defX.

If I rename the Example.xlsx file to Example.zip and extract its contents and then rezip the contents of the directory produced when I extracted the contents of the zip file with the zip utility, I see the following:

$ zip -r Example.zip Example/*
  adding: Example/[Content_Types].xml (deflated 74%)
  adding: Example/_rels/ (stored 0%)
  adding: Example/_rels/.rels (deflated 65%)
  adding: Example/docProps/ (stored 0%)
  adding: Example/docProps/app.xml (deflated 53%)
  adding: Example/docProps/core.xml (deflated 50%)
  adding: Example/docProps/thumbnail.jpeg (deflated 23%)
  adding: Example/xl/ (stored 0%)
  adding: Example/xl/_rels/ (stored 0%)
  adding: Example/xl/_rels/workbook.xml.rels (deflated 71%)
  adding: Example/xl/sharedStrings.xml (deflated 50%)
  adding: Example/xl/styles.xml (deflated 55%)
  adding: Example/xl/theme/ (stored 0%)
  adding: Example/xl/theme/theme1.xml (deflated 80%)
  adding: Example/xl/workbook.xml (deflated 42%)
  adding: Example/xl/worksheets/ (stored 0%)
  adding: Example/xl/worksheets/sheet1.xml (deflated 53%)
  adding: Example/xl/worksheets/sheet2.xml (deflated 54%)
$

I.e., I see that, by default, the deflation method is being used for files. If I then check the .zip file produced by the zip utility with the zipinfo utility, I see "defN" listed in the sixth column for the compression method.

$ zipinfo Example.zip
Archive:  Example.zip   26878 bytes   18 files
-rwxr-xr-x  3.0 unx     1356 tx defN  1-Jan-80 00:00 Example/[Content_Types].xml
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/_rels/
-rwxr-xr-x  3.0 unx      733 tx defN  1-Jan-80 00:00 Example/_rels/.rels
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/docProps/
-rwxr-xr-x  3.0 unx      872 tx defN  1-Jan-80 00:00 Example/docProps/app.xml
-rwxr-xr-x  3.0 unx      755 tx defN  1-Jan-80 00:00 Example/docProps/core.xml
-rwxr-xr-x  3.0 unx    23468 bx defN  1-Jan-80 00:00 Example/docProps/thumbnail.jpeg
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/_rels/
-rwxr-xr-x  3.0 unx      839 tx defN  1-Jan-80 00:00 Example/xl/_rels/workbook.xml.rels
-rwxr-xr-x  3.0 unx      454 tx defN  1-Jan-80 00:00 Example/xl/sharedStrings.xml
-rwxr-xr-x  3.0 unx     1055 tx defN  1-Jan-80 00:00 Example/xl/styles.xml
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/theme/
-rwxr-xr-x  3.0 unx     7646 tx defN  1-Jan-80 00:00 Example/xl/theme/theme1.xml
-rwxr-xr-x  3.0 unx      805 tx defN  1-Jan-80 00:00 Example/xl/workbook.xml
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/worksheets/
-rwxr-xr-x  3.0 unx     1206 tx defN  1-Jan-80 00:00 Example/xl/worksheets/sheet1.xml
-rwxr-xr-x  3.0 unx     1314 tx defN  1-Jan-80 00:00 Example/xl/worksheets/sheet2.xml

Deflation is the default compression method, but using the zip utility on an OS X system I can specify the level of reducing to be 1 using -Z deflate -1. I then see the compression method listed as "defF".

$ zip -r -Z deflate -1 Example.zip Example/*
  adding: Example/[Content_Types].xml (deflated 73%)
  adding: Example/_rels/ (stored 0%)
  adding: Example/_rels/.rels (deflated 64%)
  adding: Example/docProps/ (stored 0%)
  adding: Example/docProps/app.xml (deflated 52%)
  adding: Example/docProps/core.xml (deflated 48%)
  adding: Example/docProps/thumbnail.jpeg (deflated 22%)
  adding: Example/xl/ (stored 0%)
  adding: Example/xl/_rels/ (stored 0%)
  adding: Example/xl/_rels/workbook.xml.rels (deflated 71%)
  adding: Example/xl/sharedStrings.xml (deflated 49%)
  adding: Example/xl/styles.xml (deflated 54%)
  adding: Example/xl/theme/ (stored 0%)
  adding: Example/xl/theme/theme1.xml (deflated 76%)
  adding: Example/xl/workbook.xml (deflated 41%)
  adding: Example/xl/worksheets/ (stored 0%)
  adding: Example/xl/worksheets/sheet1.xml (deflated 52%)
  adding: Example/xl/worksheets/sheet2.xml (deflated 51%)
$ zipinfo Example.zip
Archive:  Example.zip   27389 bytes   18 files
-rwxr-xr-x  3.0 unx     1356 tx defF  1-Jan-80 00:00 Example/[Content_Types].xml
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/_rels/
-rwxr-xr-x  3.0 unx      733 tx defF  1-Jan-80 00:00 Example/_rels/.rels
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/docProps/
-rwxr-xr-x  3.0 unx      872 tx defF  1-Jan-80 00:00 Example/docProps/app.xml
-rwxr-xr-x  3.0 unx      755 tx defF  1-Jan-80 00:00 Example/docProps/core.xml
-rwxr-xr-x  3.0 unx    23468 bx defF  1-Jan-80 00:00 Example/docProps/thumbnail.jpeg
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/_rels/
-rwxr-xr-x  3.0 unx      839 tx defF  1-Jan-80 00:00 Example/xl/_rels/workbook.xml.rels
-rwxr-xr-x  3.0 unx      454 tx defF  1-Jan-80 00:00 Example/xl/sharedStrings.xml
-rwxr-xr-x  3.0 unx     1055 tx defF  1-Jan-80 00:00 Example/xl/styles.xml
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/theme/
-rwxr-xr-x  3.0 unx     7646 tx defF  1-Jan-80 00:00 Example/xl/theme/theme1.xml
-rwxr-xr-x  3.0 unx      805 tx defF  1-Jan-80 00:00 Example/xl/workbook.xml
drwxr-xr-x  3.0 unx        0 bx stor 13-Nov-16 15:37 Example/xl/worksheets/
-rwxr-xr-x  3.0 unx     1206 tx defF  1-Jan-80 00:00 Example/xl/worksheets/sheet1.xml
-rwxr-xr-x  3.0 unx     1314 tx defF  1-Jan-80 00:00 Example/xl/worksheets/sheet2.xml
18 files, 40503 bytes uncompressed, 24221 bytes compressed:  40.2%
$

Using -2 for the level of delation also results in the deflation method shown by zipinfo being "defF". If -3 is used, "defN" is shown; "defN" is also shown by zipinfo if -4 is used as an argument to the zip command. If I issue the command zip -r -n Example.zip Example/*, where n is a number between 0 and 9, on an OS X system, I see the following methods listed when I check the resulting zip file with zipinfo.

NumberMethod
0stor
1defF
2defF
3defN
4defN
5defN
6defN
7defN
8defX
9defX

You might expect that you could recompress the folder where the files were extracted to produce a new zip file and then rename that to be a .xlsx file which you could then open with Excel. That won't work, however. If you right click on the directory containing the extracted files from the Finder application and choose Compress dirname where dirname is the relevant directory name, that will produce a new zip file. E.g., if the file you started from was Example.xls and you renamed it to Example.zip and then extracted the contents of the zip file to the directory Example, you would now have a Example.zip file again. You can right-click on it and rename the extension back to .xlsx, but if you try to open that file with Excel you will see the message "file format is not valid". You will also have the same problem if you use the command line zip program, e.g., if you used zip -r Example.zip Example/*. The zip file produced by those methods on an Apple OS X system does not match what Excel is expecting.

If you create a zip file with the Python zipfile module, the deflate method is used by default and you will see "defN", if you check the zip file created with that module. You can create a zip file of a directory from the command line with that module using the command python -m zipfile -c zipfilename directory where zipfilename is the name you wish to give to the zip file and directory is the directory you wish to compress into a zip file. E.g., python -m zipfile -c Example.zip Example/. You will also not be able to open the zip file produced by that method with Excel by simply renaming the file to have a .xlsx rather than a .zip extension. You will again get the "file format is not valid" message, if you try to open the file in Excel.

However, if you use the Python shutil module to create the zip file from the directory and then rename the zip file to a .xlsx file, you can open it in Excel. The zipdir.py Python script can be used to create a zip file that can be opened with Excel after changing the file extension.

I noticed that when I produce zip files from a directory on an OS X system using methods other than with the Python shutil module, that the directories are shown with the "stor" compression method. I.e., they are stored with no compression employed. When I compress the directory with zipdir.py E.g., zipinfo shows the following information for a zip file produced by python -m zipfile -c Example.zip Example/:

DJI Phantom 3 Drone
$ zipinfo Example.zip
Archive:  Example.zip   26101 bytes   19 files
drwx------  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/
-rwxr-xr-x  2.0 unx     1356 b- defN  1-Jan-80 00:00 Example/[Content_Types].xml
drwxr-xr-x  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/_rels/
-rwxr-xr-x  2.0 unx      733 b- defN  1-Jan-80 00:00 Example/_rels/.rels
drwxr-xr-x  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/docProps/
-rwxr-xr-x  2.0 unx      872 b- defN  1-Jan-80 00:00 Example/docProps/app.xml
-rwxr-xr-x  2.0 unx      755 b- defN  1-Jan-80 00:00 Example/docProps/core.xml
-rwxr-xr-x  2.0 unx    23468 b- defN  1-Jan-80 00:00 Example/docProps/thumbnail.jpeg
drwxr-xr-x  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/xl/
drwxr-xr-x  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/xl/_rels/
-rwxr-xr-x  2.0 unx      839 b- defN  1-Jan-80 00:00 Example/xl/_rels/workbook.xml.rels
-rwxr-xr-x  2.0 unx      454 b- defN  1-Jan-80 00:00 Example/xl/sharedStrings.xml
-rwxr-xr-x  2.0 unx     1055 b- defN  1-Jan-80 00:00 Example/xl/styles.xml
drwxr-xr-x  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/xl/theme/
-rwxr-xr-x  2.0 unx     7646 b- defN  1-Jan-80 00:00 Example/xl/theme/theme1.xml
-rwxr-xr-x  2.0 unx      805 b- defN  1-Jan-80 00:00 Example/xl/workbook.xml
drwxr-xr-x  2.0 unx        0 b- stor 13-Nov-16 15:37 Example/xl/worksheets/
-rwxr-xr-x  2.0 unx     1206 b- defN  1-Jan-80 00:00 Example/xl/worksheets/sheet1.xml
-rwxr-xr-x  2.0 unx     1314 b- defN  1-Jan-80 00:00 Example/xl/worksheets/sheet2.xml
19 files, 40503 bytes uncompressed, 23777 bytes compressed:  41.3%
$

But the same directory when compressed by zipdir.py with the Python shutil module into a zip file produces a zip file for which zipinfo shows the following information:

$ zipinfo example.zip
Archive:  example.zip   25165 bytes   12 files
-rwxr-xr-x  2.0 unx     1356 b- defN  1-Jan-80 00:00 [Content_Types].xml
-rwxr-xr-x  2.0 unx      733 b- defN  1-Jan-80 00:00 _rels/.rels
-rwxr-xr-x  2.0 unx      872 b- defN  1-Jan-80 00:00 docProps/app.xml
-rwxr-xr-x  2.0 unx      755 b- defN  1-Jan-80 00:00 docProps/core.xml
-rwxr-xr-x  2.0 unx    23468 b- defN  1-Jan-80 00:00 docProps/thumbnail.jpeg
-rwxr-xr-x  2.0 unx      454 b- defN  1-Jan-80 00:00 xl/sharedStrings.xml
-rwxr-xr-x  2.0 unx     1055 b- defN  1-Jan-80 00:00 xl/styles.xml
-rwxr-xr-x  2.0 unx      805 b- defN  1-Jan-80 00:00 xl/workbook.xml
-rwxr-xr-x  2.0 unx      839 b- defN  1-Jan-80 00:00 xl/_rels/workbook.xml.rels
-rwxr-xr-x  2.0 unx     7646 b- defN  1-Jan-80 00:00 xl/theme/theme1.xml
-rwxr-xr-x  2.0 unx     1206 b- defN  1-Jan-80 00:00 xl/worksheets/sheet1.xml
-rwxr-xr-x  2.0 unx     1314 b- defN  1-Jan-80 00:00 xl/worksheets/sheet2.xml
12 files, 40503 bytes uncompressed, 23777 bytes compressed:  41.3%
$

So, perhaps, the issue is caused by the directory information being "stored" rather than deflation being used.

Note: testing done on a MacBook Pro laptop running OS X Yosemite (10.10.5) using Microsoft Office for Mac 2011.

 

TechRabbit ad 300x250 newegg.com

Justdeals Daily Electronics Deals1x1 px