There are different algorithms that can be used for storing files within a
zip file that determine the level of compression and speed of production of
a zip file. If you are using a
Linux or
Apple OS X
system, you can see choices that are available for compressing files and
directories into a zip file from the
command line, e.g. from a
Terminal
window on an OS X system, by looking at the
man page
for the zip program by
issuing the command man zip
. The
following options are available for file compression:
-Z cm --compression-method cm Set the default compression method. Currently the main methods supported by zip are store and deflate. Compression method can be set to: store - Setting the compression method to store forces zip to store entries with no compression. This is generally faster than compressing entries, but results in no space savings. This is the same as using -0 (compression level zero). deflate - This is the default method for zip. If zip determines that storing is better than deflation, the entry will be stored instead. bzip2 - If bzip2 support is compiled in, this compression method also becomes available. Only some modern unzips currently sup- port the bzip2 compression method, so test the unzip you will be using before relying on archives using this method (compression method 12). For example, to add bar.c to archive foo using bzip2 compres- sion: zip -Z bzip2 foo bar.c The compression method can be abbreviated: zip -Zb foo bar.c -# (-0, -1, -2, -3, -4, -5, -6, -7, -8, -9) Regulate the speed of compression using the specified digit #, where -0 indicates no compression (store all files), -1 indi- cates the fastest compression speed (less compression) and -9 indicates the slowest compression speed (optimal compression, ignores the suffix list). The default compression level is -6. Though still being worked, the intention is this setting will control compression speed for all compression methods. Cur- rently only deflation is controlled.
Excel uses the "deflate" storage method as can be seen for the following
Example.xlsx
file produced by Microsoft Excel for Mac 2011 on a
MacBook Pro laptop by using the
zipinfo command, which
lists detailed information about zip archives. The Example workbook has two
worksheets with the default names of sheet1
and
sheet2
.
$ zipinfo Example.xlsx Archive: Example.xlsx 32642 bytes 12 files -rw---- 4.5 fat 1356 b- defS 1-Jan-80 00:00 [Content_Types].xml -rw---- 4.5 fat 733 b- defS 1-Jan-80 00:00 _rels/.rels -rw---- 4.5 fat 839 b- defS 1-Jan-80 00:00 xl/_rels/workbook.xml.rels -rw---- 4.5 fat 805 b- defS 1-Jan-80 00:00 xl/workbook.xml -rw---- 4.5 fat 1055 b- defS 1-Jan-80 00:00 xl/styles.xml -rw---- 4.5 fat 1314 b- defS 1-Jan-80 00:00 xl/worksheets/sheet2.xml -rw---- 4.5 fat 7646 b- defS 1-Jan-80 00:00 xl/theme/theme1.xml -rw---- 4.5 fat 1206 b- defS 1-Jan-80 00:00 xl/worksheets/sheet1.xml -rw---- 4.5 fat 23468 b- stor 1-Jan-80 00:00 docProps/thumbnail.jpeg -rw---- 4.5 fat 454 b- defS 1-Jan-80 00:00 xl/sharedStrings.xml -rw---- 4.5 fat 872 b- defS 1-Jan-80 00:00 docProps/app.xml -rw---- 4.5 fat 755 b- defS 1-Jan-80 00:00 docProps/core.xml 12 files, 40503 bytes uncompressed, 29422 bytes compressed: 27.4% $
The "defS" in the sixth column indicates the compression method used. There are six methods known at present: storing (no compression), reducing, shrinking, imploding, tokenizing (never publicly released), and deflating. In addition, there are four levels of reducing (1 through 4); four types of imploding (4K or 8K sliding dictionary, and 2 or 3 Shannon-Fano trees); and four levels of deflating (superfast, fast, normal, maximum compression). zipinfo represents these methods and their sub-methods as follows: stor; re:1, re:2, etc.; shrk; i4:2, i8:3, etc.; tokn; and defS, defF, defN, and defX.
If I rename the Example.xlsx
file to Example.zip
and extract its contents and then rezip the contents of the directory produced
when I extracted the contents of the zip file with the zip utility,
I see the following:
$ zip -r Example.zip Example/* adding: Example/[Content_Types].xml (deflated 74%) adding: Example/_rels/ (stored 0%) adding: Example/_rels/.rels (deflated 65%) adding: Example/docProps/ (stored 0%) adding: Example/docProps/app.xml (deflated 53%) adding: Example/docProps/core.xml (deflated 50%) adding: Example/docProps/thumbnail.jpeg (deflated 23%) adding: Example/xl/ (stored 0%) adding: Example/xl/_rels/ (stored 0%) adding: Example/xl/_rels/workbook.xml.rels (deflated 71%) adding: Example/xl/sharedStrings.xml (deflated 50%) adding: Example/xl/styles.xml (deflated 55%) adding: Example/xl/theme/ (stored 0%) adding: Example/xl/theme/theme1.xml (deflated 80%) adding: Example/xl/workbook.xml (deflated 42%) adding: Example/xl/worksheets/ (stored 0%) adding: Example/xl/worksheets/sheet1.xml (deflated 53%) adding: Example/xl/worksheets/sheet2.xml (deflated 54%) $
I.e., I see that, by default, the deflation method is being used for files. If I then check the .zip file produced by the zip utility with the zipinfo utility, I see "defN" listed in the sixth column for the compression method.
$ zipinfo Example.zip Archive: Example.zip 26878 bytes 18 files -rwxr-xr-x 3.0 unx 1356 tx defN 1-Jan-80 00:00 Example/[Content_Types].xml drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/_rels/ -rwxr-xr-x 3.0 unx 733 tx defN 1-Jan-80 00:00 Example/_rels/.rels drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/docProps/ -rwxr-xr-x 3.0 unx 872 tx defN 1-Jan-80 00:00 Example/docProps/app.xml -rwxr-xr-x 3.0 unx 755 tx defN 1-Jan-80 00:00 Example/docProps/core.xml -rwxr-xr-x 3.0 unx 23468 bx defN 1-Jan-80 00:00 Example/docProps/thumbnail.jpeg drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/ drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/_rels/ -rwxr-xr-x 3.0 unx 839 tx defN 1-Jan-80 00:00 Example/xl/_rels/workbook.xml.rels -rwxr-xr-x 3.0 unx 454 tx defN 1-Jan-80 00:00 Example/xl/sharedStrings.xml -rwxr-xr-x 3.0 unx 1055 tx defN 1-Jan-80 00:00 Example/xl/styles.xml drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/theme/ -rwxr-xr-x 3.0 unx 7646 tx defN 1-Jan-80 00:00 Example/xl/theme/theme1.xml -rwxr-xr-x 3.0 unx 805 tx defN 1-Jan-80 00:00 Example/xl/workbook.xml drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/worksheets/ -rwxr-xr-x 3.0 unx 1206 tx defN 1-Jan-80 00:00 Example/xl/worksheets/sheet1.xml -rwxr-xr-x 3.0 unx 1314 tx defN 1-Jan-80 00:00 Example/xl/worksheets/sheet2.xml
Deflation is the default compression method, but using the
zip utility on an OS X
system I can specify the level of reducing to be 1
using
-Z deflate -1
. I then see the compression method listed as
"defF".
$ zip -r -Z deflate -1 Example.zip Example/* adding: Example/[Content_Types].xml (deflated 73%) adding: Example/_rels/ (stored 0%) adding: Example/_rels/.rels (deflated 64%) adding: Example/docProps/ (stored 0%) adding: Example/docProps/app.xml (deflated 52%) adding: Example/docProps/core.xml (deflated 48%) adding: Example/docProps/thumbnail.jpeg (deflated 22%) adding: Example/xl/ (stored 0%) adding: Example/xl/_rels/ (stored 0%) adding: Example/xl/_rels/workbook.xml.rels (deflated 71%) adding: Example/xl/sharedStrings.xml (deflated 49%) adding: Example/xl/styles.xml (deflated 54%) adding: Example/xl/theme/ (stored 0%) adding: Example/xl/theme/theme1.xml (deflated 76%) adding: Example/xl/workbook.xml (deflated 41%) adding: Example/xl/worksheets/ (stored 0%) adding: Example/xl/worksheets/sheet1.xml (deflated 52%) adding: Example/xl/worksheets/sheet2.xml (deflated 51%) $ zipinfo Example.zip Archive: Example.zip 27389 bytes 18 files -rwxr-xr-x 3.0 unx 1356 tx defF 1-Jan-80 00:00 Example/[Content_Types].xml drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/_rels/ -rwxr-xr-x 3.0 unx 733 tx defF 1-Jan-80 00:00 Example/_rels/.rels drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/docProps/ -rwxr-xr-x 3.0 unx 872 tx defF 1-Jan-80 00:00 Example/docProps/app.xml -rwxr-xr-x 3.0 unx 755 tx defF 1-Jan-80 00:00 Example/docProps/core.xml -rwxr-xr-x 3.0 unx 23468 bx defF 1-Jan-80 00:00 Example/docProps/thumbnail.jpeg drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/ drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/_rels/ -rwxr-xr-x 3.0 unx 839 tx defF 1-Jan-80 00:00 Example/xl/_rels/workbook.xml.rels -rwxr-xr-x 3.0 unx 454 tx defF 1-Jan-80 00:00 Example/xl/sharedStrings.xml -rwxr-xr-x 3.0 unx 1055 tx defF 1-Jan-80 00:00 Example/xl/styles.xml drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/theme/ -rwxr-xr-x 3.0 unx 7646 tx defF 1-Jan-80 00:00 Example/xl/theme/theme1.xml -rwxr-xr-x 3.0 unx 805 tx defF 1-Jan-80 00:00 Example/xl/workbook.xml drwxr-xr-x 3.0 unx 0 bx stor 13-Nov-16 15:37 Example/xl/worksheets/ -rwxr-xr-x 3.0 unx 1206 tx defF 1-Jan-80 00:00 Example/xl/worksheets/sheet1.xml -rwxr-xr-x 3.0 unx 1314 tx defF 1-Jan-80 00:00 Example/xl/worksheets/sheet2.xml 18 files, 40503 bytes uncompressed, 24221 bytes compressed: 40.2% $
Using -2
for the level of delation also results in the
deflation method shown by zipinfo being "defF". If -3
is
used, "defN" is shown; "defN" is also shown by zipinfo if
-4
is used as an
argument to the zip command. If I issue the command
zip -r -n Example.zip Example/*
, where n is a
number between 0 and 9, on an OS X system, I see the following methods listed
when I check the resulting zip file with zipinfo.
Number | Method | |
---|---|---|
0 | stor | |
1 | defF | |
2 | defF | |
3 | defN | |
4 | defN | |
5 | defN | |
6 | defN | |
7 | defN | |
8 | defX | |
9 | defX |
You might expect that you could recompress the folder where the files
were extracted to produce a new zip file and then rename that to be a .xlsx
file which you could then open with Excel. That won't work, however. If you
right click on the directory containing the extracted files from the
Finder application and choose Compress dirname where
dirname is the relevant directory name, that will produce a new zip
file. E.g., if the file you started from was Example.xls
and you
renamed it to Example.zip
and then extracted the contents of the
zip file to the directory Example
, you would now have a
Example.zip
file again. You can right-click on it and rename the
extension back to .xlsx, but if you try to open that file with Excel you will
see the message "file format is not valid". You will also have the same problem
if you use the command line zip
program, e.g., if you used zip -r Example.zip Example/*
.
The zip file produced by those methods on an Apple OS X system does
not match what Excel is expecting.
If you create a zip file with the
Python
zipfile
module, the deflate method is used by default and you will see
"defN", if you check the zip file created with that module. You can
create a zip file of a directory from the command line with that
module using the command python -m zipfile -c zipfilename
directory
where zipfilename is the name you
wish to give to the zip file and directory is the directory
you wish to compress into a zip file. E.g., python -m zipfile
-c Example.zip Example/
. You will also not be able to open the zip file
produced by that method with Excel by simply renaming the file to have a .xlsx
rather than a .zip extension. You will again get the "file format is not
valid" message, if you try to open the file in Excel.
However, if you use the Python shutil module to create the zip file from the directory and then rename the zip file to a .xlsx file, you can open it in Excel. The zipdir.py Python script can be used to create a zip file that can be opened with Excel after changing the file extension.
I noticed that when I produce zip files from a directory on an OS X system
using methods other than with the Python shutil module, that the directories
are shown with the "stor" compression method. I.e., they are stored with no
compression employed. When I compress the directory with
zipdir.py
E.g., zipinfo shows the following information for a zip file produced
by python -m zipfile -c Example.zip Example/
:
$ zipinfo Example.zip Archive: Example.zip 26101 bytes 19 files drwx------ 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/ -rwxr-xr-x 2.0 unx 1356 b- defN 1-Jan-80 00:00 Example/[Content_Types].xml drwxr-xr-x 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/_rels/ -rwxr-xr-x 2.0 unx 733 b- defN 1-Jan-80 00:00 Example/_rels/.rels drwxr-xr-x 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/docProps/ -rwxr-xr-x 2.0 unx 872 b- defN 1-Jan-80 00:00 Example/docProps/app.xml -rwxr-xr-x 2.0 unx 755 b- defN 1-Jan-80 00:00 Example/docProps/core.xml -rwxr-xr-x 2.0 unx 23468 b- defN 1-Jan-80 00:00 Example/docProps/thumbnail.jpeg drwxr-xr-x 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/xl/ drwxr-xr-x 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/xl/_rels/ -rwxr-xr-x 2.0 unx 839 b- defN 1-Jan-80 00:00 Example/xl/_rels/workbook.xml.rels -rwxr-xr-x 2.0 unx 454 b- defN 1-Jan-80 00:00 Example/xl/sharedStrings.xml -rwxr-xr-x 2.0 unx 1055 b- defN 1-Jan-80 00:00 Example/xl/styles.xml drwxr-xr-x 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/xl/theme/ -rwxr-xr-x 2.0 unx 7646 b- defN 1-Jan-80 00:00 Example/xl/theme/theme1.xml -rwxr-xr-x 2.0 unx 805 b- defN 1-Jan-80 00:00 Example/xl/workbook.xml drwxr-xr-x 2.0 unx 0 b- stor 13-Nov-16 15:37 Example/xl/worksheets/ -rwxr-xr-x 2.0 unx 1206 b- defN 1-Jan-80 00:00 Example/xl/worksheets/sheet1.xml -rwxr-xr-x 2.0 unx 1314 b- defN 1-Jan-80 00:00 Example/xl/worksheets/sheet2.xml 19 files, 40503 bytes uncompressed, 23777 bytes compressed: 41.3% $
But the same directory when compressed by zipdir.py
with the
Python shutil module into a zip file produces a zip file for which zipinfo
shows the following information:
$ zipinfo example.zip Archive: example.zip 25165 bytes 12 files -rwxr-xr-x 2.0 unx 1356 b- defN 1-Jan-80 00:00 [Content_Types].xml -rwxr-xr-x 2.0 unx 733 b- defN 1-Jan-80 00:00 _rels/.rels -rwxr-xr-x 2.0 unx 872 b- defN 1-Jan-80 00:00 docProps/app.xml -rwxr-xr-x 2.0 unx 755 b- defN 1-Jan-80 00:00 docProps/core.xml -rwxr-xr-x 2.0 unx 23468 b- defN 1-Jan-80 00:00 docProps/thumbnail.jpeg -rwxr-xr-x 2.0 unx 454 b- defN 1-Jan-80 00:00 xl/sharedStrings.xml -rwxr-xr-x 2.0 unx 1055 b- defN 1-Jan-80 00:00 xl/styles.xml -rwxr-xr-x 2.0 unx 805 b- defN 1-Jan-80 00:00 xl/workbook.xml -rwxr-xr-x 2.0 unx 839 b- defN 1-Jan-80 00:00 xl/_rels/workbook.xml.rels -rwxr-xr-x 2.0 unx 7646 b- defN 1-Jan-80 00:00 xl/theme/theme1.xml -rwxr-xr-x 2.0 unx 1206 b- defN 1-Jan-80 00:00 xl/worksheets/sheet1.xml -rwxr-xr-x 2.0 unx 1314 b- defN 1-Jan-80 00:00 xl/worksheets/sheet2.xml 12 files, 40503 bytes uncompressed, 23777 bytes compressed: 41.3% $
So, perhaps, the issue is caused by the directory information being "stored" rather than deflation being used.
Note: testing done on a MacBook Pro laptop running OS X Yosemite (10.10.5) using Microsoft Office for Mac 2011.