Summing the file sizes in a directory

If you want to calculate the total size of all files in a directory on a Unix, Linux, or OS X system, two ways to do so are the awk utility and the Python programming language. The examples below work on any of those systems and assume that you don't need to total the directory's contents recursively.

Suppose I have the following files in the directory "example" on a system and want to add the size of each file to obtain a total size for the files in the directory:

$ ls -l example
total 1928
-rwxr-xr-x  1 jasmith1  1286109195  135319 Mar 28 16:31 file1.png
-rw-r--r--@ 1 jasmith1  1286109195  192725 Mar 28 16:31 file2.png
-rw-r--r--@ 1 jasmith1  1286109195  218882 Mar 28 16:31 file3.png
-rw-r--r--@ 1 jasmith1  1286109195   20793 Mar 28 16:31 file4.png
-rwxr-xr-x  1 jasmith1  1286109195   67526 Mar 28 16:31 file5.png
-rwxr-xr-x  1 jasmith1  1286109195  135085 Mar 28 16:31 file6.png
-rw-r--r--@ 1 jasmith1  1286109195  196391 Mar 28 16:31 file7.png
-rw-r--r--  1 jasmith1  1286109195    4065 Mar 28 16:31 index.html

One way to obtain the sum of all the file sizes is to use the awk utility. Since the file sizes are listed in column 5, I can use the following command, which shows me that the file sizes add up to a total of 970,786 bytes. The "total 1928" line at the top of the listing has no fifth column, so it adds nothing to the sum:

$ ls -l example |  awk '{sum +=$5} END {print sum}'
970786

If you wish to include the size of any hidden files in the calculation, you will need to use ls -al. For the above example, on an OS X system, there is a .DS_Store file in the directory (the period at the beginning of the file name makes it a "hidden" file, i.e., one that doesn't show up if you issue the command ls without the -a option).

$ ls -al example
total 1944
drwxr-xr-x  11 jasmith1  1286109195     374 Mar 28 18:16 .
drwxr-xr-x  72 jasmith1  1286109195    2448 Mar 28 20:48 ..
-rw-r--r--@  1 jasmith1  1286109195    6148 Mar 28 16:49 .DS_Store
-rwxr-xr-x   1 jasmith1  1286109195  135319 Mar 28 16:31 file1.png
-rw-r--r--@  1 jasmith1  1286109195  192725 Mar 28 16:31 file2.png
-rw-r--r--@  1 jasmith1  1286109195  218882 Mar 28 16:31 file3.png
-rw-r--r--@  1 jasmith1  1286109195   20793 Mar 28 16:31 file4.png
-rwxr-xr-x   1 jasmith1  1286109195   67526 Mar 28 16:31 file5.png
-rwxr-xr-x   1 jasmith1  1286109195  135085 Mar 28 16:31 file6.png
-rw-r--r--@  1 jasmith1  1286109195  196391 Mar 28 16:31 file7.png
-rw-r--r--   1 jasmith1  1286109195    4065 Mar 28 16:31 index.html
$

The ls -al command also lists the "dot" entry, i.e., ".", which refers to the listed directory itself, and the "dot dot" entry, i.e., "..", which refers to its parent directory.

So, if you use ls -al, you will get a slightly larger number than you would when using ls -l.

$ ls -al example |  awk '{sum +=$5} END {print sum}'
979756
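
The difference between the two totals is simply the combined size of the three extra entries that ls -al shows. As a quick check of the arithmetic in Python, using the sizes from the listings above (the variable names are only illustrative):

```python
# Sizes in bytes, copied from the ls -al listing above.
extra_entries = {".": 374, "..": 2448, ".DS_Store": 6148}

total_without_hidden = 970786  # result of the ls -l pipeline

# Adding the three extra entries gives the ls -al total.
total_with_all = total_without_hidden + sum(extra_entries.values())

print(total_with_all)  # 979756
```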

If you want to include the size of hidden files, but exclude the size of the "dot" and "dot dot" entries, you can pipe the output of ls through grep and use grep's -v option to exclude any line that ends with a period before piping the directory entries into awk, as shown below. In a grep pattern, an unescaped period matches any single character; the backslash before the period is an escape character that makes grep treat it as a literal ".". The dollar sign, "$", represents the end of the line for grep, so \.$ with the -v option instructs grep to exclude any line ending with a period. Note that this would also exclude any file whose name happens to end with a period.

$ ls -al example | grep -v "\.$" 
total 1944
-rw-r--r--@  1 jasmith1  1286109195    6148 Mar 28 16:49 .DS_Store
-rwxr-xr-x   1 jasmith1  1286109195  135319 Mar 28 16:31 file1.png
-rw-r--r--@  1 jasmith1  1286109195  192725 Mar 28 16:31 file2.png
-rw-r--r--@  1 jasmith1  1286109195  218882 Mar 28 16:31 file3.png
-rw-r--r--@  1 jasmith1  1286109195   20793 Mar 28 16:31 file4.png
-rwxr-xr-x   1 jasmith1  1286109195   67526 Mar 28 16:31 file5.png
-rwxr-xr-x   1 jasmith1  1286109195  135085 Mar 28 16:31 file6.png
-rw-r--r--@  1 jasmith1  1286109195  196391 Mar 28 16:31 file7.png
-rw-r--r--   1 jasmith1  1286109195    4065 Mar 28 16:31 index.html
$ ls -al example | grep -v "\.$" | awk '{sum +=$5} END {print sum}'
976934
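
For readers who find the pipeline hard to follow, here is a minimal sketch of the same logic in Python, applied to a copy of the ls -al output shown earlier. The function name sum_sizes is made up for this illustration, and the sketch assumes the size is always the fifth whitespace-separated field, as it is in the listing above:

```python
# The "ls -al example" output from earlier in this post.
listing = """\
total 1944
drwxr-xr-x  11 jasmith1  1286109195     374 Mar 28 18:16 .
drwxr-xr-x  72 jasmith1  1286109195    2448 Mar 28 20:48 ..
-rw-r--r--@  1 jasmith1  1286109195    6148 Mar 28 16:49 .DS_Store
-rwxr-xr-x   1 jasmith1  1286109195  135319 Mar 28 16:31 file1.png
-rw-r--r--@  1 jasmith1  1286109195  192725 Mar 28 16:31 file2.png
-rw-r--r--@  1 jasmith1  1286109195  218882 Mar 28 16:31 file3.png
-rw-r--r--@  1 jasmith1  1286109195   20793 Mar 28 16:31 file4.png
-rwxr-xr-x   1 jasmith1  1286109195   67526 Mar 28 16:31 file5.png
-rwxr-xr-x   1 jasmith1  1286109195  135085 Mar 28 16:31 file6.png
-rw-r--r--@  1 jasmith1  1286109195  196391 Mar 28 16:31 file7.png
-rw-r--r--   1 jasmith1  1286109195    4065 Mar 28 16:31 index.html
"""

def sum_sizes(ls_output):
    """Add up the fifth column, skipping lines that end with a period."""
    total = 0
    for line in ls_output.splitlines():
        if line.endswith("."):   # same effect as grep -v "\.$"
            continue
        fields = line.split()
        if len(fields) >= 5:     # the "total" line has no fifth field
            total += int(fields[4])
    return total

print(sum_sizes(listing))  # 976934
```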

Another way to calculate the total size of all files in a directory is to use the Python programming language. You can run Python from a command line interface (CLI), i.e., a shell prompt, and enter just two commands at the Python >>> prompt: import os and print sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f)), as shown below. The session below uses Python 2.7; under Python 3, print is a function, so the second command becomes print(sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))).

$ python
Python 2.7.10 (default, Jul 14 2015, 19:46:27) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))
976934
>>> exit()
$

The current working directory must be the directory for which you wish to perform the calculation when you issue the commands, because os.listdir('.') lists the current directory and the file names it returns are resolved relative to it. The total includes the size of the hidden .DS_Store file, but not the sizes of the . (dot) and .. (dot dot) entries.

In the commands above, os.path.getsize(path), which returns the size of path in bytes, is used to determine the size of each file in the directory.
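
To see os.path.getsize on its own, the short sketch below writes a known number of bytes to a temporary file and then asks for the file's size. The temporary file and the variable names are only for illustration:

```python
import os
import tempfile

# Write exactly 11 bytes to a temporary file; the file is closed (and
# its contents flushed to disk) when the with block ends.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    tmp_path = tmp.name

size = os.path.getsize(tmp_path)
print(size)  # 11

# Clean up the temporary file.
os.remove(tmp_path)
```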

Alternatively, the following code can be placed in a Python program that determines the total size of the files in a directory.

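import os, sys

# Provide the directory for which you wish to perform the calculation as
# an argument on the command line.

dirSize = 0
theDir = sys.argv[1]

for name in os.listdir(theDir):
    # Build the full path so the program works from any working directory.
    full_path_to_file = os.path.join(theDir, name)
    # Count regular files only, as in the interactive example; skip subdirectories.
    if os.path.isfile(full_path_to_file):
        dirSize = dirSize + os.path.getsize(full_path_to_file)

# print() with a single argument works under both Python 2 and Python 3.
print(dirSize)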

The program, if named dirsize.py, can be executed with the command python dirsize.py dirname, where dirname is the location and name of the directory for which you wish to perform the calculation. E.g.:

$ python dirsize.py example
976934
$ python dirsize.py ~/Documents/example
976934

As with the first Python example, the size includes any hidden files, but not the . (dot) and .. (dot dot) entries, which aren't files in the directory, but which you will see with ls -al.
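
If you are running Python 3.5 or later rather than the Python 2.7 used above, os.scandir offers another way to do the same thing. The sketch below is one possible approach, not a replacement for the program above; the function name total_file_size is made up for this example:

```python
import os

def total_file_size(directory):
    """Return the combined size, in bytes, of the regular files
    directly inside directory (subdirectories are not descended into)."""
    total = 0
    for entry in os.scandir(directory):
        # Like os.listdir, os.scandir never yields the "." and ".."
        # entries, but it does yield hidden files such as .DS_Store.
        if entry.is_file():
            total += entry.stat().st_size
    return total

# Example: total size of the files in the current working directory.
print(total_file_size("."))
```

Because the directory is passed as an argument, the current working directory does not need to be changed first.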

If you wish to make the dirsize.py program executable, you can change the permissions on the file with chmod 755 dirsize.py and put the following line as the first line in the file to tell the system where to find the Python interpreter:

#!/usr/bin/python

I.e., if #!/usr/bin/python has been inserted as the first line of the file:

$ chmod 755 dirsize.py
$ ./dirsize.py example
976934
$


 
