Extract images from a PDF file with Python

You can use the PyMuPDF module with Python to extract images from a PDF file. Install it with the pip package manager using the command pip install PyMuPDF. To check whether it is already installed, run pip list | grep PyMuPDF or pip freeze | grep PyMuPDF.

# pip list | grep PyMuPDF
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
PyMuPDF                          1.14.13
# pip freeze | grep PyMuPDF
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
PyMuPDF==1.14.13
#
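
The same check can be made from within Python rather than through pip. The following is a minimal sketch, assuming Python 3.8 or later, where the standard library provides importlib.metadata; it will not run on the Python 2.7 installation shown in the transcript above. The function name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether PyMuPDF is available in the current environment.
version = installed_version("PyMuPDF")
if version is None:
    print("PyMuPDF is not installed")
else:
    print("PyMuPDF " + version)
```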

The code is in extract-PDF-image.py. Provide the PDF file from which images are to be extracted on the command line, e.g., ./extract-PDF-image.py somefile.pdf. Any images found within the file will be extracted as PNG files with names of the form img0-11_150x109.png, where the last part of the name gives the dimensions of the image in pixels, e.g., 150 pixels wide x 109 pixels high. As an example of a PDF file with multiple images within it, you can use bpb13187.pdf.
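
For readers who want a sense of what such a script might look like, here is a rough sketch. It is not the contents of extract-PDF-image.py. The function names image_filename and extract_images are made up for this illustration, and the reading of the two numbers after "img" as the page index and the image's xref number is an assumption. The PyMuPDF calls use the names found in recent releases (get_images, save); older releases such as the 1.14.13 shown above used camelCase names instead, e.g., getPageImageList and writePNG.

```python
def image_filename(page_index, xref, width, height):
    """Build a name such as img0-11_150x109.png.

    Assumption: the two numbers after "img" are the page index and the
    image's xref number; the article does not say what they mean.
    """
    return "img{}-{}_{}x{}.png".format(page_index, xref, width, height)


def extract_images(pdf_path):
    """Save each image referenced by the PDF as a PNG; return the file names."""
    # PyMuPDF is imported under the name fitz. It is imported here so that
    # the helper above can be used even when PyMuPDF is not installed.
    import fitz

    saved = []
    doc = fitz.open(pdf_path)
    for page_index in range(len(doc)):
        for img in doc[page_index].get_images(full=True):
            xref = img[0]  # the first item of each entry is the image's xref
            pix = fitz.Pixmap(doc, xref)
            if pix.n - pix.alpha >= 4:
                # CMYK pixmaps cannot be written as PNG; convert to RGB first.
                pix = fitz.Pixmap(fitz.csRGB, pix)
            name = image_filename(page_index, xref, pix.width, pix.height)
            pix.save(name)
            saved.append(name)
    doc.close()
    return saved
```

Calling extract_images("somefile.pdf") would write the PNG files to the current directory and return their names. An image that appears on more than one page would be saved once per page in this sketch.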

Related articles:

  1. Installing pip to install Python packages on a CentOS system
  2. Installing new packages for WinPython
  3. Determine Python installed modules/packages