Converting PDF to HTML using pdftohtml

If you need to convert PDF documents to HTML, the free program pdftohtml can do it.

To install pdftohtml on a Linux system, download the source archive (pdftohtml-0.36.tar.gz in this example), then unpack and build it with the following commands:

gunzip pdftohtml-0.36.tar.gz
tar -xvf pdftohtml-0.36.tar
cd pdftohtml-0.36
make

Running make creates the pdftohtml program in the current directory. From that directory, you can get help on the program by typing ./pdftohtml -h, which displays the following:

pdftohtml version 0.36 http://pdftohtml.sourceforge.net/, based on Xpdf version 2.02
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2003 Glyph & Cog, LLC

Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -q                : don't print any messages or errors
  -h                : print usage information
  -help             : print usage information
  -p                : exchange .pdf links by .html
  -c                : generate complex document
  -i                : ignore images
  -noframes         : generate no frames
  -stdout           : use standard output
  -zoom <fp>        : zoom the pdf document (default 1.5)
  -xml              : output for XML post-processing
  -hidden           : output hidden text
  -nomerge          : do not merge paragraphs
  -enc <string>     : output text encoding name
  -dev <string>     : output device name for Ghostscript (png16m, jpeg etc)
  -v                : print copyright and version info
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
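
The options in the help text can be combined on one command line. As a minimal sketch, the shell function below converts only a range of pages and asks for output without frames, using the -f, -l, and -noframes options listed above. The function name convert_pages and the PDFTOHTML variable are hypothetical and introduced here only for illustration; PDFTOHTML lets you substitute another command for a dry run.

```shell
# Convert only pages FIRST..LAST of a PDF, without frames.
# PDFTOHTML is a hypothetical override for the program name (not part of
# pdftohtml itself); it defaults to the pdftohtml found in your path.
convert_pages() {
    first=$1
    last=$2
    pdf=$3
    "${PDFTOHTML:-pdftohtml}" -f "$first" -l "$last" -noframes "$pdf"
}
```

For example, convert_pages 1 2 ms-antispyware_p090505.pdf would run pdftohtml -f 1 -l 2 -noframes ms-antispyware_p090505.pdf.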

You may wish to copy the program to a directory that is in your path and accessible to other users, such as /usr/local/bin (you may need to be logged into the root account to place it in that location). You can check whether it is in your path by issuing the command which pdftohtml.

# cp ~/pdftohtml-0.36/pdftohtml /usr/local/bin/.
# which pdftohtml
/usr/local/bin/pdftohtml
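
If you want to make the same check from a script, a sketch along the following lines may help. It uses the shell's built-in command -v rather than which, since which is not available on every system; the function name find_program is hypothetical.

```shell
# Print the location of a program found in the path, or report that it
# is missing and return a non-zero status.
find_program() {
    if path=$(command -v "$1" 2>/dev/null); then
        printf '%s\n' "$path"
    else
        printf '%s is not in your path\n' "$1" >&2
        return 1
    fi
}
```

Calling find_program pdftohtml prints the program's location if the copy above succeeded, and returns a failure status otherwise.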

To convert a file from PDF to HTML, you can type pdftohtml filename.pdf. For example, to convert a three-page PDF file named ms-antispyware_p090505.pdf, I can use pdftohtml ms-antispyware_p090505.pdf. Since I haven't specified an HTML filename after the PDF filename, pdftohtml automatically creates an HTML file named ms-antispyware_p090505.html. That file contains HTML code that loads two other HTML files into frames. The generated HTML code is shown below.

<FRAMESET cols="100,*">
<FRAME name="links" src="ms-antispyware_p090505_ind.html">
<FRAME name="contents" src="ms-antispyware_p090505s.html">
</FRAMESET>

The ms-antispyware_p090505_ind.html file contains an index to the three pages of the PDF file. The ms-antispyware_p090505s.html file contains the contents of the PDF file in HTML format.
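
The naming pattern described above can be checked from a script. The sketch below assumes the three output names follow the pattern seen in this example with version 0.36 (base.html, base_ind.html, and bases.html); other versions may name their files differently. The function names expected_outputs and missing_outputs are hypothetical.

```shell
# List the files a default (framed) conversion is expected to create,
# assuming the naming pattern observed with pdftohtml 0.36.
expected_outputs() {
    base=${1%.pdf}
    printf '%s\n' "$base.html" "${base}_ind.html" "${base}s.html"
}

# Report any expected output file that does not exist.
# Returns 0 if all three files are present, 1 otherwise.
missing_outputs() {
    base=${1%.pdf}
    status=0
    for f in "$base.html" "${base}_ind.html" "${base}s.html"; do
        if [ ! -e "$f" ]; then
            printf 'missing: %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}
```

After running pdftohtml ms-antispyware_p090505.pdf, calling missing_outputs ms-antispyware_p090505.pdf should print nothing if the conversion produced all three files.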

A binary version of the program that runs on Windows systems is also available.

Developer: Authors
Developer Website: http://pdftohtml.sourceforge.net/
Requirements: Linux or Unix, or Windows for the Windows binary version
Purchase Information: Free
Recommended: Yes
Download Sites:

Site         Linux/Unix  Windows
SourceForge  Download    Download
MoonPoint    Download

Created: June 9, 2007