If you need to convert PDF documents to HTML, the free program pdftohtml provides that capability.
To install pdftohtml on a Linux system, download the pdftohtml source archive and then run the following commands:
gunzip pdftohtml-0.36.tar.gz
tar -xvf pdftohtml-0.36.tar
cd pdftohtml-0.36
make
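The two extraction commands above can usually be combined, since GNU tar and most current tar implementations can decompress gzip archives themselves through the -z option. Below is a minimal sketch of that; the helper name unpack_tgz is hypothetical and not part of pdftohtml, and it assumes the archive unpacks into a directory named after the archive file.

```shell
# Hypothetical helper: unpack a gzip-compressed tar archive in one step
# and print the name of the directory it is expected to create.
unpack_tgz() {
    archive=$1
    # -x extract, -z decompress with gzip, -f read the named archive file
    tar -xzf "$archive" || return 1
    # Assumption: the archive unpacks into a directory named after itself,
    # e.g. pdftohtml-0.36.tar.gz -> pdftohtml-0.36
    basename "$archive" .tar.gz
}
```

With that function defined, something like cd "$(unpack_tgz pdftohtml-0.36.tar.gz)" && make should be equivalent to the four commands listed above.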
When you run make, a pdftohtml program will be created in the directory you are in. If you are in that directory, you can get help on the program by typing ./pdftohtml -h. You will then see the following:
pdftohtml version 0.36 http://pdftohtml.sourceforge.net/, based on Xpdf version 2.02
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2003 Glyph & Cog, LLC
Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-q : don't print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom <fp> : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc <string> : output text encoding name
-dev <string> : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
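As an illustration of how these options combine, the sketch below wraps a conversion of a page range into a single frameless HTML file with images ignored. The wrapper name convert_pages is hypothetical; the flags themselves (-f, -l, -noframes, -i) are taken from the usage listing above.

```shell
# Hypothetical wrapper: convert only pages $2 through $3 of PDF file $1
# into HTML without frames, ignoring images.
# Every option used here appears in the pdftohtml usage listing.
convert_pages() {
    pdf=$1
    first=$2
    last=$3
    pdftohtml -f "$first" -l "$last" -noframes -i "$pdf"
}
```

For example, convert_pages report.pdf 1 2 would run pdftohtml -f 1 -l 2 -noframes -i report.pdf, where report.pdf is a placeholder file name.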
You may wish to put the program in a directory where it will be in your path and/or accessible to others, such as /usr/local/bin (you may need to be logged into the root account to place it in that location). You can see if it is in your path by issuing the command which pdftohtml.
# cp ~/pdftohtml-0.36/pdftohtml /usr/local/bin/.
# which pdftohtml
/usr/local/bin/pdftohtml
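The which command is not installed on every system, so the same check can also be done with the POSIX shell built-in command -v. The function below is only a sketch of that idea; the name in_path is an assumption made for this example.

```shell
# Report whether a program can be found through the current PATH.
# command -v is the POSIX shell built-in counterpart of which.
in_path() {
    if location=$(command -v "$1"); then
        echo "$1 found at $location"
    else
        echo "$1 is not in your PATH" >&2
        return 1
    fi
}
```

Running in_path pdftohtml after the copy step should then report the /usr/local/bin location shown above.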
To convert a file from PDF to HTML, you can type pdftohtml filename.pdf. For example, to convert a three-page PDF file named ms-antispyware_p090505.pdf, I can use pdftohtml ms-antispyware_p090505.pdf. Since I haven't specified an HTML filename after the PDF filename, pdftohtml automatically creates an HTML file named ms-antispyware_p090505.html. That file contains HTML code that loads two other HTML files into frames. The HTML code that was generated is shown below.
<FRAMESET cols="100,*">
<FRAME name="links" src="ms-antispyware_p090505_ind.html">
<FRAME name="contents" src="ms-antispyware_p090505s.html">
</FRAMESET>
The ms-antispyware_p090505_ind.html file contains an index to the 3 pages in the PDF file. The ms-antispyware_p090505s.html file contains the contents of the PDF file in HTML format.
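The naming pattern seen in this example can be sketched as a small shell function that lists the HTML files a default, framed conversion is expected to produce. The function name expected_outputs is hypothetical, and the pattern is inferred only from the example above; it prints file names, not directories, and does not apply when options such as -noframes are used.

```shell
# List the HTML file names a default (framed) conversion is expected to
# create, following the example above: NAME.html holds the frameset,
# NAME_ind.html the page index, and NAMEs.html the converted contents.
expected_outputs() {
    base=$(basename "$1" .pdf)
    printf '%s\n' "$base.html" "${base}_ind.html" "${base}s.html"
}
```

Calling expected_outputs ms-antispyware_p090505.pdf prints the three file names discussed above, one per line.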
A binary version of the program that runs on Windows systems is also available.
Developer: Authors
Developer Website: http://pdftohtml.sourceforge.net/
Requirements: Linux or Unix; Windows for the Windows version
Purchase Information: Free
Recommended: Yes
Download Sites:
Site | Linux/Unix | Windows |
SourceForge | ||
MoonPoint |
Created: June 9, 2007