MoonPoint Support Logo

 



Fri, Apr 19, 2019 10:01 pm

Extract images from a PDF file with Python

You can use the PyMuPDF module with Python to extract images from a PDF file. You can install PyMuPDF with the pip package manager using the command pip install PyMuPDF. To check whether it is already installed, use the command pip list | grep PyMuPDF or pip freeze | grep PyMuPDF.

# pip list | grep PyMuPDF
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
PyMuPDF                          1.14.13
# pip freeze | grep PyMuPDF
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
PyMuPDF==1.14.13
#

The code for the script is in extract-PDF-image.py.
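The contents of extract-PDF-image.py are not reproduced in this excerpt, so the following is only a minimal sketch of how image extraction with PyMuPDF might look, not that script. It assumes a recent PyMuPDF release, where the methods are named get_images and extract_image; older releases, such as the 1.14.13 listed above, may expose them under camelCase names instead. The function names, the output directory, and the file naming scheme are hypothetical.

```python
import os


def image_filename(page_number, xref, ext):
    """Build an output name such as 'page1-xref7.png' (hypothetical naming scheme)."""
    return "page{}-xref{}.{}".format(page_number, xref, ext)


def extract_images(pdf_path, out_dir="images"):
    """Write each embedded image in pdf_path to out_dir and return the paths written."""
    # Imported here so the helper above can be used without PyMuPDF installed.
    import fitz  # PyMuPDF's import name

    os.makedirs(out_dir, exist_ok=True)
    written = []
    seen = set()  # an image referenced on several pages is saved only once
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            for img in page.get_images(full=True):
                xref = img[0]  # first field is the image's cross-reference number
                if xref in seen:
                    continue
                seen.add(xref)
                info = doc.extract_image(xref)  # dict holding "image" bytes and "ext"
                name = image_filename(page.number + 1, xref, info["ext"])
                path = os.path.join(out_dir, name)
                with open(path, "wb") as fh:
                    fh.write(info["image"])
                written.append(path)
    finally:
        doc.close()
    return written
```

Calling extract_images("report.pdf") would, under these assumptions, leave one file per distinct embedded image in an "images" directory beneath the current working directory.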

[ More Info ]

[/languages/python] permanent link

Sat, Aug 18, 2018 10:16 pm

Determine Python installed modules/packages

If you need to determine which packages, modules, or libraries are installed for Python on a system, you can do so by starting the Python interactive interpreter and issuing the help("modules") command.

$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> help("modules")

Please wait a moment while I gather a list of all available modules...

2018-08-18 15:31:45.666 Python[74959:5231860] Cannot find executable for CFBundle 0x7fe335602be0 </System/Library/Frameworks/Message.framework> (not loaded)
AVFoundation        _TE                 dircache            profile
Accounts            _Win                dis                 pstats
AddressBook         __builtin__         distutils           pty
AppKit              __future__          dl                  pwd
AppleScriptKit      _abcoll             doctest             py2app
AppleScriptObjC     _ast                dumbdbm             py_compile
Audio_mac           _bisect             dummy_thread        pyclbr
Automator           _builtinSuites      dummy_threading     pydoc
BaseHTTPServer      _codecs             easy_install        pydoc_data
Bastion             _codecs_cn          email               pyexpat
CFNetwork           _codecs_hk          encodings           pylab
CFOpenDirectory     _codecs_iso2022     ensurepip           pyparsing
CGIHTTPServer       _codecs_jp          errno               pytz
Canvas              _codecs_kr          exceptions          quopri
Carbon              _codecs_tw          fcntl               random
Cocoa               _collections        filecmp             re
CodeWarrior         _csv                fileinput           readline
Collaboration       _ctypes             findertools         repr
ColorPicker         _ctypes_test        fnmatch             resource
ConfigParser        _curses             formatter           rexec
Cookie              _curses_panel       fpformat            rfc822
CoreData            _elementtree        fractions           rlcompleter
CoreFoundation      _functools          ftplib              robotparser
CoreGraphics        _hashlib            functools           runpy
CoreLocation        _heapq              future_builtins     sched
CoreText            _hotshot            gc                  scipy
Dialog              _io                 genericpath         select
DictionaryServices  _json               gensuitemodule      sets
DocXMLRPCServer     _locale             gestalt             setuptools
EasyDialogs         _lsprof             getopt              sgmllib
EventKit            _markerlib          getpass             sha
ExceptionHandling   _multibytecodec     gettext             shelve
Explorer            _multiprocessing    glob                shlex
FSEvents            _osx_support        grp                 shutil
FileDialog          _pyio               gzip                signal
Finder              _random             hashlib             site
FixTk               _scproxy            heapq               six
Foundation          _socket             hmac                smtpd
FrameWork           _sqlite3            hotshot             smtplib
HTMLParser          _sre                htmlentitydefs      sndhdr
IN                  _ssl                htmllib             socket
InputMethodKit      _strptime           httplib             sqlite3
InstallerPlugins    _struct             ic                  sre
InstantMessage      _symtable           icglue              sre_compile
JavaScriptCore      _sysconfigdata      icopen              sre_constants
LatentSemanticMapping _testcapi           idlelib             sre_parse
LaunchServices      _threading_local    ihooks              ssl
MacOS               _tkinter            imageop             stat
Message             _warnings           imaplib             statvfs
MimeWriter          _weakref            imghdr              string
MiniAEFrame         _weakrefset         imp                 stringold
Nav                 abc                 importlib           stringprep
Netscape            aepack              imputil             strop
OSATerminology      aetools             inspect             struct
OpenDirectory       aetypes             io                  subprocess
OpenSSL             aifc                itertools           sunau
PixMapWrapper       altgraph            json                sunaudio
PreferencePanes     antigravity         keyword             symbol
PubSub              anydbm              lib2to3             symtable
PyObjCTools         applesingle         linecache           sys
PyPDF2              appletrawmain       locale              sysconfig
QTKit               appletrunner        logging             syslog
Quartz              argparse            macerrors           tabnanny
Queue               argvemulator        macholib            tarfile
ScreenSaver         array               macostools          telnetlib
ScriptingBridge     ast                 macpath             tempfile
ScrolledText        asynchat            macresource         terminalcommand
SearchKit           asyncore            macurl2path         termios
ServiceManagement   atexit              mailbox             test
SimpleDialog        audiodev            mailcap             textwrap
SimpleHTTPServer    audioop             markupbase          this
SimpleXMLRPCServer  autoGIL             marshal             thread
Social              base64              math                threading
SocketServer        bdb                 matplotlib          time
StdSuites           bdist_mpkg          md5                 timeit
StringIO            bgenlocations       mhlib               tkColorChooser
SyncServices        binascii            mimetools           tkCommonDialog
SystemConfiguration binhex              mimetypes           tkFileDialog
SystemEvents        bisect              mimify              tkFont
Tix                 bonjour             mmap                tkMessageBox
Tkconstants         bsddb               modulefinder        tkSimpleDialog
Tkdnd               bsddb185            modulegraph         toaiff
Tkinter             buildtools          multifile           token
UserDict            bundlebuilder       multiprocessing     tokenize
UserList            bz2                 mutex               trace
UserString          cPickle             netrc               traceback
WebKit              cProfile            new                 ttk
_AE                 cStringIO           nis                 tty
_AH                 calendar            nntplib             turtle
_App                cfmfile             ntpath              types
_CF                 cgi                 nturl2path          unicodedata
_CG                 cgitb               numbers             unittest
_CarbonEvt          chunk               numpy               urllib
_Cm                 cmath               objc                urllib2
_Ctl                cmd                 olefile             urlparse
_Dlg                code                opcode              user
_Drag               codecs              operator            uu
_Evt                codeop              optparse            uuid
_File               collections         os                  videoreader
_Fm                 colorsys            os2emxpath          warnings
_Folder             commands            parser              wave
_Help               compileall          pdb                 weakref
_IBCarbon           compiler            pickle              webbrowser
_Icn                contextlib          pickletools         whichdb
_LWPCookieJar       cookielib           pimp                wsgiref
_Launch             copy                pip                 xattr
_List               copy_reg            pipes               xdrlib
_Menu               crypt               pkg_resources       xlrd
_Mlte               csv                 pkgutil             xml
_MozillaCookieJar   ctypes              platform            xmllib
_OSA                curses              plistlib            xmlrpclib
_Qd                 datetime            popen2              xxsubtype
_Qdoffs             dateutil            poplib              zipfile
_Qt                 dbhash              posix               zipimport
_Res                dbm                 posixfile           zlib
_Scrap              decimal             posixpath           zope
_Snd                difflib             pprint              

Enter any module name to get more help.  Or, type "modules spam" to search
for modules whose descriptions contain the word "spam".

>>> exit()
$
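If you want the same kind of list from within a script rather than at the interactive prompt, one possible approach is the standard library's pkgutil module. The sketch below is written for Python 3 and is not what help("modules") does internally; the function name list_modules is hypothetical.

```python
import pkgutil
import sys


def list_modules():
    """Return a sorted list of top-level module names Python can import."""
    # Modules and packages found in the directories on sys.path
    names = {module.name for module in pkgutil.iter_modules()}
    # Modules compiled into the interpreter itself, such as sys
    names.update(sys.builtin_module_names)
    return sorted(names)
```

Printing the result of list_modules() one name per line gives output you can pipe through grep to look for a particular module.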

[ More Info ]

[/languages/python] permanent link

Fri, May 18, 2018 10:56 pm

Installing new packages for WinPython

To install a new package/module under WinPython, double-click on WinPython Command Prompt in the directory where you installed WinPython to open a command prompt window.

WinPython installation directory

At the command prompt, type pip install pkgname, where pkgname is the name of the package you wish to install. If the package is already present, you will see the message "requirement already satisfied."
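If you would rather check from Python whether a module is already available before running pip, a minimal sketch using the standard library's importlib is shown below. The function name is_installed is hypothetical, and the check uses the module's import name, which can differ from the name you give to pip (PyMuPDF, for instance, is imported as fitz).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(module_name) is not None
```

For example, is_installed("xlrd") returns True only if the xlrd module can be imported by the Python interpreter running the check.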

[ More Info ]

[/languages/python] permanent link

Sun, May 13, 2018 9:55 pm

WinPython - Python for Microsoft Windows

If you wish to run Python on a Microsoft Windows system, you can use WinPython. The first window you will see when you run the downloaded installation file is one for the license agreement, which notes "WinPython components are distributed as they were received from their copyright holder, under their own copyright and/or license, and without any linking with each other." WinPython itself uses the MIT license. Once you accede to the license, you will be prompted for a destination folder. By default that will be a WinPython directory created beneath the directory from which you're running the downloaded file, but you can change the location. When the installation has been completed, a window will appear where you can click on a Finish button to exit from the installation program.

[ More Info ]

[/languages/python] permanent link

Mon, Feb 19, 2018 11:24 pm

xlrd and hidden worksheets

I use the xlrd module in Python scripts to extract data from Excel workbooks. You can use the module to list the worksheets in a workbook, and you can use the xlrd.sheet "visibility" value to determine whether a sheet is hidden and, if it is hidden, whether a user can unhide it. The value is 0, 1, or 2, with the following meanings:

Visibility of the sheet:

0 = visible
1 = hidden (can be unhidden by user -- Format -> Sheet -> Unhide)
2 = "very hidden" (can be unhidden only by VBA macro)
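The values above could be checked for every sheet in a workbook with something like the following sketch. It assumes the xlrd module is installed and that the workbook is in a format your xlrd version can read (xlrd 2.0 and later read only .xls files). The function and dictionary names are hypothetical.

```python
# Meanings of the xlrd sheet "visibility" value
VISIBILITY = {
    0: "visible",
    1: "hidden",
    2: "very hidden",
}


def describe_visibility(value):
    """Translate a visibility value into a short description."""
    return VISIBILITY.get(value, "unknown ({})".format(value))


def sheet_visibility(path):
    """Return a list of (sheet name, visibility description) pairs for a workbook."""
    # Imported here so describe_visibility can be used without xlrd installed.
    import xlrd

    book = xlrd.open_workbook(path)
    return [(sheet.name, describe_visibility(sheet.visibility))
            for sheet in book.sheets()]
```

Looping over the pairs returned by sheet_visibility and printing each one shows at a glance which sheets a user would not see when opening the workbook in Excel.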

[ More Info ]

[/languages/python/excel] permanent link

Wed, Feb 14, 2018 9:31 pm

Extracting embedded Microsoft Office files from an Excel spreadsheet

I work with Excel workbooks on my MacBook Pro laptop that have embedded PowerPoint slides on some worksheets. The workbooks, which I need to review, are created by others. When I review them, I extract information from the workbooks to an SQLite database with Python, and I have also begun extracting information embedded by Object Linking and Embedding (OLE) into files, as noted in Extracting the contents of a directory in a zipfile using Python. Some of the embedded files are PowerPoint files, but when they are extracted they have a .bin extension, and I can't open them in PowerPoint without changing the filename extension from .bin to .ppt. To automate the renaming process, I created a Python script, extract_embedded.py, that extracts the embedded information to files in an "embedded" directory beneath the current working directory and then renames any .bin files that are PowerPoint files to have a .ppt extension. The script is shown below.
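The extract_embedded.py script itself is not reproduced in this excerpt. As a rough illustration of the renaming step only, the sketch below uses a crude heuristic that is an assumption on my part and not necessarily how the script decides: it treats a .bin file as a PowerPoint file if it starts with the OLE compound file signature and contains the stream name "PowerPoint Document" encoded as UTF-16LE. A dedicated OLE parser such as olefile would be a more robust way to make that decision. The function names are hypothetical.

```python
import os

# Signature found at the start of OLE compound files
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
# Stream name as it would be stored in the OLE directory (UTF-16LE)
PPT_STREAM = "PowerPoint Document".encode("utf-16-le")


def looks_like_powerpoint(path):
    """Heuristic: an OLE file whose bytes contain the PowerPoint stream name."""
    with open(path, "rb") as fh:
        data = fh.read()
    return data.startswith(OLE_MAGIC) and PPT_STREAM in data


def rename_powerpoint_bins(directory):
    """Rename matching .bin files under directory to .ppt; return the new paths."""
    renamed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.lower().endswith(".bin"):
                continue
            old_path = os.path.join(root, name)
            if looks_like_powerpoint(old_path):
                new_path = os.path.splitext(old_path)[0] + ".ppt"
                os.rename(old_path, new_path)
                renamed.append(new_path)
    return renamed
```

Running rename_powerpoint_bins("embedded") after the extraction step would leave other .bin files, such as embedded Visio diagrams, untouched.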

[ More Info ]

[/languages/python/excel] permanent link

Sat, Feb 10, 2018 10:55 pm

Extracting the contents of a directory in a zipfile using Python

A Microsoft Excel file with an .xlsx or .xlsm filename extension is an Office Open XML (OpenXML) zipped, XML-based file. The OpenXML format was developed by Microsoft for spreadsheets, charts, presentations and word processing documents. If you change the file extension to .zip by renaming the file, you can extract the contents as you would with any other zip file - see Zipping and unzipping Excel xlsx files. Excel workbooks can contain other documents embedded within them using Object Linking and Embedding (OLE) technology - see Using olefile to obtain metadata from an OLE CDF V2 file. I often need to extract an embedded PowerPoint slide or Visio diagram from Excel .xlsm files, so I've been renaming the files to zip files and unzipping them as I would other zip files. Since I want to automate the process and extract just specific embedded files for further processing within a Python script, I created the script below to extract the embedded files, which are contained within an xl/embeddings subdirectory within the .xlsm zip files. The script uses the zipfile module to deal with the zip files. Python's os module is used to check for the existence of the destination directory and create it if it doesn't yet exist.

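#!/usr/bin/python

import os, zipfile

# Folder inside the .xlsm/.xlsx archive that holds the embedded files
dirToExtract = "xl/embeddings/"
# Directory to create beneath the current working directory
destinationDir = "embedded"
infile = raw_input("Enter zipfile: ")
archive = zipfile.ZipFile(infile)

if not os.path.exists(destinationDir):
    os.makedirs(destinationDir)

# Extract only the archive members stored under xl/embeddings/
for file in archive.namelist():
    if file.startswith(dirToExtract):
        archive.extract(file, destinationDir)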

The script prompts for the file to be unzipped and then extracts just the "xl/embeddings" folder and the files contained within it to a new directory it will create within the current working directory. The new directory will be named "embedded". After extracting the contents of the "xl/embeddings" directory to the newly created "embedded" folder, I had the files below in the case of the particular .xlsm file I used for this example.
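The script above is written for Python 2, where raw_input is available. For Python 3, the same idea might be wrapped in a reusable function, as in the sketch below. The function name extract_directory and its parameters are hypothetical; the default values mirror the ones used in the script.

```python
import os
import zipfile


def extract_directory(zip_path, prefix="xl/embeddings/", dest="embedded"):
    """Extract only the members stored under prefix; return the paths written."""
    os.makedirs(dest, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Skip entries outside the wanted folder and bare directory entries
            if member.startswith(prefix) and not member.endswith("/"):
                extracted.append(archive.extract(member, dest))
    return extracted
```

As with the original script, the folder structure is preserved, so a member named xl/embeddings/oleObject1.bin ends up at embedded/xl/embeddings/oleObject1.bin.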

[ More Info ]

[/languages/python/excel] permanent link

Fri, Feb 09, 2018 10:04 pm

Using the Python xlrd module to list the worksheets in a workbook

To view the list of sheets in an Excel workbook, I can use the xlrd module in the Python script below.

#!/usr/bin/python

import xlrd as xl

file_name = raw_input("File: ")
workbook = xl.open_workbook(file_name)
print workbook.sheet_names()

If I use the script to display the list of worksheets in a workbook named report.xlsx that has three sheets named alpha, beta, and gamma, I would see the following output:

$ ./sheetlist.py
File: report.xlsx
[u'alpha', u'beta', u'gamma']
$

[ More Info ]

[/languages/python/excel] permanent link

Sat, Feb 03, 2018 10:34 pm

Using PyInstaller to create an executable file from a Python script

If you wish to convert Python scripts to executable files that you can run on systems where Python or all of the needed dependencies for the script are not installed, one program that is available for Linux, Mac OS X, Solaris, AIX, or Microsoft Windows systems is PyInstaller. If you have the pip package manager installed, you can install PyInstaller by running the command below from the root account.

pip install pyinstaller

You can then create an executable file that will run on other systems with the same operating system by issuing the command pyinstaller yourprogram.py. For example, you can create an executable file on one Linux system that will run on another Linux system, or create an .exe file on a Microsoft Windows system that can be ported to another Windows system.

[ More Info ]

[/languages/python] permanent link

Fri, Feb 02, 2018 11:16 pm

Extracting information from a .msg file with Python

I received an email message with a .msg file attachment in Microsoft Outlook for Mac, which is part of Microsoft Office 2016, on my MacBook Pro laptop. When I double-clicked on the attachment in Outlook to view the contents of the file, I saw "There is no application specified to open the document Re_ Netbond.msg."

Msg - Open Attachment

I also saw a window giving me an option to "Search App Store" with the message "Search the App Store for an application that can open this document, or choose an existing application on your computer."

[ More Info ]

[/languages/python] permanent link
