Fri, Apr 19, 2019 10:01 pm
Extract images from a PDF file with Python
You can use the PyMuPDF module with Python to extract images from a PDF
file. You can install PyMuPDF using the pip package manager with the command
pip install PyMuPDF. You can determine if it is already installed with the
command pip list | grep PyMuPDF or pip freeze | grep PyMuPDF.
# pip list | grep PyMuPDF
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
PyMuPDF 1.14.13
# pip freeze | grep PyMuPDF
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
PyMuPDF==1.14.13
#
The code for the script is in the file extract-PDF-image.py.
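As a rough illustration of the idea, and not the contents of extract-PDF-image.py, the sketch below walks each page of a PDF and writes out every embedded image. It assumes a recent PyMuPDF release running under Python 3, where the methods are named get_images and extract_image; older releases such as the 1.14.13 shown above spell these methods differently. The image_filename helper and its naming scheme are hypothetical.

```python
import os


def image_filename(page_number, xref, ext):
    """Build an output name such as page1-xref12.png (hypothetical naming scheme)."""
    return "page{}-xref{}.{}".format(page_number, xref, ext)


def extract_images(pdf_path, out_dir="images"):
    """Save every embedded image in pdf_path to out_dir; return the paths written."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    os.makedirs(out_dir, exist_ok=True)
    written = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            # Each entry describes one image used on the page; item 0 is its xref.
            for img in page.get_images(full=True):
                xref = img[0]
                info = doc.extract_image(xref)  # dict holding raw bytes and extension
                name = image_filename(page.number + 1, xref, info["ext"])
                path = os.path.join(out_dir, name)
                with open(path, "wb") as handle:
                    handle.write(info["image"])
                written.append(path)
    finally:
        doc.close()
    return written
```

Calling extract_images("example.pdf") with a file name of your own would write the images into an "images" directory beneath the current working directory. An image that is used on several pages is written once per page in this sketch.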
[ More Info ]
[/languages/python]
permanent link
Sat, Aug 18, 2018 10:16 pm
Determine Python installed modules/packages
If you need to determine the packages/modules/libraries installed for
Python on a system, you can do so by obtaining a Python command prompt and
issuing the help("modules") command.
$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> help("modules")
Please wait a moment while I gather a list of all available modules...
2018-08-18 15:31:45.666 Python[74959:5231860] Cannot find executable for CFBundle 0x7fe335602be0 </System/Library/Frameworks/Message.framework> (not loaded)
AVFoundation _TE dircache profile
Accounts _Win dis pstats
AddressBook __builtin__ distutils pty
AppKit __future__ dl pwd
AppleScriptKit _abcoll doctest py2app
AppleScriptObjC _ast dumbdbm py_compile
Audio_mac _bisect dummy_thread pyclbr
Automator _builtinSuites dummy_threading pydoc
BaseHTTPServer _codecs easy_install pydoc_data
Bastion _codecs_cn email pyexpat
CFNetwork _codecs_hk encodings pylab
CFOpenDirectory _codecs_iso2022 ensurepip pyparsing
CGIHTTPServer _codecs_jp errno pytz
Canvas _codecs_kr exceptions quopri
Carbon _codecs_tw fcntl random
Cocoa _collections filecmp re
CodeWarrior _csv fileinput readline
Collaboration _ctypes findertools repr
ColorPicker _ctypes_test fnmatch resource
ConfigParser _curses formatter rexec
Cookie _curses_panel fpformat rfc822
CoreData _elementtree fractions rlcompleter
CoreFoundation _functools ftplib robotparser
CoreGraphics _hashlib functools runpy
CoreLocation _heapq future_builtins sched
CoreText _hotshot gc scipy
Dialog _io genericpath select
DictionaryServices _json gensuitemodule sets
DocXMLRPCServer _locale gestalt setuptools
EasyDialogs _lsprof getopt sgmllib
EventKit _markerlib getpass sha
ExceptionHandling _multibytecodec gettext shelve
Explorer _multiprocessing glob shlex
FSEvents _osx_support grp shutil
FileDialog _pyio gzip signal
Finder _random hashlib site
FixTk _scproxy heapq six
Foundation _socket hmac smtpd
FrameWork _sqlite3 hotshot smtplib
HTMLParser _sre htmlentitydefs sndhdr
IN _ssl htmllib socket
InputMethodKit _strptime httplib sqlite3
InstallerPlugins _struct ic sre
InstantMessage _symtable icglue sre_compile
JavaScriptCore _sysconfigdata icopen sre_constants
LatentSemanticMapping _testcapi idlelib sre_parse
LaunchServices _threading_local ihooks ssl
MacOS _tkinter imageop stat
Message _warnings imaplib statvfs
MimeWriter _weakref imghdr string
MiniAEFrame _weakrefset imp stringold
Nav abc importlib stringprep
Netscape aepack imputil strop
OSATerminology aetools inspect struct
OpenDirectory aetypes io subprocess
OpenSSL aifc itertools sunau
PixMapWrapper altgraph json sunaudio
PreferencePanes antigravity keyword symbol
PubSub anydbm lib2to3 symtable
PyObjCTools applesingle linecache sys
PyPDF2 appletrawmain locale sysconfig
QTKit appletrunner logging syslog
Quartz argparse macerrors tabnanny
Queue argvemulator macholib tarfile
ScreenSaver array macostools telnetlib
ScriptingBridge ast macpath tempfile
ScrolledText asynchat macresource terminalcommand
SearchKit asyncore macurl2path termios
ServiceManagement atexit mailbox test
SimpleDialog audiodev mailcap textwrap
SimpleHTTPServer audioop markupbase this
SimpleXMLRPCServer autoGIL marshal thread
Social base64 math threading
SocketServer bdb matplotlib time
StdSuites bdist_mpkg md5 timeit
StringIO bgenlocations mhlib tkColorChooser
SyncServices binascii mimetools tkCommonDialog
SystemConfiguration binhex mimetypes tkFileDialog
SystemEvents bisect mimify tkFont
Tix bonjour mmap tkMessageBox
Tkconstants bsddb modulefinder tkSimpleDialog
Tkdnd bsddb185 modulegraph toaiff
Tkinter buildtools multifile token
UserDict bundlebuilder multiprocessing tokenize
UserList bz2 mutex trace
UserString cPickle netrc traceback
WebKit cProfile new ttk
_AE cStringIO nis tty
_AH calendar nntplib turtle
_App cfmfile ntpath types
_CF cgi nturl2path unicodedata
_CG cgitb numbers unittest
_CarbonEvt chunk numpy urllib
_Cm cmath objc urllib2
_Ctl cmd olefile urlparse
_Dlg code opcode user
_Drag codecs operator uu
_Evt codeop optparse uuid
_File collections os videoreader
_Fm colorsys os2emxpath warnings
_Folder commands parser wave
_Help compileall pdb weakref
_IBCarbon compiler pickle webbrowser
_Icn contextlib pickletools whichdb
_LWPCookieJar cookielib pimp wsgiref
_Launch copy pip xattr
_List copy_reg pipes xdrlib
_Menu crypt pkg_resources xlrd
_Mlte csv pkgutil xml
_MozillaCookieJar ctypes platform xmllib
_OSA curses plistlib xmlrpclib
_Qd datetime popen2 xxsubtype
_Qdoffs dateutil poplib zipfile
_Qt dbhash posix zipimport
_Res dbm posixfile zlib
_Scrap decimal posixpath zope
_Snd difflib pprint
Enter any module name to get more help. Or, type "modules spam" to search
for modules whose descriptions contain the word "spam".
>>> exit()
$
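If you want the same kind of list inside a script rather than at the interactive prompt, one possible approach is the standard library's pkgutil module. The sketch below assumes Python 3.6 or later; the function name available_modules is my own, and the result may not match the help("modules") output exactly.

```python
import pkgutil
import sys


def available_modules():
    """Return a sorted list of top-level module names this interpreter can see."""
    names = set(sys.builtin_module_names)  # modules compiled into the interpreter
    # iter_modules() with no arguments scans every directory on sys.path
    names.update(module.name for module in pkgutil.iter_modules())
    return sorted(names)


modules = available_modules()
print("{} modules found".format(len(modules)))
print(", ".join(modules[:10]))
```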
[ More Info ]
[/languages/python]
Fri, May 18, 2018 10:56 pm
Installing new packages for WinPython
To install a new package/module under WinPython, double-click on
WinPython Command Prompt in the directory where you installed WinPython
to open a command prompt window. At the command prompt, type
pip install pkgname, where pkgname is the name of the package you wish to
install. If the package is already present, you will see the message
"requirement already satisfied."
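If you would rather check from Python whether a package is already present before running pip, a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later) follows. The function name installed_version is hypothetical, and the package names in the loop are only examples.

```python
from importlib import metadata  # standard library in Python 3.8 and later


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("pip", "xlrd"):
    version = installed_version(name)
    print(name, version if version else "not installed")
```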
[ More Info ]
[/languages/python]
Sun, May 13, 2018 9:55 pm
WinPython - Python for Microsoft Windows
If you wish to run Python on a Microsoft Windows system, you can use
WinPython.
The first window you will see when you run the downloaded installation file is
one for the license agreement, which notes
"WinPython components are distributed as they were received from their
copyright holder, under their own copyright and/or license, and without any
linking with each other." WinPython itself uses the
MIT
license. Once you accede to the license, you will be prompted for a
destination folder. By default that will be a WinPython directory created
beneath the directory where you're running the downloaded file from,
but you can change the location. When the installation has been
completed, a window will appear where you can click on a Finish
button to exit from the installation program.
[ More Info ]
[/languages/python]
Mon, Feb 19, 2018 11:24 pm
xlrd and hidden worksheets
I use the xlrd
module in Python scripts to extract data from Excel workbooks. You can
use the Python xlrd module to
list the worksheets in a workbook and you can use the
xlrd.sheet "visibility" value to determine whether a sheet is hidden and,
if it is hidden, whether a user can unhide the sheet. The value should be
either 0, 1, or 2 with the numbers having the following meaning:
Visibility of the sheet:
0 = visible
1 = hidden (can be unhidden by user -- Format -> Sheet -> Unhide)
2 = "very hidden" (can be unhidden only by VBA macro)
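The mapping above can be sketched in Python as follows. The sketch assumes only that each sheet object returned by the workbook's sheets() method has name and visibility attributes, as xlrd sheets do; the names VISIBILITY, sheet_visibility, and report are my own. Note that xlrd 2.0 and later read only .xls files, so an older xlrd release would be needed for .xlsx workbooks.

```python
VISIBILITY = {
    0: "visible",
    1: "hidden (can be unhidden by the user)",
    2: "very hidden (can be unhidden only by a VBA macro)",
}


def sheet_visibility(workbook):
    """Return (sheet name, description) pairs for every sheet in an xlrd workbook."""
    return [
        (sheet.name, VISIBILITY.get(sheet.visibility, "unknown"))
        for sheet in workbook.sheets()
    ]


def report(path):
    """Open the workbook at path with xlrd and print each sheet's visibility."""
    import xlrd  # third-party; imported here so the mapping works without it

    for name, state in sheet_visibility(xlrd.open_workbook(path)):
        print("{}: {}".format(name, state))
```

Calling report("report.xls") with a workbook of your own would print one line per sheet.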
[ More Info ]
[/languages/python/excel]
Wed, Feb 14, 2018 9:31 pm
Extracting embedded Microsoft Office files from an Excel spreadsheet
I work with Excel workbooks on my
MacBook Pro
laptop that have embedded
PowerPoint
slides on some worksheets. The workbooks, which I need to review, are
created by others. When I review them, I
extract information from
the Excel workbooks to an SQLite database with Python and also have begun
extracting information embedded by
Object Linking and Embedding (OLE) into files as
noted in Extracting the
contents of a directory in a zipfile using Python. Some of the
embedded files are PowerPoint files, but when they are extracted they
have a .bin extension, which I can't open in
PowerPoint without changing the
filename extension from .bin to .ppt. To automate the renaming process,
I created a Python script, extract_embedded.py, that extracts
the embedded information to files in an "embedded" directory beneath the
current working directory and then renames any .bin files that are PowerPoint
files to have a .ppt extension. The script is shown below.
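The renaming step might look something like the following sketch, which is not the extract_embedded.py script itself. It assumes Python 3 and the third-party olefile module, and it assumes that an embedded PowerPoint 97-2003 presentation is an OLE file containing a stream named "PowerPoint Document". The function names are hypothetical, and the check is passed in as a parameter so a different test can be substituted.

```python
import os


def is_powerpoint_ole(path):
    """Return True if path is an OLE file with a "PowerPoint Document" stream."""
    import olefile  # third-party; pip install olefile

    if not olefile.isOleFile(path):
        return False
    ole = olefile.OleFileIO(path)
    try:
        return ole.exists("PowerPoint Document")
    finally:
        ole.close()


def rename_powerpoint_bins(directory, is_powerpoint=is_powerpoint_ole):
    """Rename each .bin file in directory that passes is_powerpoint to .ppt."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        source = os.path.join(directory, name)
        # Skip subdirectories and anything that does not end in .bin
        if not (name.lower().endswith(".bin") and os.path.isfile(source)):
            continue
        if is_powerpoint(source):
            target = os.path.splitext(source)[0] + ".ppt"
            os.rename(source, target)
            renamed.append(target)
    return renamed
```

The function returns the list of new file names, so a caller can print them or pass them on for further processing.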
[ More Info ]
[/languages/python/excel]
Sat, Feb 10, 2018 10:55 pm
Extracting the contents of a directory in a zipfile using Python
A Microsoft Excel file with an .xlsx or .xlsm
filename extension is an
Office
Open XML (OpenXML) zipped, XML-based file. The OpenXML format was developed by Microsoft for
spreadsheets, charts, presentations and word processing documents. If you
change the file extension to .zip by renaming the file, you can
extract the contents of the zip file as you would with any other
zip file - see Zipping
and unzipping Excel xlsx files. Excel workbooks can contain other documents
embedded within them using
Object Linking and Embedding (OLE) technology - see
Using olefile to
obtain metadata from an OLE CDF V2 file. I often need to extract an
embedded PowerPoint slide or
Visio
diagram from Excel .xlsm files, so I've been renaming the files to
zip files and unzipping them as I would other zip files, but, since
I want to automate the process and extract just specific embedded
files for further processing within a Python script, I created the
script below to extract the embedded files, which are contained
within an xl/embeddings subdirectory within the .xlsm zip files. The script
uses the zipfile module to deal with the zip files. Python's os module is
used to check for the existence of the destination directory and create it,
if it doesn't yet exist.
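#!/usr/bin/python
import os, zipfile

# Folder inside the archive that holds the embedded files
dirToExtract = "xl/embeddings/"
# Directory, beneath the current working directory, to extract into
destinationDir = "embedded"

infile = raw_input("Enter zipfile: ")  # Python 2; use input() under Python 3
archive = zipfile.ZipFile(infile)

if not os.path.exists(destinationDir):
    os.makedirs(destinationDir)

# Extract only the archive members that are under the embeddings folder
for file in archive.namelist():
    if file.startswith(dirToExtract):
        archive.extract(file, destinationDir)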
The script prompts for the file to be unzipped and then extracts just
the "xl/embeddings" folder and the files contained within it to a new
directory it will create within the current working directory. The new
directory will be named "embedded". After extracting the contents of the
"xl/embeddings" directory to the newly created "embedded" folder, I had
the files below in the case of the particular .xlsm file I used for this
example.
[ More Info ]
[/languages/python/excel]
Fri, Feb 09, 2018 10:04 pm
Using the Python xlrd module to list the worksheets in a workbook
To view the list of sheets in an Excel workbook, I can use the xlrd module
in the Python script below.
#!/usr/bin/python
import xlrd as xl
file_name = raw_input("File: ")
workbook = xl.open_workbook(file_name)
print workbook.sheet_names()
If I use the script to display the list of worksheets in a workbook
named report.xlsx
that has three sheets named alpha,
beta, and gamma, I would see the following output:
$ ./sheetlist.py
File: report.xlsx
[u'alpha', u'beta', u'gamma']
$
[ More Info ]
[/languages/python/excel]
Sat, Feb 03, 2018 10:34 pm
Using PyInstaller to create an executable file from a Python script
If you wish to convert Python scripts to executable files that you can run on
systems where Python or all of the needed dependencies for the script are
not installed, one program that is available for
Linux, Mac OS X,
Solaris, AIX, or
Microsoft Windows systems is
PyInstaller. If you have the
pip package manager installed, you can install PyInstaller by running the
command below from the root account.
To then create an executable file that will run on other systems with that
same operating system, e.g., you can create an executable file on one Linux
system that will run on another Linux system or create an .exe file on a
Microsoft Windows system that can be ported to another Windows system, you can
issue the command pyinstaller yourprogram.py.
[ More Info ]
[/languages/python]
Fri, Feb 02, 2018 11:16 pm
Extracting information from a .msg file with Python
I received an email message with a .msg file attachment in
Microsoft Outlook for Mac, which is part of Microsoft Office 2016, on my
MacBook Pro laptop. When I double-clicked on the attachment in Outlook to view
the contents of the file, I saw "There is no application specified to open
the document Re_ Netbond.msg."
I also saw a window giving me an option to "Search App Store" with the message
"Search the App Store for an application that can open this document, or
choose an existing application on your computer."
[ More Info ]
[/languages/python]