To work with comma-separated value (CSV) files in a
Python script, include the command import csv
in the script. I also include the os.path
module to use for
testing whether a file already exists that will hold the data, the
re
module to perform
regular expression parsing of lines in the downloaded webpage,
the sys
module for checking the command line
arguments to the script, and the datetime
module for determining the current date and formatting it in YYY-MM-DD
format, where "YYYY" is the year, "MM" the month, and "DD" the day of the
month, as explained in Python -
Checking times and dates, for writing the date to the output file. Those
modules are included with the lines below:
import csv, os.path, re, sys from datetime import datetime
I modified the script I was using previously to now expect two arguments on the command line when the script is executed. The first argument is the name of the file holding the downloaded web page and the second is the CSV file to be used to store the counts.
try: sys.argv[1] except IndexError: print "Error - missing input file name! Usage ./count_queued.py infile outfile" sys.exit(1) else: requestfile = sys.argv[1] try: sys.argv[2] except IndexError: print "Error - missing output file name! Usage ./count_queued.py infile outfile" sys.exit(1) else: countfile = sys.argv[2]
I use a Python
dictionary I named sections
to hold the counts for the
various states a request can be in and count the items in the various
queues as explained at
Counting queued items with a Python script. To write the counts
to a CSV file, I use the following code:
# Check on whether file exists and is accessible if not os.path.isfile(countfile): f = open(countfile, "w") f.write("Date,Pending Approval,Modified,Pending Removal,On Hold,Clarification Required,Waiting Implementation,Waiting Removal,Total\n") f.close() with open(countfile,'a') as csvfile: countfile_writer = csv.writer(csvfile, delimiter=',') # Set the date to be the current date in YYYY-MM-DD format theDate = str(datetime.now())[0:10] countfile_writer.writerow([theDate, sections["Pending-Approval"], sections[" Modified"], sections["Pending-Removal"], sections["On-Hold"], sections["Clarific ation-Required"], sections["Waiting-Implementation"], sections["Waiting-Removal" ], total]) f.close()
If the file doesn't exist, I create a new file and write a heading row
to it and then close the file. Then, I open the CSV file in "append" mode -
see
Reading and Writing Files for an explanation of how to open a file
for reading and writing. A csv.writer
command can be used as
explained at 13.1. csv -
CSV File Reading and Writing — Python 2.7.13 documentation to
"Return a writer object responsible for converting the user’s data into
delimited strings on the given file-like object." I set the delimiter to
be the one commonly used for CSV files, the comma. I then use
countfile_writer.writerow
to write the values for each of
the dictionary entries, which are the queue names, such as "Modified", "On
Hold", etc. to a new row on the output file. All of the variables I want to
write to the output file are enclosed in
square
brackets, i.e., countfile_writer.writerow([variables])
.
The full code for the file is in count_queued_1-2.py and is also shown below:
#!/usr/bin/python # Version: 1.2 # Created: 2017-04-04 # Last modified: 2017-04-11 # Purpose: Count the number of requests in the various queues, e.g., # modified, awaiting implementation, etc. import csv, os.path, re, sys from datetime import datetime section="" sections={} sections["Pending-Approval"] = 0 sections["Modified"] = 0 sections["Pending-Removal"] = 0 sections["On-Hold"] = 0 sections["Clarification-Required"] = 0 sections["Waiting-Implementation"] = 0 sections["Waiting-Removal"] = 0 try: sys.argv[1] except IndexError: print "Error - missing input file name! Usage ./count_queued.py infile outfile" sys.exit(1) else: requestfile = sys.argv[1] try: sys.argv[2] except IndexError: print "Error - missing output file name! Usage ./count_queued.py infile outfile" sys.exit(1) else: countfile = sys.argv[2] with open(requestfile, "r") as f: # Sections are begun by a line like: # <div class="standardbold" id="Pending-Approval">Requests Pending Approval:</div> for line in f: if "standardbold" in line: section = re.search('standardbold" id="(.+)">', line).group(1) else: if section is not "" and "a href=\"/Request/" in line: sections[section]+=1 f.close() print "Request Status\n" print "Pending Approval: ", sections["Pending-Approval"] print "Modified: ", sections["Modified"] print "Pending Removal ", sections["Pending-Removal"] print "On Hold: ", sections["On-Hold"] print "Clarification Required:", sections["Clarification-Required"] print "Waiting Implementation:", sections["Waiting-Implementation"] print "Waiting Removal: ", sections["Waiting-Removal"] total = sections["Pending-Approval"] + sections["Modified"] + sections["Pending-Removal"] + sections["Clarification-Required"] print "\nTotal requiring review:", total # Check on whether file exists and is accessible if not os.path.isfile(countfile): f = open(countfile, "w") f.write("Date,Pending Approval,Modified,Pending Removal,On Hold,Clarification Required,Waiting Implementation,Waiting Removal,Total\n") f.close() with open(countfile,'a') as csvfile: countfile_writer = csv.writer(csvfile, delimiter=',') # Set the date to be the current date in YYYY-MM-DD format theDate = str(datetime.now())[0:10] countfile_writer.writerow([theDate, sections["Pending-Approval"], sections["Modified"], sections["Pending-Removal"], sections["On-Hold"], sections["Clarification-Required"], sections["Waiting-Implementation"], sections["Waiting-Removal"], total]) f.close()
Related articles: