I need to review spreadsheets related to work requests on a daily basis. The Excel workbooks have multiple worksheets within them. I have been manually copying data from specific cells in one of the worksheets and pasting it into a SQLite database. I use DB Browser for SQLite to paste the information into the database. I wanted to automate the process of extracting the data from the .xlsm or .xlsx workbook files and inserting it into the SQLite database using a Python script. So I initially created a script that I could run from a Teminal window on my MacBook Pro laptop, as noted in Extracting data from cells in an Excel workbook with Python, that would just print the information to the Terminal window's command-line interface. I had another script to import data from a text file into an SQLite database with Python, so I combined code from that script with the one to read data from the spreadsheets to add the data to a table in the database.
The SQLite database is named CRQ, since it holds Change Request (CRQ) information - the work requests are called "change requests." Within the database are several tables , one of which itself is named CRQ. The Python code to extract the data from a spreadsheet and insert it into the SQLite database is included below. There should only be one entry for each CRQ in the CRQ table, though a CRQ number can occur multiple times in a separate Tasks table in the database. I have a CRQ column in the CRQ table that holds the CRQ numbers. That CRQ field is specified as a "unique" field, which indicates the value can only occur once in the field. I use the following lines of code to first check to see if the information for the CRQ is already in the table, since I may sometimes be reviewing an updated version of the spreadsheet with the pertinent information I want in the database already present in the CRQ table in the database.
conn = sqlite3.connect('/Users/jasmith1/Documents/www/CRQ/CRQ.db') cursor = conn.cursor() cursor.execute("SELECT * FROM CRQ WHERE CRQ=?", (crqnum,)) found = cursor.fetchone() if found: print "CRQ is already in the database"
That code is in a Python function that has access to
global variables in the program that hold the values I want to insert into
a new
record in the table. In this case, the variable I'm interested in is
named crqnum
. I can use an
SQL
SELECT statement to determine if there is an existing record with that
CRQ number. To use the variable, I can insert a question mark in the
statement and then the variable name within parentheses after it. A
comma is needed after the crqnum variable name, i.e., "crqnum,
"
for tuples with one element to differentiate them from a plain expression - see
Python sqlite3 string variable in execute.
If there is no record in the database with that CRQ number, then I can use
cursor.execute('INSERT INTO CRQ
(CRQ,Project,Sites,Summary,Description,"Date of Request","Target Date",Expedite)
VALUES (?,?,?,?,?,?,?,?)',
(crqnum,project,location,summary,description,RDate,TDate,expedite))
to add a new entry into the table with the values derived from the spreadsheet.
Some of the columns in the table have a space in the name, e.g. Date of
Request
and Target Date
. For those variables, I need to
enclose the variable name within double quotes. For the VALUES that will be
inserted in the record, I can put a question mark for each one within
parentheses and then within parentheses after those include the variable names
within parentheses. I need 11 question marks since there are 11 variables.
The full code is shown below.
#!/usr/bin/python # Extract data from CRQ workbooks # Used to check if file exists import datetime, os.path, sqlite3, sys from xlrd import open_workbook,cellname,xldate # Print values for debugging def printValues(): print "CRQ Number:", crqnum print "Project:", project print "Expedite:", expedite print "Summary:", summary print "Location:", location print "Request Date:", RDate print "Target Date:", TDate return # Enter data into the SQLite database def enterSQLdata(): conn = sqlite3.connect('/Users/jasmith1/Documents/www/CRQ/CRQ.db') cursor = conn.cursor() cursor.execute("SELECT * FROM CRQ WHERE CRQ=?", (crqnum,)) found = cursor.fetchone() if found: print "CRQ is already in the database" else: cursor.execute('INSERT INTO CRQ (CRQ,Project,Sites,Summary,Description,"Date of Request","Target Date",Expedite) VALUES (?,?,?,?,?,?,?,?)', (crqnum,project,location,summary,description,RDate,TDate,expedite)) print crqnum, "added to the database" conn.commit() return # Check to see if the file name was entered on the command line. # If it wasn't prompt for the file name try: sys.argv[1] except IndexError: spreadsheet = raw_input("Enter file name: ") else: spreadsheet = sys.argv[1] if os.path.isfile(spreadsheet): workbook = open_workbook(spreadsheet, "r") worksheet = workbook.sheet_by_name('General Information') # The CRQ number is taken from the name of the current working directory dirpath = os.getcwd() # Discard all but the name of the current directory from the full # directory path crqnum = os.path.basename(dirpath) # The summary cell is D7; column and row numbers start at 0, so the # row is 4 and the column is 3 row_index = 6 col_index = 3 summary = worksheet.cell(row_index,col_index) summary = summary.value # The location for the work is in cell D13 location = worksheet.cell(12,3).value # The requesting project is in cell D23 project = worksheet.cell(22,3).value # The request description is in cell D29 description = worksheet.cell(28,3).value # The date of request cell is J3 row_index = 2 col_index = 9 requestDate = worksheet.cell(row_index,col_index) requestDate = requestDate.value RDate = xldate.xldate_as_datetime(requestDate, workbook.datemode) RDate = RDate.strftime('%Y-%m-%d') # The target date for the work to be implemented is in cell is J5 targetDate = worksheet.cell(4,9) targetDate = targetDate.value TDate = xldate.xldate_as_datetime(targetDate, workbook.datemode) TDate = TDate.strftime('%Y-%m-%d') # Cell J7 holds either "Yes" or "No", which indicates whether work on # this request needs to be expedited or can be handled in the normal # timeframe. I only want the first letter, though, i.e., "Y" or "N" expedite = worksheet.cell(6,9).value expedite = expedite[0] printValues() enterSQLdata() else: print spreadsheet, " - file not found!"
I include the URL for the documentation related to the work request, even though I can normally determine it from the CRQ number alone by appending it to a base URL that is the same for all CRQs only because there are some rare occasions where I need to specify a different URL, which I can manually insert into that field in the record.
Related articles: