Using Python to extract data from a spreadsheet and put it in a database

I need to review spreadsheets related to work requests on a daily basis. The Excel workbooks have multiple worksheets within them. I have been manually copying data from specific cells in one of the worksheets and pasting it into a SQLite database. I use DB Browser for SQLite to paste the information into the database. I wanted to automate the process of extracting the data from the .xlsm or .xlsx workbook files and inserting it into the SQLite database using a Python script. So I initially created a script that I could run from a Teminal window on my MacBook Pro laptop, as noted in Extracting data from cells in an Excel workbook with Python, that would just print the information to the Terminal window's command-line interface. I had another script to import data from a text file into an SQLite database with Python, so I combined code from that script with the one to read data from the spreadsheets to add the data to a table in the database.

The SQLite database is named CRQ, since it holds Change Request (CRQ) information - the work requests are called "change requests." Within the database are several tables , one of which itself is named CRQ. The Python code to extract the data from a spreadsheet and insert it into the SQLite database is included below. There should only be one entry for each CRQ in the CRQ table, though a CRQ number can occur multiple times in a separate Tasks table in the database. I have a CRQ column in the CRQ table that holds the CRQ numbers. That CRQ field is specified as a "unique" field, which indicates the value can only occur once in the field. I use the following lines of code to first check to see if the information for the CRQ is already in the table, since I may sometimes be reviewing an updated version of the spreadsheet with the pertinent information I want in the database already present in the CRQ table in the database.

   conn = sqlite3.connect('/Users/jasmith1/Documents/www/CRQ/CRQ.db')
   cursor = conn.cursor()
   cursor.execute("SELECT * FROM CRQ WHERE CRQ=?", (crqnum,))
   found = cursor.fetchone()
   if found:
      print "CRQ is already in the database"

That code is in a Python function that has access to global variables in the program that hold the values I want to insert into a new record in the table. In this case, the variable I'm interested in is named crqnum. I can use an SQL SELECT statement to determine if there is an existing record with that CRQ number. To use the variable, I can insert a question mark in the statement and then the variable name within parentheses after it. A comma is needed after the crqnum variable name, i.e., "crqnum," for tuples with one element to differentiate them from a plain expression - see Python sqlite3 string variable in execute.

If there is no record in the database with that CRQ number, then I can use cursor.execute('INSERT INTO CRQ (CRQ,Project,Sites,Summary,Description,"Date of Request","Target Date",Expedite) VALUES (?,?,?,?,?,?,?,?)', (crqnum,project,location,summary,description,RDate,TDate,expedite)) to add a new entry into the table with the values derived from the spreadsheet. Some of the columns in the table have a space in the name, e.g. Date of Request and Target Date. For those variables, I need to enclose the variable name within double quotes. For the VALUES that will be inserted in the record, I can put a question mark for each one within parentheses and then within parentheses after those include the variable names within parentheses. I need 11 question marks since there are 11 variables.

The full code is shown below.

#!/usr/bin/python

# Extract data from CRQ workbooks

# Used to check if file exists
import datetime, os.path, sqlite3, sys

from xlrd import open_workbook,cellname,xldate

# Print values for debugging
def printValues():

   print "CRQ Number:", crqnum
   print "Project:", project
   print "Expedite:", expedite
   print "Summary:", summary
   print "Location:", location
   print "Request Date:", RDate
   print "Target Date:", TDate
   return

# Enter data into the SQLite database
def enterSQLdata():

    conn = sqlite3.connect('/Users/jasmith1/Documents/www/CRQ/CRQ.db')
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM CRQ WHERE CRQ=?", (crqnum,))
    found = cursor.fetchone()
    if found:
       print "CRQ is already in the database"
    else:
       cursor.execute('INSERT INTO CRQ (CRQ,Project,Sites,Summary,Description,"Date of Request","Target Date",Expedite) VALUES (?,?,?,?,?,?,?,?)', (crqnum,project,location,summary,description,RDate,TDate,expedite))
       print crqnum, "added to the database" 
       conn.commit()
    return 

# Check to see if the file name was entered on the command line.
# If it wasn't prompt for the file name
try:
   sys.argv[1]
except IndexError:
   spreadsheet = raw_input("Enter file name: ")
else:
   spreadsheet = sys.argv[1]

if os.path.isfile(spreadsheet):
   workbook = open_workbook(spreadsheet, "r")
   worksheet = workbook.sheet_by_name('General Information')

   # The CRQ number is taken from the name of the current working directory
   dirpath = os.getcwd()
   # Discard all but the name of the current directory from the full
   # directory path
   crqnum = os.path.basename(dirpath)

   # The summary cell is D7; column and row numbers start at 0, so the
   # row is 4 and the column is 3
   row_index = 6
   col_index = 3
   summary = worksheet.cell(row_index,col_index)
   summary = summary.value

   # The location for the work is in cell D13
   location = worksheet.cell(12,3).value

   # The requesting project is in cell D23
   project = worksheet.cell(22,3).value

   # The request description is in cell D29
   description = worksheet.cell(28,3).value

   # The date of request cell is J3
   row_index = 2
   col_index = 9
   requestDate = worksheet.cell(row_index,col_index)
   requestDate = requestDate.value
   RDate = xldate.xldate_as_datetime(requestDate, workbook.datemode)
   RDate = RDate.strftime('%Y-%m-%d')

   # The target date for the work to be implemented is in cell is J5
   targetDate = worksheet.cell(4,9)
   targetDate = targetDate.value
   TDate = xldate.xldate_as_datetime(targetDate, workbook.datemode)
   TDate = TDate.strftime('%Y-%m-%d')

   # Cell J7 holds either "Yes" or "No", which indicates whether work on
   # this request needs to be expedited or can be handled in the normal 
   # timeframe. I only want the first letter, though, i.e., "Y" or "N"
   expedite = worksheet.cell(6,9).value
   expedite = expedite[0]

   printValues()
   enterSQLdata()

else:
   print spreadsheet, " - file not found!"

I include the URL for the documentation related to the work request, even though I can normally determine it from the CRQ number alone by appending it to a base URL that is the same for all CRQs only because there are some rare occasions where I need to specify a different URL, which I can manually insert into that field in the record.

Python by Example
Python by Example
1x1 px

Related articles:

  1. DB Browser for SQLite on OS X
  2. Extracting data from cells in an Excel workbook with Python
  3. Importing data from a text file into an SQLite database with Python
  4. Running SQL commands from a text file using sqlite3