Handling Oracle DB data using Python

Input datasets used for analysis can be large, and if all of your data is stored in an Oracle database, downloading it into Excel, moving files across servers, and reading them into Python is, let's say, not the smartest move. With every iteration, the overhead of this process wastes your precious time, and it is not a seamlessly scalable option.

Oracle provides a library that integrates seamlessly with Python, and it is very simple to use: all you need is awareness of its existence and a feel for how simple it is.

Required Library

import cx_Oracle

Create Connection to Oracle DB

Database Connection Properties

Use your database's connection details (hostname, port, and SID or service name) to establish a connection as below:

Using SID:

dsn_tns = cx_Oracle.makedsn('<hostname>', '<port>', sid='<sid>')

Using Service Name:

dsn_tns = cx_Oracle.makedsn('<hostname>', '<port>', service_name='<service name>')

Initiate Connection

connection = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)

Read data from database table

Read all data:

import pandas as pd

ip_data = pd.read_sql("SELECT * FROM <table name>", con=connection)

Read subset of data:

import pandas as pd
from datetime import datetime, timedelta

## fetch only the required rows from the db, e.g. the last 3 days
from_date = (datetime.today() - timedelta(days=3)).strftime('%Y-%m-%d %H:%M:%S')

ip_data = pd.read_sql("SELECT * FROM <table name> WHERE date_col >= to_date('" + from_date + "', 'yyyy-mm-dd HH24:MI:SS')", con=connection)

Note: The entire query needs to be a string, so if you are using dynamic assignment, ensure you construct the string using concatenation logic and pass it to read_sql.
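As an alternative to concatenating values into the query text, the same filtered read can be sketched with a bind variable, which lets the driver handle quoting and date typing. The bind name :from_dt is illustrative, and the read_sql call is left commented since it needs a live connection:

```python
from datetime import datetime, timedelta

## cutoff as a real datetime object (hypothetical 3-day window)
from_date = datetime.now() - timedelta(days=3)

## :from_dt is a named bind variable; pandas forwards `params`
## to the cx_Oracle driver, which substitutes the value safely
query = "SELECT * FROM <table name> WHERE date_col >= :from_dt"
# ip_data = pd.read_sql(query, con=connection, params={"from_dt": from_date})
```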

Write data to database table

Single Operation

This is row-by-row insertion. When writing larger data to the DB it should be avoided: the batch operation gives better performance, and I mean way better performance, and reduces the overhead of frequent round trips between the db and Python.

Cursor: for every query being executed, Oracle allocates a memory region called the context area. The cursor points to this area and is used to access it and fetch information about the operation.

## initiate connection as above
con = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)

## set the cursor for reference 
cur = con.cursor()

##tell the db which columns the data is being passed for
cols = ",".join([str(i) for i in <data_frame>.columns.tolist()])

##pass the data that needs to be inserted
##the data needs to be as list of tuple, where each tuple is a one row with column values separated by comma
row_val = [tuple(x) for x in <data_frame>.values]

## execute the sql query and writes the data to db 
for i in range(len(row_val)):
##print statement helps to track progress and where error occurred
  print(str(row_val[i][0])+'-'+str(i)) 
  cur.execute("insert into <table_name> (" + cols + ") values " + str(row_val[i]))

##safely close the connection
##without closing it, the DB session remains in wait mode,
##unable to process any other requests until it is automatically
##terminated after a few mins
con.commit()
cur.close()
con.close()
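If you want to avoid embedding str(row_val[i]) in the SQL text (which breaks on quotes, NULLs, and dates), the same single-row insert can be sketched with positional bind variables. The column names below are hypothetical, and the execute call is commented out since it needs a live connection:

```python
## hypothetical column names standing in for <data_frame>.columns
cols = ["id", "name", "score"]

## build ":1,:2,:3" — one positional bind per column
placeholders = ",".join(":%d" % (i + 1) for i in range(len(cols)))
sql = "insert into <table_name> (%s) values (%s)" % (",".join(cols), placeholders)

## each row tuple is then passed as the bind values:
# for row in row_val:
#     cur.execute(sql, row)
```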

Batch Operation:

## initiate connection as above
connection = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)

## set the cursor for reference 
cursor = connection.cursor()

##construct SQL query 
## pass the df columns that need to be written, letting the db know which columns the data is for
cols = ",".join([str(i) for i in <data_frame>.columns.tolist()])
vals = ",:".join([str(i) for i in <data_frame>.columns.tolist()])
sql = ('insert into <table_name> ('+cols+') '
           'values(:'+vals+')')

##pass the data that needs to be inserted
##the data needs to be as list of tuple, where each tuple is a one row with column values separated by comma
row_val = [tuple(x) for x in <data_frame>.values]

## execute the sql query and writes the data to db 
cursor.executemany(sql, row_val)

##safely close the connection
##without closing it, the DB session remains in wait mode,
##unable to process any other requests until it is automatically
##terminated after a few mins
connection.commit()   
cursor.close()
connection.close()

Writing bulk data can be tricky (I am referring to ~0.4 million records): at that volume, the db throws certain exceptions about datatype mismatches for data that ran perfectly using the single operation. Honestly, I haven't been able to figure out why, and if any of you have, I will appreciate your help. However, there was a workaround I benefited from: run the batch operation inside try/except, and fall back to the single operation in the except block.
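Another option worth trying for such bulk failures is cx_Oracle's batch-errors mode: executemany can be asked to continue past failing rows and report them afterwards, instead of raising on the first bad row. The rows below are made up for illustration, and the database calls are commented out since they need a live connection:

```python
## hypothetical rows — note the None (NULL) in the middle row,
## exactly the kind of value str()-concatenation would mangle
row_val = [(1, "alpha"), (2, None), (3, "gamma")]

## batcherrors=True tells cx_Oracle to keep inserting past bad
## rows; the failures are collected and inspected afterwards:
# cursor.executemany(sql, row_val, batcherrors=True)
# for err in cursor.getbatcherrors():
#     print("row", err.offset, "failed:", err.message)
```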

##try implementing batch operation 
try:
##create connection
  connection = cx_Oracle.connect(user='<username>', 
  password='<password>', dsn=dsn_tns)            
  cursor = connection.cursor()

##construct values to write to db and execute query
##cols and sql are constructed as in the batch operation above
  row_val = [tuple(x) for x in <data_frame>.values]
  cursor.executemany(sql, row_val)

##close the connection 
  connection.commit()   
  cursor.close()
  connection.close()

##if the batch operation fails, execute the write operation in single operation methodology 
except Exception as e: ##e variable can be used for error logging 
##close the connection initiated by the failed batch operation
  connection.close()

##create connection
  con = cx_Oracle.connect(user='<username>', 
  password='<password>', dsn=dsn_tns)            
  cur = con.cursor()

##construct values to write to db and execute query 
  row_val = [tuple(x) for x in <data_frame>.values]

  for i in range(len(row_val)):
##print statement helps to track progress and where error occurred
    print(str(row_val[i][0])+'-'+str(i)) 
    try:
      cur.execute("insert into <table_name> (" + cols + ") values " + str(row_val[i]))
      con.commit()
##if even the single operation fails, capture it in a log file and continue; the log file can be looked into for analysis
    except Exception as e:
      excep_log = open(r'<ip path D:\path\excep_log.txt>',"a")
      error = str(e)
      excep_log.write('Error: '+error)
      excep_log.write('\n'+str(row_val[i])+'\n')
      excep_log.close()

##close the connection only after all rows are processed
  cur.close()
  con.close()

This has certainly made my life easier. While writing to Oracle DB has made my script automation seamless, reading from it has allowed me to run multiple iterations without having to download and move files between servers.
