Input datasets used for analysis can be large, and if all of your data lives in an Oracle database, downloading it into Excel, moving it across file servers, and reading it into Python is, let's say, not the smartest move. Multiply that overhead by the number of iterations and it becomes an overkill of your precious time, and not a seamlessly scalable option either.
Oracle provides a library, cx_Oracle, that integrates seamlessly with Python, and it is just so simple to use; all you need is awareness of its existence and a feel for how simple it is.
Required Library
import cx_Oracle
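If the library is not already available, it can typically be installed with pip install cx_Oracle. (Newer releases of this driver have been renamed to python-oracledb, but the cx_Oracle API shown throughout this post works the same way.)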
Create Connection to Oracle DB

Use your connection details (hostname, port, and SID or service name) to establish the connection as below:
Using SID:
dsn_tns = cx_Oracle.makedsn('<hostname>', '<port>', sid='<sid>')
Using Service Name:
dsn_tns = cx_Oracle.makedsn('<hostname>', '<port>', service_name='<service_name>')
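Either way, makedsn simply builds the TNS descriptor string that the connect call needs; if you are curious, printing it shows something like the following (exact output depends on your details):
print(dsn_tns)
## (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<hostname>)(PORT=<port>))(CONNECT_DATA=(SERVICE_NAME=<service_name>)))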
Initiate Connection
connection = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)
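A quick sanity check that the connection actually works (the version attribute is part of the cx_Oracle connection object):
## prints the Oracle server version if the connection is alive
print(connection.version)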
Read data from database table
Read all data:
import pandas as pd
ip_data = pd.read_sql("SELECT * FROM <table name>", con=connection)
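A quick check on what was pulled never hurts:
## confirm the load: row/column counts and a peek at the data
print(ip_data.shape)
print(ip_data.head())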
Read subset of data:
import pandas as pd
from datetime import datetime, timedelta
## to fetch only the required data from the db
from_date = (datetime.now() - timedelta(days=3)).strftime('%Y-%m-%d %H:%M:%S')
ip_data = pd.read_sql("SELECT * FROM <table name> WHERE date_col >= to_date('" + from_date + "', 'YYYY-MM-DD HH24:MI:SS')", con=connection)
Note: The entire query needs to be a single string, so if you are using dynamic values ensure you construct the string using concatenation (as above) before passing it to read_sql.
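A safer alternative to string concatenation is to use bind variables, which cx_Oracle supports natively and pd.read_sql passes through via its params argument. A minimal sketch, assuming your table has a DATE column named date_col:
from datetime import datetime, timedelta
## bind variables avoid quoting/injection issues and let the driver handle the date type
from_date = datetime.now() - timedelta(days=3)
ip_data = pd.read_sql(
    "SELECT * FROM <table name> WHERE date_col >= :from_date",
    con=connection,
    params={"from_date": from_date},
)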
Write data to database table
Single Operation
This is a row-by-row insertion. When writing larger data to the DB it should be avoided, both for better performance (I mean way better performance) and to reduce the overhead of frequent round trips between the db and Python.
Cursor: for each query being executed, Oracle allocates a memory region called the context area, and the cursor points to it. The cursor is used to access the context area and the information it holds about the operation.
## initiate connection as above
con = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)
## set the cursor for reference
cur = con.cursor()
## letting the db know which columns the data is being passed for
cols = ",".join([str(i) for i in <data_frame>.columns.tolist()])
## pass the data that needs to be inserted
## the data needs to be a list of tuples, where each tuple is one row with column values separated by commas
row_val = [tuple(x) for x in <data_frame>.values]
## execute the sql query and write the data to the db
for i in range(len(row_val)):
    ## print statement helps to track progress and where an error occurred
    print(str(row_val[i][0]) + '-' + str(i))
    cur.execute("insert into <table_name> (" + cols + ") values " + str(row_val[i]))
## safely close the connection
## without closing it, the DB session will stay in wait mode,
## unable to process other requests until it is automatically
## terminated after a few minutes
con.commit()
cur.close()
con.close()
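A slightly safer variant of the same loop uses cx_Oracle bind variables (the positional :1, :2, ... form) instead of str(tuple), which breaks on values containing quotes. A sketch, assuming the same cols and row_val as above:
## positional bind placeholders, one per column: ":1,:2,:3,..."
placeholders = ",".join(":" + str(n + 1) for n in range(len(<data_frame>.columns)))
for i in range(len(row_val)):
    cur.execute("insert into <table_name> (" + cols + ") values (" + placeholders + ")", row_val[i])
con.commit()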
Batch Operation:
## initiate connection as above
connection = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)
## set the cursor for reference
cursor = connection.cursor()
## construct the SQL query
## requires passing the columns in the df that need to be written, letting the db know which columns the data is for
cols = ",".join([str(i) for i in <data_frame>.columns.tolist()])
vals = ",:".join([str(i) for i in <data_frame>.columns.tolist()])
## e.g. for columns a, b, c this builds: insert into <table_name> (a,b,c) values(:a,:b,:c)
sql = ('insert into <table_name> (' + cols + ') '
       'values(:' + vals + ')')
## pass the data that needs to be inserted
## the data needs to be a list of tuples, where each tuple is one row with column values separated by commas
row_val = [tuple(x) for x in <data_frame>.values]
## execute the sql query and write the data to the db
cursor.executemany(sql, row_val)
## safely close the connection
## without closing it, the DB session will stay in wait mode,
## unable to process other requests until it is automatically
## terminated after a few minutes
connection.commit()
cursor.close()
connection.close()
Writing bulk data can be tricky (I am referring to ~0.4 million records): at that volume the db throws certain exceptions about datatype mismatches, for data that ran perfectly using the single operation. Honestly, I haven't been able to figure out why, and if any of you have, I will appreciate your help. (One plausible explanation, though I haven't confirmed it, is that executemany infers each column's bind type from the early rows, so a column with mixed types, say NaN or None among strings, can fail in batch mode even though row-by-row execute, which re-infers the types on every call, goes through.) However, there was a workaround I benefited from: run the batch operation inside a try/except, and fall back to the single operation in the except block.
## try implementing the batch operation
try:
    ## create connection
    connection = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)
    cursor = connection.cursor()
    ## construct values to write to db and execute query
    row_val = [tuple(x) for x in <data_frame>.values]
    cursor.executemany(sql, row_val)
    ## close the connection
    connection.commit()
    cursor.close()
    connection.close()
## if the batch operation fails, fall back to the single operation methodology
except Exception as e:  ## the e variable can be used for error logging
    ## close the connection opened by the failed batch operation
    connection.close()
    ## create connection
    con = cx_Oracle.connect(user='<username>', password='<password>', dsn=dsn_tns)
    cur = con.cursor()
    ## construct values to write to db and execute query
    row_val = [tuple(x) for x in <data_frame>.values]
    for i in range(len(row_val)):
        ## print statement helps to track progress and where an error occurred
        print(str(row_val[i][0]) + '-' + str(i))
        try:
            cur.execute("insert into <table_name> (" + cols + ") values " + str(row_val[i]))
            con.commit()
        ## if even the single operation fails, capture it in a log file and continue; the log can be looked into later for analysis
        except Exception as e:
            excep_log = open(r'<ip path D:\path\excep_log.txt>', "a")
            error = str(e)
            excep_log.write('Error: ' + error)
            excep_log.write('\n' + str(row_val[i]) + '\n')
            excep_log.close()
    cur.close()
    con.close()
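As an aside, recent cx_Oracle versions have a gentler way to handle this situation: executemany accepts a batcherrors flag that makes the driver skip the offending rows and report them, instead of aborting the whole batch. A sketch of the idea:
## let the driver continue past bad rows and collect the errors
cursor.executemany(sql, row_val, batcherrors=True)
for error in cursor.getbatcherrors():
    ## error.offset is the index of the failing row in row_val
    print("Row", error.offset, "failed:", error.message)
connection.commit()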
This has certainly made my life easier. While the write statement to the Oracle DB has made my script automation seamless, the read statement has allowed me to run multiple iterations without having to download and move files between servers.