top of page

Load from csv file into MySQL table

Client’s requirements:

The R&D team decided to delete an old non-relevant data of customers in order to decrease database size. The marketing department send this data in CSV format only therefore we need to load these accounts into MySQL temporary table for the future deletes based on this table.

Solution:

Python scripts that loads an original csv file, remove duplicated accounts and non-relevant columns, loads a new data into another csv file in order to save the previous original file, update accounts IDs, creates primary key and log file at the end of the process.

 

Introduction

One of the most important tools that allows to DBA to perform tasks (that cannot be performed using MySQL) is Python programming language.

In this case, we would have to take the data in one of the columns in a CSV file, remove duplicates of client names, and load the records into a table on the server to use it in cursor that will delete records based on the account name from a column in CSV file.

Python programming language allows us using a number of libraries to load a file into a variable, manipulate file and etc.

Of course, the MySQL server itself does not allow such flexibility therefore one of the best options is to use Python script since each distro of Linux is distributed with Python version 2 installed. Using Python version 3 will require additional installation.

 

Python 2.7 script

Create a file in the relevant folder:

# touch /path_to_folder/script_name.py

Insert the code into the file:

# vim /path_to_folder/script_name.py

Update the file as executable:

# chmod 777 /full_path_to_the_script/script_name.py

Run the script in the Linux shell:

# python ./script_name.py (if you current folder is the script’s folder)

or

# python /full_path_to_the_script/script_name.py

The code:

##########################################################################################

# This script load companies name from csv file into MySQL table prod.db_cleanup_company #

##########################################################################################

# Import important libraries for the script #

import pandas

import mysql.connector

import csv

import datetime

print ""

print "Step #1"

print "Load original file, remove non relevant columns and duplicated account names."

# Insert all CSV data into variable, insert columns names into variable and remove from the list tha column's name that we woun't delete #

original_file = pandas.read_csv("/root/db_cleanup_scripts/For_Karmi.csv", header = 1, skip_blank_lines = 1, skipfooter = 0, engine='python')

columns = list(original_file.columns.values)

columns.remove('Landscape: Landscape Name')

# Delete unneccesary columns from the data variable #

for column in columns:

column_name = str(column)

original_file = original_file.drop(column_name,1)

# remove duplicates in the column #

original_file.drop_duplicates(subset = ['Landscape: Landscape Name'], inplace = True)

print "Load non duplicated account names into a new csv file."

print "The step #1 is done."

# Insert data into a new CSV file in order to prevent original file changing #

original_file.to_csv("/root/db_cleanup_scripts/Data_for_insert.csv",index = False)

# Assign values to variables for log file #

curr_datetime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S").replace("-","").replace(" ","_").replace(":","")

start_datetime = datetime.datetime.now()

print ""

print "Step #2"

print "Create a temporary table for companies names."

# Load column from csv file #

with open("/root/db_cleanup_scripts/Data_for_insert.csv", 'r') as data_file:

reader = csv.reader(data_file, delimiter='\t')

data_list = list(reader)

#print data_list

# Connect to the local instance of MySQL #

cnx = mysql.connector.connect(user='naya', passwd='loveofmylife',database='prod')

cursor = cnx.cursor()

# Create a table with a list of companies to delete #

drop_query = ("DROP TABLE IF EXISTS prod.db_cleanup_company;")

create_query = ("CREATE TABLE prod.db_cleanup_company ("

"`id` INT(11),"

"`company_name` VARCHAR(200) DEFAULT NULL,"

"`account_name` VARCHAR(200) DEFAULT NULL,"

"`start_delete_date` DATETIME DEFAULT NULL,"

"`end_delete_date` DATETIME DEFAULT NULL,"

"`is_deleted` BOOL DEFAULT 0)"

"ENGINE = INNODB DEFAULT CHARSET = utf8;")

cursor.execute(drop_query)

cnx.commit()

cursor.execute(create_query)

cnx.commit()

print "The temporary table was created."

print "The step #2 is done."

print ""

print "Step #3"

print "Populate a temporary table from a *.csv file located in the folder /root/db_cleanup_scripts."

# Insert rows from file into a table #

for item in data_list:

cnx = mysql.connector.connect(user='naya', passwd='loveofmylife',database='prod')

cursor = cnx.cursor()

query = ("INSERT INTO prod.db_cleanup_company VALUES (%s,%s,%s,%s,%s,%s)")

company_id = None

company_name = None

start_delete_date = None

end_delete_date = None

is_deleted = 0

cursor.execute(query, (company_id,company_name,str(item).replace("['","").replace("']",""),start_delete_date,end_delete_date,is_deleted))

cnx.commit()

print "Update company's key and company's name in the `id` column related to the companies' entity."

# Update company id from companies table #

update_query = ("UPDATE prod.db_cleanup_company AS dcc "

"INNER JOIN prod.as_account AS acc "

"ON TRIM(acc.name) = TRIM(dcc.account_name) "

"INNER JOIN prod.as_company AS comp "

"ON comp.id = acc.company_id "

"SET dcc.id = comp.ID,dcc.company_name = comp.name;")

cursor.execute(update_query)

cnx.commit()

print "Deleting non relevating rows from the table."

# Delete non relevant rows #

delete_query = ("DELETE FROM prod.db_cleanup_company WHERE `id` IS NULL OR (TRIM(account_name) = '' OR account_name IS NULL);")

cursor.execute(delete_query)

cnx.commit()

# Disconnect from the MySQL instance #

cursor.close()

cnx.close()

print "The step #3 is done."

print ""

end_datetime = datetime.datetime.now()

time_delta = end_datetime - start_datetime

print "Step #4"

print "Create log file at the end of process."

with open('/root/db_cleanup_scripts/cleanup_logs/log_' + curr_datetime + '.txt', 'w') as log_file:

log_file.write("\n")

log_file.write("The script started at " + str(start_datetime) + ".\n")

log_file.write("The script ended at " + str(end_datetime) + ".\n")

log_file.write("The running time was " + str(time_delta.seconds) + " seconds.\n")

log_file.write("\n")

log_file.close()

print "The step #4 is done."

print ""

Featured Posts
Check back soon
Once posts are published, you’ll see them here.
Recent Posts
Archive
Search By Tags
No tags yet.
Follow Us
  • Facebook Basic Square
  • Twitter Basic Square
  • Google+ Basic Square
bottom of page