Tuesday, October 2, 2012

Spitting out CSV data in Python

Something I do frequently in python is to spit out data into a CSV file. I do it so often, I created a simple function to copy/paste in my scripts.

The data must be in specific format, though it is a simple and fairly logical one that happens to match what the CSV functions built into Python are designed around. Each row of data should be in a Python dictionary, and the rows are held in a list:

row1 = {'FIRST':'alpha', 'SECOND':'beta', 'THIRD':'gamma'}
row2 = {'FIRST':'aleph', 'SECOND':'bet', 'THIRD':'gimel'}
row3 = {'FIRST':'contains', 'SECOND':'all', 'THIRD':'of our fields'}
row4 = {'FIRST':'missing', 'THIRD':'fields'}
row5 = {'FIRST':'has', 'SECOND':'extra', 'THIRD':'field', 'FOURTH':'included'}
rows = [row1,row2,row3,row4,row5]
headers = ['FIRST','SECOND','THIRD']


Our data is essentially a list of dictionaries. Note that row4 has a missing field and row5 has a field that is not included in all the others, and which we will want to ignore in our output. We have a list called headers which is the list of headers we want to see in the final output, in the order we want to see them.

This function will spit out the CSV file with the rows in the array order and with the desired headers:

from csv import DictWriter

def spit_out_csv(columns,list_of_data_rows,filename):
    with open(filename,'w') as outfile:
    outcsv = DictWriter(outfile,columns)
    outcsv.writeheader()
    for row in list_of_data_rows:
        truncated_row = {}
        for column in columns:
        truncated_row[column] = row.get(column,'').strip()
        outcsv.writerow(truncated_row)

To use it with the above data and spit out a file called output.csv, call it as follows:

spit_out_csv(headers,rows,'output.csv')

This will create the desired file, which will look like this if you crack it open:

FIRST,SECOND,THIRD
alpha,beta,gamma
aleph,bet,gimel
contains,all,"of our fields"
missing,,fields
has,extra,field

No comments:

Post a Comment