Excel Pandas LATEX PDF

2014-10-28

###CSV data into pandas and LATEX then into PDF and then print!###

To import csv file with date into pandas data frame. Pandas can parse the date column, but it won’t be correct. The date column formate has top be mentioned and parsed for this, like below based on [this], then this can be imported into pandas data frame indicating the date parser.

dateparse = lambda x: pd.datetime.strptime(x, ‘%d/%m/%y’) pro=pd.readcsv(‘PROJECTEXPNDTR1.csv’,parsedates=[1],date_parser=dateparse)

dateparse = lambda x: pd.datetime.strptime(x, ‘%m/%d/%y’) pro=pd.readcsv(‘PROJECTEXPNDTR1.csv’,parsedates=[1],date_parser=dateparse)

To concat two pandas data frame as row wise

pro0=pandas.concat([pro,pro1],axis=0)

To fill empty cells in pandas data frame with zero values.

pro2=pro0.fillna(0)

To drop unwanted columns

pro3=pro2.drop([‘Total’,‘Unnamed: 10’],axis=1)

To create new column with values summed over different columns

pro3[‘Total’]=pro3[‘Permanent Equipment’]+pro3[‘Consumables’]+pro3[‘Travel ‘]+pro3[‘Remote Deployment’]+pro3[‘Site Preparation’]+pro3[‘Other Project Works’]

To sort the dataframe with date wise

pro4=pro3.sort([‘Date’])

To set the date column as index and remove unwanted seconds values in date column

pro5=pro4.set_index([‘Date’])

To give s.no column to data frames first has to create a numpy array with the length of data frames as

import numpy count = numpy.arange(1,149,1)

then attached this array into dataframe as s.no

pro6[’S.no’] = count

To reorder the dataframe and kept s.no column as the first column based on this

cols = list(pro6.columns.values)

pro6=pro6[[’S.no’,‘Particulars’, ‘Permanent Equipment’, ‘Consumables’, ‘Travel ‘, ‘Remote Deployment’, ‘Site Preparation’, ‘Other Project Works’, ‘Total’]]

To get a total row for values, there is a way with specifying margin in the pandas pivot table, but it is giving the error. Based on this done that sum all the row in data frame by

sum_row = {col: pro6[col].sum() for col in pro6}

convert the resultant dict into data frame

sumdf = pd.DataFrame(sumrow, index=[“Total”])

drop unwanted columns from the data frame

sumrow1=sumdf.drop([‘Particulars’,’S.no’],axis=1)

Then append the data frame with old data frame

pro7=pro6.append(sum_row1)

reorder the datafarme

pro8=pro7[[’S.no’,‘Date’,‘Particulars’, ‘Permanent Equipment’, ‘Consumables’, ‘Travel ‘, ‘Remote Deployment’, ‘Site Preparation’, ‘Other Project Works’, ‘Total’]]

To convert the pandas dataframe into latex table formate based on earlier not on this topic

with open(“pro_expnd.tex”, “w”) as f: f.write(”\begin{tabular}{” + “ | “.join([“l”] * len(pro8.columns)) + “}\n”) columnLabels = [”\textbf{%s}” % label for label in pro8.columns] f.write (“%s\\\hline\n” % “ & “.join(columnLabels)) for i, row in pro8.iterrows(): f.write(” & “.join([str(x) for x in row.values]) + “ \\\n”) f.write(”\end{tabular}“) f.close()

To convert this table into pdf as latex documnet,

\documentclass{article}
\usepackage[a4paper,margin=1in,landscape]{geometry}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{blindtext}
\begin{document}
%% for adjust columns size\begin{longtable}{l | c | p{5cm} | p{2cm}| p{2cm} |p{2cm}| p{2cm}| p{2cm} |p{2cm} | p{2cm} | p{2cm}}%%
%%%python gernated tex file content column heading%%
\endhead
%%%python gernated tex file content rows%%
\end{longtable}
\end{document}

The final table doesn’t contain Date column, it is due to it is an index in pandas data frame