Python for data frame editing using pandas
2013-10-16
Pandas is a Python library for data frame manipulation and analysis; it gives Python R-like functionality. Here it is used to convert a CSV file into a form that can be imported into istSOS as tutorial sensor observations. The steps are as follows:
- remove the unwanted column in csv file
- filter each station’s readings
- remove duplicates
- convert the date and time into ISO standard format for the demo data
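The steps above can be sketched end to end as follows. This is only a minimal illustration: the sample rows and the column names (station, Time, value, unused) are hypothetical, since the layout of the real file is not shown here, and the input date format is assumed to be day-month-year.

```python
import io
import pandas as pa

# Hypothetical sample data standing in for the real CSV file.
raw = io.StringIO(
    "station,Time,value,unused\n"
    "A,28-08-2013 09:00,1.5,x\n"
    "A,28-08-2013 09:00,1.5,x\n"
    "B,28-08-2013 09:00,2.0,y\n"
    "A,28-08-2013 10:00,1.7,z\n"
)
df = pa.read_csv(raw)

# 1. Remove the unwanted column by keeping only the needed ones.
df = df[["station", "Time", "value"]]

# 2. Filter the readings of one station.
station_a = df[df["station"] == "A"]

# 3. Remove duplicate readings for the same time.
station_a = station_a.drop_duplicates(subset="Time").copy()

# 4. Convert the date and time strings to the ISO-style format.
parsed = pa.to_datetime(station_a["Time"], format="%d-%m-%Y %H:%M")
station_a["Time"] = parsed.dt.strftime("%Y-%m-%dT%H:%M:%S.000000+0530")

print(station_a.to_csv(index=False))
```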
To read the CSV file:
import pandas as pa
df = pa.read_csv('the path/CBETNAU.csv')
To remove duplicates:
STPT_ND = df.drop_duplicates(subset='column heading')
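When two rows share the same value in the chosen column, drop_duplicates keeps the first one by default. The small sketch below, with made-up column names and values, shows how the keep argument changes which row survives.

```python
import pandas as pa

# Hypothetical readings; "Time" and "value" are placeholder column names.
df = pa.DataFrame({
    "Time": ["09:00", "09:00", "10:00"],
    "value": [1.5, 1.6, 1.7],
})

# Keep the first row for each repeated "Time" (the default behaviour).
first = df.drop_duplicates(subset="Time")

# Keep the last row instead, for example when later rows are corrections.
last = df.drop_duplicates(subset="Time", keep="last")

print(first)
print(last)
```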
To remove specific columns, first view the available columns in the data frame, then select only the needed columns.
print(df.columns)  # view the available columns
df = pa.DataFrame(df, columns=['needed column 1', 'needed column 2'])  # list every column to keep
To save the data frame as CSV:
df.to_csv('path/file.csv')
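By default to_csv also writes the row index as an extra first column, which may not be wanted in a file meant for import. A small sketch, using an in-memory buffer and made-up data in place of a real path:

```python
import io
import pandas as pa

# Hypothetical data; the column names are placeholders.
df = pa.DataFrame({"Time": ["09:00", "10:00"], "value": [1.5, 1.7]})

# An in-memory buffer stands in for 'path/file.csv' here.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row-number column
csv_text = buffer.getvalue()
print(csv_text)
```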
To select rows with a specific entry:
df1 = df[df.columnname == "row entry"]
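To filter every station in one pass rather than one entry at a time, groupby may be an option. The sketch below assumes a hypothetical column named station; it also shows the bracket form of column access, which works for column names that contain spaces.

```python
import pandas as pa

# Hypothetical readings from two stations.
df = pa.DataFrame({
    "station": ["A", "B", "A"],
    "value": [1.5, 2.0, 1.7],
})

# Bracket notation also works when the column name contains spaces.
station_a = df[df["station"] == "A"]

# groupby yields one (name, sub-frame) pair per distinct station.
per_station = {name: group for name, group in df.groupby("station")}

for name, group in per_station.items():
    print(name, len(group))
```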
To produce the date and time format used by istSOS, based on the answer:
from datetime import datetime
dateS = "28-08-2013 09:00"
my_date = datetime.strptime(dateS, '%d-%m-%Y %H:%M')
my_new_string = my_date.strftime('%Y-%m-%dT%H:%M:%S.000000+0530')
>>> '2013-08-28T09:00:00.000000+0530'
Another way to handle date and time, this time with a time zone:
from pytz import timezone
from datetime import datetime
India = timezone('Asia/Kolkata')
in_time = datetime.now(India)
print(in_time.strftime('%Y-%m-%d_%H-%M-%S'))
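The two snippets above can be combined so that the +0530 offset comes from the time zone instead of being typed into the format string. This sketch swaps pytz for the standard library's fixed-offset timezone class (Python 3 only); the function name to_iso is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Indian Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))

def to_iso(date_string):
    """Convert 'DD-MM-YYYY HH:MM' to an ISO-style string with offset."""
    naive = datetime.strptime(date_string, "%d-%m-%Y %H:%M")
    aware = naive.replace(tzinfo=IST)
    # %f writes the microseconds and %z writes the offset as +0530.
    return aware.strftime("%Y-%m-%dT%H:%M:%S.%f%z")

print(to_iso("28-08-2013 09:00"))
```

A fixed offset is enough here because Indian Standard Time has no daylight saving; for zones that do, a named zone such as the pytz one above is the safer choice.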
Date and time editing using pandas. To check the version of a Python package, for example pandas:
pa.__version__
Use to_datetime to convert the date and time of a whole column, based on
http://stackoverflow.com/questions/17134716/convert-dataframe-column-type-from-string-to-datetime and for strftime
http://stackoverflow.com/questions/13999850/how-to-specify-date-format-when-using-pandas-to-csv
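# parse the whole column; the format must match the strings stored in 'Time'
Dt = pa.to_datetime(df['Time'], format='%Y-%m-%dT%H:%M:%S.000000+0530')
# format the parsed datetimes (plain strings have no strftime method)
Dt2 = Dt.apply(lambda x: x.strftime('%Y-%m-%dT%H:%M:%S.000000+0530'))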
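Real sensor files often contain a few malformed time stamps, and to_datetime raises an error on the first one it meets. One possible way to handle this, sketched here with made-up values and an assumed day-month-year input format, is errors='coerce', which turns unparseable entries into NaT so they can be inspected or dropped.

```python
import pandas as pa

# Hypothetical column with one malformed entry.
df = pa.DataFrame(
    {"Time": ["28-08-2013 09:00", "not a date", "28-08-2013 10:00"]}
)

# Unparseable strings become NaT instead of raising an error.
parsed = pa.to_datetime(df["Time"], format="%d-%m-%Y %H:%M", errors="coerce")

# Rows whose time stamp could not be parsed.
bad_rows = df[parsed.isna()]

# Format only the valid time stamps.
good = parsed.dropna().dt.strftime("%Y-%m-%dT%H:%M:%S.000000+0530")

print(bad_rows)
print(good)
```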
To set the date and time column as the index when reading the file:
df = pa.read_csv('csv file', index_col='Time')
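If the index should hold real datetimes rather than strings, read_csv can parse it while loading. The sketch below uses made-up in-memory data; parse_dates=True asks pandas to parse the index column, after which rows can be selected by day.

```python
import io
import pandas as pa

# Hypothetical data standing in for the real CSV file.
raw = io.StringIO(
    "Time,value\n"
    "2013-08-28 09:00:00,1.5\n"
    "2013-08-28 10:00:00,1.7\n"
    "2013-08-29 09:00:00,2.0\n"
)

# index_col picks the column; parse_dates=True converts it to datetimes.
df = pa.read_csv(raw, index_col="Time", parse_dates=True)

# With a datetime index, a date string selects all rows of that day.
one_day = df.loc["2013-08-28"]
print(one_day)
```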