Nishadh KA

Python for data frame editing using pandas

2013-10-16


Pandas is a library for data frame manipulation and analysis; it gives Python R-like functionality. Here it is used to convert a CSV file into a form that can be imported into istSOS as tutorial sensor observations. The steps are as follows:

  1. remove the unwanted columns from the CSV file
  2. filter each station's readings
  3. remove duplicates
  4. convert the date and time into the ISO format used by the istSOS demo data
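The four steps above can be sketched end to end on a small in-memory sample. This is only an illustration under assumptions: the column names (`Station`, `Time`, `Temp`, `Remark`), the station codes and the readings are hypothetical stand-ins for the real file, and the `+0530` offset is written as literal text, as in the rest of these notes.

```python
import io
import pandas as pa

# Hypothetical sample standing in for the real CSV file.
raw = io.StringIO(
    "Station,Time,Temp,Remark\n"
    "CBE,28-08-2013 09:00,27.5,ok\n"
    "CBE,28-08-2013 09:00,27.5,ok\n"
    "TNAU,28-08-2013 09:00,26.1,ok\n"
    "CBE,28-08-2013 10:00,28.0,ok\n"
)
df = pa.read_csv(raw)

# 1. keep only the needed columns
df = df[["Station", "Time", "Temp"]]
# 2. filter one station's readings
cbe = df[df["Station"] == "CBE"]
# 3. remove duplicate readings (same timestamp)
cbe = cbe.drop_duplicates(subset="Time")
# 4. parse the timestamps, then write them back as ISO-style strings
parsed = pa.to_datetime(cbe["Time"], format="%d-%m-%Y %H:%M")
cbe = cbe.assign(Time=parsed.dt.strftime("%Y-%m-%dT%H:%M:%S.000000+0530"))

# CSV text ready to be written to a file
csv_text = cbe.to_csv(index=False)
```

Each step is detailed separately in the numbered notes that follow.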

  5. To read the CSV

    import pandas as pa
    df = pa.read_csv('the path/CBETNAU.csv')

  6. To remove duplicates

    STPT_ND = df.drop_duplicates('column heading')
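As a small illustration of how the column argument decides what counts as a duplicate, the sketch below uses a made-up frame (the `Time` and `Temp` columns and their values are assumptions, not the real data). Rows are compared on `Time` only, and `keep` chooses which of the matching rows survives.

```python
import pandas as pa

# Hypothetical readings: the first two rows share a timestamp.
df = pa.DataFrame({
    "Time": ["09:00", "09:00", "10:00"],
    "Temp": [27.5, 27.9, 28.0],
})

# Rows are duplicates when their "Time" values match;
# keep="first" (the default) retains the earliest occurrence.
first = df.drop_duplicates(subset="Time")
# keep="last" retains the latest occurrence instead.
last = df.drop_duplicates(subset="Time", keep="last")
```

The original frame is left unchanged, which is why the result is assigned to a new name.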
    
  7. To remove specific columns, first view the available columns in the data frame, then select only the needed columns.

    df.columns  # view the available column names
    # list every column to keep
    df = pa.DataFrame(df, columns=['needed column 1', 'needed column 2'])
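Two other common ways to get the same result are selecting with a list of names and dropping the unwanted names. The sketch below assumes a hypothetical frame with `Station`, `Time`, `Temp` and `Remark` columns.

```python
import pandas as pa

# Hypothetical frame with one unwanted column ("Remark").
df = pa.DataFrame({
    "Station": ["CBE"],
    "Time": ["28-08-2013 09:00"],
    "Temp": [27.5],
    "Remark": ["ok"],
})

# View the available columns first.
available = list(df.columns)

# Keep only the needed columns ...
needed = df[["Time", "Temp"]]
# ... or name the columns to drop instead.
dropped = df.drop(columns=["Remark"])
```

Selecting is usually clearer when only a few columns are wanted; dropping is shorter when only a few are unwanted.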
    
  8. To save the data frame as CSV

    df.to_csv('path/file.csv')
    
  9. To select rows with a specific entry

    df1 = df[df.columnname == 'row entry']
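Attribute access such as `df.columnname` only works when the column name is a valid Python identifier. A hedged sketch with a hypothetical `Station name` column (note the space) shows the bracket form, plus one way to split the frame into one piece per station with `groupby`.

```python
import pandas as pa

# Hypothetical readings from two stations.
df = pa.DataFrame({
    "Station name": ["CBE", "TNAU", "CBE"],
    "Temp": [27.5, 26.1, 28.0],
})

# Bracket access works even when the column name contains a space.
cbe = df[df["Station name"] == "CBE"]

# One data frame per station, keyed by the station's name.
per_station = {name: group for name, group in df.groupby("Station name")}
```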
    
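  10. To work with the date and time format of istSOS, based on the Stack Overflow answers listed below the code:

    from datetime import datetime
    dateS = "28-08-2013 09:00"
    # parse the day-first input string
    my_date = datetime.strptime(dateS, '%d-%m-%Y %H:%M')
    # write it back out in the ISO-style format, with the offset as literal text
    my_new_string = my_date.strftime('%Y-%m-%dT%H:%M:%S.000000+0530')
    # my_new_string is '2013-08-28T09:00:00.000000+0530'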
    
    1. http://stackoverflow.com/questions/9632336/change-string-containing-datetime-to-another-format-of-datetime
    2. http://stackoverflow.com/questions/1398674/python-display-the-time-in-a-different-time-zone
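The format string above writes `.000000+0530` as literal text. As an alternative sketch, assuming Python 3 (where `datetime.timezone` is available), the offset can be attached to the datetime itself so that `%f` and `%z` produce the same characters. The name `IST` is introduced here only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+05:30 offset (India Standard Time).
IST = timezone(timedelta(hours=5, minutes=30))

# Parse the day-first string, then mark it as being in IST.
naive = datetime.strptime("28-08-2013 09:00", "%d-%m-%Y %H:%M")
aware = naive.replace(tzinfo=IST)

# %f writes the microseconds, %z writes the UTC offset.
stamp = aware.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
```

With this input, `stamp` matches the string produced by the hard-coded format.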
  11. Another way to produce a timestamp, using pytz for the time zone:

    from pytz import timezone
    from datetime import datetime
    India = timezone('Asia/Kolkata')
    # current time in the India time zone
    in_time = datetime.now(India)
    print(in_time.strftime('%Y-%m-%d_%H-%M-%S'))
    
  12. Date and time editing using pandas. To check the version of a Python package, read its __version__ attribute, for example pandas.__version__

  13. Use to_datetime to convert a whole date and time column, based on http://stackoverflow.com/questions/17134716/convert-dataframe-column-type-from-string-to-datetime

  14. For formatting the values with strftime, see http://stackoverflow.com/questions/13999850/how-to-specify-date-format-when-using-pandas-to-csv

  15. Applied to the Time column:

    # parse the strings in the Time column into datetime values
    Dt = pa.to_datetime(df['Time'], format='%Y-%m-%dT%H:%M:%S.000000+0530')
    # format each parsed value back into a string
    Dt2 = Dt.apply(lambda x: x.strftime('%Y-%m-%dT%H:%M:%S.000000+0530'))
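When a column contains entries that do not match the expected format, `to_datetime` raises an error by default. A minimal sketch, assuming a hypothetical `Time` column in the day-first input format used earlier, shows how `errors="coerce"` marks such entries as missing so they can be inspected before formatting.

```python
import pandas as pa

# Hypothetical column with one malformed entry.
df = pa.DataFrame({"Time": ["28-08-2013 09:00", "not a date", "28-08-2013 10:30"]})

# Unparseable entries become NaT instead of raising an error.
parsed = pa.to_datetime(df["Time"], format="%d-%m-%Y %H:%M", errors="coerce")

# Rows that failed to parse, for inspection.
bad_rows = df[parsed.isna()]

# Format only the values that parsed successfully.
iso = parsed.dropna().dt.strftime("%Y-%m-%dT%H:%M:%S.000000+0530")
```

The `.dt.strftime` accessor formats the whole column in one call, without a lambda.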
    
  16. To set the date and time column as the index when reading the CSV

    df = pa.read_csv('csv file', index_col='Time')
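With `index_col` alone the index holds plain strings. A hedged sketch, using made-up in-memory data in place of the CSV file, adds `parse_dates=True` so that the index is parsed into datetimes, which allows selecting rows by time range.

```python
import io
import pandas as pa

# Hypothetical CSV content standing in for the real file.
raw = io.StringIO(
    "Time,Temp\n"
    "2013-08-28T09:00:00,27.5\n"
    "2013-08-28T10:00:00,28.0\n"
)

# parse_dates=True asks pandas to parse the index column as dates.
df = pa.read_csv(raw, index_col="Time", parse_dates=True)

# A datetime index can be sliced with time strings.
morning = df.loc["2013-08-28 09:00":"2013-08-28 09:30"]
```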