Nishadh KA

IStSOS Data formating using Python

2014-05-10


Use Python with SQLite, istsos data formate and timeseries subsampling (downscaling).

  1. With sqlite to convert a list, for example cur.fetch from database like sqlite,

    ['2014-04-30T10:25,2797,147', '2014-04-30T10:27,2639,174', '2014-04-30T10:29,2645,158', '2014-04-30T10:31,2676,149']
    
  2. Use print “\n”.join(b) based on this gives

    "2014-04-30T10:25,2797,147
    2014-04-30T10:27,2639,174
    2014-04-30T10:29,2645,158
    2014-04-30T10:31,2676,149"
    
  3. To remove double quotes from above to write into a.DAT, tried almost two hours then find out that the used method will not do this. The full code is as follows with uncommented lines showing failed attempts. Double # are comments

    import csv
    import sqlite3 as lite
    #import string
    #identity = string.maketrans()
    #import simplejson
    #import os
    outfile_path='GVR.DAT'
    #text_file = open("GVR.DAT", "w")
    #writer = csv.writer(open(outfile_path, 'w'), quotechar='"',quoting=csv.QUOTE_NONE)
    ##it gives extra hand for defining the csv output, sticked much in this but not given the results
    #writer = csv.writer(open(outfile_path, 'w'),dialect='excel')
    #lineterminator='\n')
    #headers=['urn:ogc:def:parameter:x-istsos:1.0:time:iso8601','urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np05mg','urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np25mg']
    #text_file.write('urn:ogc:def:parameter:x-istsos:1.0:time:iso8601,urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np05mg,urn:ogc:def:parameter:x#-istsos:1.0:cbe:dylos:np25mg')
    #writer.writerow(headers)
    #text_file.close
    ##follwoing comments are for accessing sqlite
    con = lite.connect('GVR.db')
    cur = con.cursor()
    cur.execute("SELECT * FROM data")
    a= [str(i[1]) for i in cur.fetchall()]
    ##without maping it produce file with alternative line kept empty
    b = map(str.strip,a)
    #c= str(b).strip("[']") 
    #e= [str(i) for i in b]
    #d="\n".join(b)
    #print d
    #print d
    #e = map(str.strip,b)
    #e=d.translate(None, '"')
    #e = d.translate(identity, '",')
    #e = d[1:-1]
    #print "[%s]" % (','.join(e))
    #f = map(lambda x: x.strip('"'), d)
    #writer.writerow([d])
    #for item in e:
    #     writer.writerow([item,].replace'"')
    #csvwriter.writerow(JD.split())
    #e='map'
    #text_file = open("GVR.DAT", "w")
    #text_file.write(e)
    #text_file.close
    #writer = csv.writer(open("GVR.DAT", "wb"), quoting=csv.QUOTE_NONE)
    #reader = csv.reader(open("GVR.DAT", "rb"), skipinitialspace=True)
    #writer.writerows(reader)
    #for val in e:
    #        writer.writerow([val])    
    #writer.writerow(d)
    f = open('output.txt', 'w')
    f.write('urn:ogc:def:parameter:x-istsos:1.0:time:iso8601,urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np05mg,urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np25mg'+"\n")
    ##this simple comments alveate two double quote removal for two hours
    f.write("\n".join(a))
    #simplejson.dump(d, f)
    f.close()
    
  4. Write CSV method is not at all working, tried to write using file method and worked fine.

Istsos date formate

  1. The Istsos requires to date and time in this ( 2014-04-30T11:25:00.000000+0530) format; the data from Dylos is cut shorten to reduce size. So to edit the column it has to convert into a data frame in python.
  2. Pandas data frame conversion based on this

    col_names=["date","np05","np25"]
    data = pa.read_csv("GVR.DAT", header=None,skiprows=1, parse_dates="date",names=col_names)
    
    Read the DAT file created in the earlier step. 
    
    data["date"]=pa.to_datetime(data["date"])
    data["date"] = data['date'].apply(lambda x: x.strftime("%Y-%m-%dT%H:%M:%S.000000+0530"))
    
    calling specific date column, then converting into desired date formate.
    

Time scaling

  1. Based on this and this has to reindex the dataframe into datetimeindex as follows

    data2 = data.set_index(pa.DatetimeIndex(data['urn:ogc:def:parameter:x-istsos:1.0:time:iso8601']))
    
  2. then using this index to resample data frame for every fifteen minutes last data, the default resampling is ‘mean.’

    data3=data2.resample('15Min', how='last')
    
  3. Then set the index to the date columns. Otherwise, it writes one extra column into a data file

    data4=data3.set_index('urn:ogc:def:parameter:x-istsos:1.0:time:iso8601')
    
  4. then to make a file out of this data frame

    data4.to_csv('GVR1.DAT', sep=',', encoding='utf-8')