IStSOS Data formating using Python

2014-05-10

Use Python with SQLite, istsos data formate and timeseries subsampling (downscaling).

With sqlite to convert a list, for example cur.fetch from database like sqlite,

['2014-04-30T10:25,2797,147', '2014-04-30T10:27,2639,174', '2014-04-30T10:29,2645,158', '2014-04-30T10:31,2676,149']

Use print “\n”.join(b) based on this gives

"2014-04-30T10:25,2797,147
2014-04-30T10:27,2639,174
2014-04-30T10:29,2645,158
2014-04-30T10:31,2676,149"

To remove double quotes from above to write into a.DAT, tried almost two hours then find out that the used method will not do this. The full code is as follows with uncommented lines showing failed attempts. Double # are comments

import csv
import sqlite3 as lite
#import string
#identity = string.maketrans()
#import simplejson
#import os
outfile_path='GVR.DAT'
#text_file = open("GVR.DAT", "w")
#writer = csv.writer(open(outfile_path, 'w'), quotechar='"',quoting=csv.QUOTE_NONE)
##it gives extra hand for defining the csv output, sticked much in this but not given the results
#writer = csv.writer(open(outfile_path, 'w'),dialect='excel')
#lineterminator='\n')
#headers=['urn:ogc:def:parameter:x-istsos:1.0:time:iso8601','urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np05mg','urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np25mg']
#text_file.write('urn:ogc:def:parameter:x-istsos:1.0:time:iso8601,urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np05mg,urn:ogc:def:parameter:x#-istsos:1.0:cbe:dylos:np25mg')
#writer.writerow(headers)
#text_file.close
##follwoing comments are for accessing sqlite
con = lite.connect('GVR.db')
cur = con.cursor()
cur.execute("SELECT * FROM data")
a= [str(i[1]) for i in cur.fetchall()]
##without maping it produce file with alternative line kept empty
b = map(str.strip,a)
#c= str(b).strip("[']") 
#e= [str(i) for i in b]
#d="\n".join(b)
#print d
#print d
#e = map(str.strip,b)
#e=d.translate(None, '"')
#e = d.translate(identity, '",')
#e = d[1:-1]
#print "[%s]" % (','.join(e))
#f = map(lambda x: x.strip('"'), d)
#writer.writerow([d])
#for item in e:
#     writer.writerow([item,].replace'"')
#csvwriter.writerow(JD.split())
#e='map'
#text_file = open("GVR.DAT", "w")
#text_file.write(e)
#text_file.close
#writer = csv.writer(open("GVR.DAT", "wb"), quoting=csv.QUOTE_NONE)
#reader = csv.reader(open("GVR.DAT", "rb"), skipinitialspace=True)
#writer.writerows(reader)
#for val in e:
#        writer.writerow([val])    
#writer.writerow(d)
f = open('output.txt', 'w')
f.write('urn:ogc:def:parameter:x-istsos:1.0:time:iso8601,urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np05mg,urn:ogc:def:parameter:x-istsos:1.0:cbe:dylos:np25mg'+"\n")
##this simple comments alveate two double quote removal for two hours
f.write("\n".join(a))
#simplejson.dump(d, f)
f.close()

Write CSV method is not at all working, tried to write using file method and worked fine.

Istsos date formate

The Istsos requires to date and time in this ( 2014-04-30T11:25:00.000000+0530) format; the data from Dylos is cut shorten to reduce size. So to edit the column it has to convert into a data frame in python.

Pandas data frame conversion based on this

col_names=["date","np05","np25"]
data = pa.read_csv("GVR.DAT", header=None,skiprows=1, parse_dates="date",names=col_names)

Read the DAT file created in the earlier step. 

data["date"]=pa.to_datetime(data["date"])
data["date"] = data['date'].apply(lambda x: x.strftime("%Y-%m-%dT%H:%M:%S.000000+0530"))

calling specific date column, then converting into desired date formate.

Time scaling

Based on this and this has to reindex the dataframe into datetimeindex as follows

data2 = data.set_index(pa.DatetimeIndex(data['urn:ogc:def:parameter:x-istsos:1.0:time:iso8601']))

then using this index to resample data frame for every fifteen minutes last data, the default resampling is ‘mean.’
```
data3=data2.resample('15Min', how='last')
```
Then set the index to the date columns. Otherwise, it writes one extra column into a data file
```
data4=data3.set_index('urn:ogc:def:parameter:x-istsos:1.0:time:iso8601')
```

then to make a file out of this data frame

data4.to_csv('GVR1.DAT', sep=',', encoding='utf-8')