Geonames Pandas Shapefile

2014-11-28

### GEONAMES Pandas into Shapefile

#### For the scripts /scripts/csvtoshp.py, csvtoshp_TEST.py, Geoname.py

1. The industry data for the emission inventory is given as addresses with their details. To get the latitude and longitude of each address, we need a database of addresses with their latitude and longitude.
2. One such database is GeoNames. Its country-wise data has only a small amount of data for the Coimbatore case, but its POSTAL CODE data has more than 641 postal code entries with latitude and longitude information.
3. To get that data into a pandas DataFrame I used the following script. I had a problem specifying the delimiter: it has to be given as `'\t'` because the data is tab-delimited.

```python
import pandas as pd
# IN.txt is tab-delimited with no header row, so name the columns explicitly
db = pd.read_csv('IN.txt', delimiter='\t', header=None,
                 names=['countryCode', 'postalCode', 'placeName', 'adminName1', 'adminCode1',
                        'adminName2', 'adminCode2', 'adminName3', 'adminCode3',
                        'latitude', 'longitude', 'accuracy'])
# keep rows whose district (adminName2) mentions Coimbatore
db2 = db[db['adminName2'].str.contains('Coimbatore', na=False)]
```
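To see why the delimiter matters, here is a small self-contained sketch. It feeds two made-up rows in the GeoNames postal code layout (tab-separated, 12 columns, no header row) to `pd.read_csv` through `io.StringIO`; the place names and coordinates are placeholders for illustration, not real GeoNames data. With the default comma delimiter each whole line lands in a single column, while `delimiter='\t'` splits it into the 12 named fields.

```python
import io

import pandas as pd

# Column layout of the GeoNames postal code dump (the file has no header row)
COLS = ['countryCode', 'postalCode', 'placeName', 'adminName1', 'adminCode1',
        'adminName2', 'adminCode2', 'adminName3', 'adminCode3',
        'latitude', 'longitude', 'accuracy']

# Two hypothetical tab-separated rows; all values are placeholders
SAMPLE = (
    "IN\t600001\tPlace A\tTamil Nadu\t\tChennai\t\tChennai\t\t13.1\t80.3\t4\n"
    "IN\t641001\tPlace B\tTamil Nadu\t\tCoimbatore\t\tCoimbatore\t\t11.0\t76.9\t4\n"
)

# Default delimiter is ',', so each whole line ends up in one column
wrong = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Tab delimiter gives the expected 12 named columns
db = pd.read_csv(io.StringIO(SAMPLE), delimiter='\t', header=None, names=COLS)
coimbatore = db[db['adminName2'].str.contains('Coimbatore', na=False)]

print(wrong.shape)      # (2, 1)
print(db.shape)         # (2, 12)
print(len(coimbatore))  # 1
```

The `na=False` argument makes rows with an empty `adminName2` count as non-matches instead of producing missing values in the boolean mask.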

To convert this data into a shapefile I used the following script, based on this and this script. In it, the pandas DataFrame has to be written to a CSV file and read back; otherwise there is a hard-to-identify error like this:

```
File "hashtable.pyx", line 388, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6517)
KeyError: 0
```

This was reproduced with:

```python
for j, k in enumerate(db2.longitude):
    a = db2.latitude[j]
    print(k, a)
```

The above problem was solved when I got the DataFrame from a CSV file, so it is a problem related to the pandas DataFrame.
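The likely cause of the `KeyError: 0` is label-based indexing: after boolean filtering, `db2` keeps the original row labels of `db`, so `db2.latitude[j]` looks up the *label* `j`, and label `0` may not exist in the filtered frame. Reading the CSV back creates a fresh `0..n-1` index, which is why that workaround helps. The sketch below reproduces this on a small made-up DataFrame and shows two ways to get the same effect without the CSV round trip: `reset_index(drop=True)`, or iterating the columns together with `zip` so no lookup by position is needed.

```python
import pandas as pd

# Small made-up frame standing in for the GeoNames data
db = pd.DataFrame({
    'adminName2': ['Chennai', 'Coimbatore', 'Coimbatore'],
    'latitude': [13.1, 11.0, 11.1],
    'longitude': [80.3, 76.9, 77.0],
})

# Boolean filtering keeps the original labels: the index is now [1, 2]
db2 = db[db['adminName2'].str.contains('Coimbatore', na=False)]

try:
    db2.latitude[0]          # label 0 is not in the index
except KeyError as err:
    print('KeyError:', err)  # KeyError: 0

# Option 1: renumber the rows 0..n-1, as the CSV round trip does
db3 = db2.reset_index(drop=True)
pairs1 = [(k, db3.latitude[j]) for j, k in enumerate(db3.longitude)]

# Option 2: avoid positional lookups entirely
pairs2 = list(zip(db2.longitude, db2.latitude))

print(pairs1 == pairs2)      # True
```

`db2.iloc[j]` would also work for explicit positional access; `zip` is usually the simplest when the loop only needs a few columns.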

```python
import pandas as pd
import shapefile as shp  # pyshp

# Read the tab-delimited GeoNames postal code dump (no header row)
db0 = pd.read_csv('IN.txt', delimiter='\t', header=None,
                  names=['countryCode', 'postalCode', 'placeName', 'adminName1', 'adminCode1',
                         'adminName2', 'adminCode2', 'adminName3', 'adminCode3',
                         'latitude', 'longitude', 'accuracy'])
db1 = db0[db0['adminName2'].str.contains('Coimbatore', na=False)]
# Keep the last row for each postal code
db2 = db1.drop_duplicates(subset='postalCode', keep='last')

# Round trip through CSV: reading it back gives a fresh 0..n-1 index,
# which avoids the KeyError from positional lookups on db2
db2.to_csv('Coimbatore_postal2.csv', index=False)
db = pd.read_csv('Coimbatore_postal2.csv')

# pyshp 2.x API: the output name goes to Writer() and close() writes the files
out_file = 'Coimbatore_postalcode1.shp'
w = shp.Writer(out_file, shapeType=shp.POINT)
w.autoBalance = 1  # ensures geometry and attributes match
w.field('postalCode', 'N', 10)
w.field('longitude', 'F', 10, 8)
w.field('latitude', 'F', 10, 8)
w.field('placeName', 'C', 30)
w.field('adminName1', 'C', 30)
w.field('adminName2', 'C', 30)
w.field('adminName3', 'C', 30)
w.field('accuracy', 'C', 5)
for j, k in enumerate(db.longitude):
    w.point(k, db.latitude[j])  # write the geometry (x = longitude, y = latitude)
    # one value per field, in the same order as the field definitions above
    w.record(db.postalCode[j], db.longitude[j], db.latitude[j], db.placeName[j],
             db.adminName1[j], db.adminName2[j], db.adminName3[j], db.accuracy[j])
w.close()
print('Done!')
```

This generated a shapefile from the 641 rows in the DataFrame, but only 120 points show up in it. Later I found that most of the rows have the same latitude and longitude values, so the GeoNames POSTAL CODE data has limited resolution for latitude and longitude.
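One quick way to check this resolution problem before writing the shapefile is to count how many distinct latitude/longitude pairs the postal codes actually have, and how many postal codes share each pair. Below is a minimal pandas sketch on a small made-up frame (the values are placeholders); on the real data the same calls would be applied to the deduplicated Coimbatore DataFrame from the script above.

```python
import pandas as pd

# Made-up postal code rows; several codes share one coordinate pair
db = pd.DataFrame({
    'postalCode': [641001, 641002, 641003, 641004, 641005],
    'latitude':   [11.0, 11.0, 11.0, 11.2, 11.2],
    'longitude':  [76.9, 76.9, 76.9, 77.1, 77.1],
})

# Number of distinct point locations the shapefile will actually show
n_points = len(db.drop_duplicates(subset=['latitude', 'longitude']))

# How many postal codes sit on each coordinate pair, largest groups first
stacked = (db.groupby(['latitude', 'longitude'])
             .size()
             .sort_values(ascending=False))

print(len(db), 'rows ->', n_points, 'distinct points')  # 5 rows -> 2 distinct points
print(stacked)
```

The largest groups in `stacked` point to the coordinates that GeoNames assigns to many postal codes at once, which are the places where address-level locations are least reliable.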
The final scripts are /scripts/csvtoshp.py, csvtoshp_TEST.py and Geoname.py.