plot NETCDF

2014-10-31

###Plot NETCDF and create xlsx file for pyWRFChemEmiss####

The 2010 anthropogenic emission file of Asian region is available in NetCDF format. To use it with WRF CHEM, PREPCHEMSRC is limited use mainly because the emission source has to specify, and its conversion factor might not be correct. Complexity/understandability is case for similar emission converter program such as Air Emissions Processor program and SMOKE.
In this regard pyWRFChemEmiss give a more modular and easy interface to convert emission file to be used with wrf chem. This program requires emission file to be in xlsx formate as specified here.
Did a search for visualizing and convert NetCDF file. An available option is one using R based on library ncdf and another with python netcdf4. The attempt with ncdf was not fruitful due to gdal library conflict. The library python-netcdf4 is viable option is with lots of examples and use case.

#####Plotting NetCDF file##### 1. Below script is used for a start on plotting NetCDF file. Earlier not on plotting NetCDF with gdal is not working now due to some gdal library conflict. The below program is from here

            import matplotlib.pyplot as plt
            from mpl_toolkits.basemap import Basemap

            my_example_nc_file = '/home/swl-sacon-dst/Documents/GISE_2013/LAB/WRF-chem/Data/2010/MICS_Asia_BC_2010_0.25x0.25.nc'
            fh = Dataset(my_example_nc_file, mode='r')
            fh.variables
            lons = fh.variables['lon'][:]
            lats = fh.variables['lat'][:]
            tmax = fh.variables['BC_POWER'][:]
            fh.close()
            tmax_units = fh.variables['Tmax'].units
            for i,j in xrange(0, lats.size):
                ilat = int(lats[i] + 90.0)
                ilon = int(lons[i] - 0.5)
                if ilon < 179 or ilon > 181: # HACK: to avoid                 date-line wraparound problem
      array[ilat,ilon] = tmax[i,j]

Another working script to plot the NetCDF file

        import matplotlib.pyplot as plt
        from mpl_toolkits.basemap import Basemap
        from netCDF4 import Dataset
        import numpy as np
        my_example_nc_file = '/home/swl-sacon-dst/Documents/GISE_2013/LAB/WRF-chem/Data/2010/    MICS_Asia_BC_2010_0.25x0.25.nc'
        fh = Dataset(my_example_nc_file, mode='r')
        print fh.variables.keys()
        print fh.variables['BC_POWER']
        tmax = fh.variables['BC_POWER']
        tmax_units = fh.variables['BC_POWER'].units
        tmax.shape
        #this rerurns (12, 441, 560) indicating the shape of netcdf #files with gridded data each point is respected to lat #long, but given with value for BC, and 12 indicate each #month data, so the data is 3D. 
        #So to get single month data, which return into (441,560)
        tmaxA=tmax[0]
        fh.close()
        lon_0 = lons.mean()
        lat_0 = lats.mean()
        m = Basemap(width=11000000,height=7500000,
     resolution='l',projection='lcc',\
     lat_ts=20,lat_0=lat_0,lon_0=lon_0)
        #cs = m.pcolor(tmaxA)
        #above command retunrs error of ```TypeError: pcolor() takes at least 4 arguments (2 given)```
        cs = m.imshow(tmaxA)
        m.drawparallels(np.arange(-80., 81., 10.),     labels=    [1,0,0,0], fontsize=10)
        m.drawmeridians(np.arange(-180., 181., 10.), labels=[0,0,0,1], fontsize=10)
        m.drawcoastlines()
        cbar = m.colorbar(cs, location='bottom', pad="10%")
        plt.title('DJF Maximum Temperature')
        plt.show()

This script is based on a answer here and this ####To convert netcdf file into xlsx formate#### 1. to create a 1D array from 2D array netcdf value of emission inventory is tmax1DA1 = np.ravel(tmaxA) based on this. 1. to convert list of lat and long into a lat long pair a=list(itertools.product(lons,lats)) based on this 1. to lat long pair into UTM project of meters is based on this. using the wonder library UTM

                for i in a:
                         c=utm.from_latlon(i[1],i[0])
                         b.append(c)

The library utm requires lat long in the specified range, for this the emission NetCDF file has to be a subset, it is based on this and this more comprehensive page on netcdf4. Based on this it is found that the netcdf can be converted into numpy array and it can be subset by calling the position based numpy array query. The method followed is to view the lat or loin value is netcdf file and creat and dump array based on the lat long values from this it cna be count the likely position of required lat long to be subset, that way the emission concentration also can be subset

        count =  numpy.arange(40.125,110.125,0.25)
        len(count)
        lon1 = ncv['lon'][0:281]
        count2 =  numpy.arange(-10.125,75.125,0.25)
        len(count2)
        lat1 = ncv['lat'][40:382]
        tair = ncv['BC_RESIDENTIAL'][itime,0:281,40:382]
        tmax1DA1 = numpy.ravel(tair)
        a=list(itertools.product(lon1,lat1))
        here len(tmax1DA1) and len(a) will be equal, in this case it is  96102, this value is exceeds MS excel row limtation of 65,536 as per [this](http://office.microsoft.com/en-in/excel-help/excel-specifications-and-limits-HP005199291.aspx), so has to reduce the size of domain.

this way the arrays such as lat lon value, UTM convert x and y values, emission concentration can be retrieved. This values are required by the program pyWRFChemEmiss. Now the job is to make csv or xlsx file from these arrays.

Pandas is helpful to convert these numpy array or list into a dataframe and thus into csv or xlsx.

        a1=pandas.DataFrame(a) #based on [this](http://stackoverflow.com/questions/11415701/efficiently-construct-pandas-dataframe-from-large-list-of-tuples-rows)
        b1=pandas.DataFrame(b)
        c1=pandas.DataFrame(tmax1DA1)
        d1=pandas.concat([a1,b1,c1],axis=1) #based on [this](http://stackoverflow.com/questions/23891575/in-python-pandas-how-to-merge-two-frames-side-by-side)
d1.to_csv('data.csv')

Though the datframe can be directly converted into xlsx, due to some issue with pyxls it can’t be done, any way the csv is opened in xls and converted into xlsx formate.
This xlsx file was opened in pywrfemiss program and mentioned its variable as per specifed and it created two file wrfem_00to12z_d01 and wrfem_12to24z_d01. SOme minor problem due to not correct conversion of netcdf file into xlsx by python is faced but it was solved and reported in program issue.
Attempt to create PM2.5 and PM10 emission for cbe domain 2

from netCDF4 import Dataset import numpy import pandas import itertools import utm myexamplencfile = ‘/home/swl-sacon-dst/Documents/GISE2013/LAB/WRF-chem/Data/2010/MICSAsiaPM1020100.25x0.25.nc’ fh = Dataset(myexamplenc_file, mode=‘r’) print fh.variables.keys() lons = fh.variables[‘lon’][:] count0=np.arange(40.125,73.125,0.25) count1=np.arange(73.125,82.125,0.25) slo=len(count0) dlo=len(count1) elo=slo+dlo cbeDom2lon = fh.variables[‘lon’][slo:elo]

lats = fh.variables[‘lat’][:] count0=np.arange(-20.125,8.125,0.25) count1=np.arange(8.125,16.125,0.25) sla=len(count0) dla=len(count1) ela=sla+dla cbeDom2lat = fh.variables[‘lat’][sla:ela] itime=0 pm10r = fh.variables[‘PM10RESIDENTIAL’][itime,slo:elo,sla:ela] pm10ri = numpy.ravel(pm10r) pm10i = fh.variables[‘PM10_INDUSTRY’][itime,slo:elo,sla:ela] pm10ii = numpy.ravel(pm10i) pm10p = fh.variables[‘PM10POWER’][itime,slo:elo,sla:ela] pm10pi = numpy.ravel(pm10p) pm10t = fh.variables[‘PM10TRANSPORT’][itime,slo:elo,sla:ela] pm10ti = numpy.ravel(pm10t) a=list(itertools.product(cbeDom2lon,cbeDom2lat)) b=[] for i in a: c=utm.fromlatlon(i[1],i[0]) b.append© a1=pandas.DataFrame(a) b1=pandas.DataFrame(b) c1=pandas.DataFrame(pm10ri) c2=pandas.DataFrame(pm10ii) c3=pandas.DataFrame(pm10pi) c4=pandas.DataFrame(pm10ti) c5=c1+c2+c3+c4 d1=pandas.concat([a1,b1,c1],axis=1) d1.to_csv(‘pm25A2.csv’)

count0=np.arange(40.125,72.625,0.25) count1=np.arange(72.625,85.375,0.25) slo=len(count0) dlo=len(count1) elo=slo+dlo cbeDom2lon = fh.variables[‘lon’][slo:elo]

lats = fh.variables[‘lat’][:] count0=np.arange(-20.125,5.125,0.25) count1=np.arange(5.125,22.875,0.25) sla=len(count0) dla=len(count1) ela=sla+dla cbeDom2lat = fh.variables[‘lat’][sla:ela] a=list(itertools.product(cbeDom2lon,cbeDom2lat))

####FOR LOOP for taking 10 chemcial variable and create monthly xlsx data#### 1. MICS asia data gives emission invneoptry for 10 pollutant sepcies arranged in monthly wise values. To convert this into a single file for each month is required to have for loop in above script. For this the script was modified follwoing parts. 1. To open 10 netcdf files for each pollutant sepcies is carried out by collecting array list(ncf) of nc ending file in the folder by

path = ‘/home/swl-sacon-dst/Documents/GISE2013/LAB/WRF-chem/Data/2010/’ ncf = [f for f in os.listdir(path) if f.endswith(‘.nc’)] var=[] for i in ncf: var.append(re.findall(r’Asia(.*?)_2010’, i)) break

To open these netcdf files into python Netcdf4 library system and query its variable made a netsed for loop, in this indentation of each forloop is unnecessary and ends in error below.
```
      ^
```
IndentationError: unindent does not match any outer indentation level instead it is made as below to remove the error.

pct = [] for i in ncf: fh = Dataset(i, mode=‘r’) var = re.findall(r’Asia(.*?)2010’, i) print fh.variables.keys() for m in range(12): ###for loop for months pm10r = fh.variables[str(var[0])+’RESIDENTIAL’][m,slo:elo,sla:ela] pm10ri = numpy.ravel(pm10r) pm10i = fh.variables[str(var[0])+’INDUSTRY’][m,slo:elo,sla:ela] pm10ii = numpy.ravel(pm10i) pm10p = fh.variables[str(var[0])+’POWER’][m,slo:elo,sla:ela] pm10pi = numpy.ravel(pm10p) pm10t = fh.variables[str(var[0])+’TRANSPORT’][m,slo:elo,sla:ela] pm10ti = numpy.ravel(pm10t) c1=pandas.DataFrame(pm10ri) c2=pandas.DataFrame(pm10ii) c3=pandas.DataFrame(pm10pi) c4=pandas.DataFrame(pm10ti) c5=c1+c2+c3+c4 c5.columns = [str(var[0])+’_‘+str(m)] pct.append(c5) X = numpy.array(pct)

The above nested for loop, open a nc file in ncf array list, query with its variables and four emission sources(RESIDENTIAL, INDUSTRY, POWER and TRANSPORT), convert into numpy array, then convert into pandas datframe column, summed into a single column from four emission source cateogary column then dumped into list of numpy array of array. The array named here as pct appended with all the pollution specicies summed columns with each month values.
While doing this faced error of

Traceback (most recent call last): File “”, line 6, in TypeError: can only concatenate list (not “str”) to list

related with the for loop list has to be converted into strings to be looped to use with naming the bvariable and months in the netcdf file. 1. But doing with above method ends in creation of structured list with only last column values. 1. For this problem made a rewriting to work only with numpy array. In one instance it is learnt that pandas is a extension of numpy array so to use numpy only is better. 1. The code written in this aspect is to join multiple emission source categories into four column array and then sum the array into single column by doing

c3=np.column_stack((pm10ri,pm10ii,pm10pi,pm10ti)) c4=c3.sum(axis=1)

Majore search on join or concat in this reagrd is not turned fuitfull, only column stack made the trick. Summing array is based on this 1. This result in creation of numpy array with correct rows and columns but without any column headres. So to add the name of the array used below code based on this

c5=np.array(c4,dtype=[(str(var[0])+’_‘+str(m),’<f8’)])

But it can’t be retianined in for loop array append. Another major search is made on this, indecated that doing this way of numpy array loop into normal list appaned is best way but thghou it created structed list but woth all column namming it can’t retained in converted the for loop into numpy array again. It needs to be put up asa question in SO.

pct = [] for i in ncf: fh = Dataset(i, mode=‘r’) var = re.findall(r’Asia(.*?)2010’, i) print fh.variables.keys() for m in range(12): ###for loop for months pm10r = fh.variables[str(var[0])+’RESIDENTIAL’][m,slo:elo,sla:ela] pm10ri = np.ravel(pm10r) pm10i = fh.variables[str(var[0])+’INDUSTRY’][m,slo:elo,sla:ela] pm10ii = np.ravel(pm10i) pm10p = fh.variables[str(var[0])+’POWER’][m,slo:elo,sla:ela] pm10pi = np.ravel(pm10p) pm10t = fh.variables[str(var[0])+’TRANSPORT’][m,slo:elo,sla:ela] pm10ti = np.ravel(pm10t) c3=np.columnstack((pm10ri,pm10ii,pm10pi,pm10ti)) c4=c3.sum(axis=1) c5=np.array(c4,dtype=[(str(var[0])+’’+str(m),’<f8’)]) #c5=pandas.DataFrame(pm10ti) #c5.columns = [str(var[0])+’_‘+str(m)] pct.append(c5)

A possible solution of converting the pct as numpy.asarray also not working
This forced to make the numpy array without column head and has to make it in excel the column head of 120 rows and attached to the dataframe of pandas by

data=pandas.DataFrame(pct) data2=pandas.DataFrame.transpose(data) var=pandas.DataFrame.fromcsv(‘variables.csv’, header=0, indexcol=[0]) varL=var[‘variables’].tolist() data2.columns=varL aV=[‘long’,‘lat’] uV=[‘X’,‘Y’,‘Zone’,‘a’] a1.columns=aV b1.columns=uV data3=pandas.concat([a1,b1,data2],axis=1) data3.ix[:5, :10]

Simlarly made a attempt to make list of variable name in pure python using string mulitplicaiotn and join but ailed and turned to excel. following code is based on [this]

strs = [“CO_” for x in range(12)] e=np.array(strs) c=np.arange(0,12,1) c2=pandas.DataFrame© c3=pandas.DataFrame(e) c4 = c3[0].str.cat(c2[0].values.astype(str), sep=’ is ‘) c4 = c3.map(str) + “ is ” + c2

To get the name of array myData.dtype.names
to convert the dataframe column name into list

http://stackoverflow.com/questions/19482970/python-get-list-from-pandas-dataframe-column-headers 1. Has to convert from Mg/month into mol/km2/hour

First to convert Mg/month The MICS data is for 0.25 degree, so it is 28 km so the value gives as by giving command

print fh.variables['PM10_POWER'] float32 PM10POWER(time, lat, lon) longname: PM10POWER shortname: MICSAsiaPM10POW units: Mg/month gamapcategory: ANTHSRCE unlimited dimensions: current shape = (12, 441, 560) filling on, default _FillValue of 9.96920996839e+36 used

as Mg/mnoth, megagram per month can be taken as megagram/28km2/month, it is equal to meagram/km2/month. So the Mg/month has to be convert into mols/hour by convert first into g/hour and the into mols/hour,

to convert into g/hour, multiply it with (1000000/730.484) then to convert into mol/hour divided that with molar mass of the compound. For example to convert CO_0 into MD/month into mol/hour, use

a=(MICSm1.CO0*(1000000/731.484))/28.01

So for other variables, NH3, SO2, PM2.5, PM10 has to to be converted. For PM2.5 and PM10 it is not possible to convert into mol unit. For this it is decided to go upto gram per hour value

MICSm1[‘CO0c’]=(MICSm1.CO0*(1000000/731.484))/28.01 MICSm1[‘NH30c’]=(MICSm1.NH30(1000000/731.484))/17.031 MICSm1[‘SO20c’]=(MICSm1.SO2_0(1000000/731.484))/64.066 MICSm1[‘PM250c’]=MICSm1[‘PM2.50’]*(1000000/731.484) MICSm1[‘PM100c’]=MICSm1.PM100*(1000000/731.484) 1. While doing this it is found that the code early used for adding variable name into numpy array is wrong.

c5=np.array(c4,dtype=[(str(var[0])+’_‘+str(m),’<f8’)])

Also this code is a redundant where numpy array is declaring for an already a numpy array. Numpy array is a single data type array not a excel kind thing as noted here. This is also the problem for a situation where the ( ,) was forced to deleted in EXCEL sheet.

With this script below it was run and get the row of 3622 and saved lst file with four variables and ready to input into wrf chem.
With this data of 3622 rows, it took four hours from 13:14PM to 17:14PM to complete the simulation.

SO query

http://stackoverflow.com/questions/19646726/unsuccessful-append-to-an-empty-numpy-array

http://stackoverflow.com/questions/2641691/building-up-an-array-in-numpy-scipy-by-iteration-in-python

here for http://docs.scipy.org/doc/numpy/user/basics.rec.html basic of structure list

http://stackoverflow.com/questions/13693751/how-can-we-join-two-columns-of-a-numpy-array-into-a-single-column-separating-th