DOCX table into Python

2014-11-01

### Docx table into pandas

#### For this script: /scripts/DOCXPANDASSEARCH.py

1. Importing the table directly from the DOCX file requires the Windows-based Python library pywin32. Instead, the whole table can be selected, copied and pasted into LibreOffice Calc, and saved as CSV.
2. This CSV file will have one blank line after every row of the DOCX table. To remove these blank rows, use the script below. It has been adapted for the special delimiter ']', which was chosen so that the ',' characters in the address column are not treated as field separators.

import csv
...

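# Copy only the rows that contain data, dropping the blank rows left by the DOCX table.
# Python 3: open CSV files in text mode with newline='' (the old 'rb'/'wb' modes fail).
with open('CBEN2.csv', newline='') as infile, \
        open('CBE_North_ind.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        # Keep the row if any field is non-empty once the ']' delimiter is stripped.
        if any(field.strip(']') for field in row):
            writer.writerow(row)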

To read the cleaned file into a pandas DataFrame:

import pandas as pd
db = pd.read_csv('CBE_North_ind.csv', delimiter=']')

The above gives an unequal number of row values per column. The CSV file was opened in a text editor, searched for runs of ']]]', and all of them were removed, but the row counts were still unequal.
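One way to track down the remaining mismatch is to count the fields on each line and list the lines that differ from the header. This is only a minimal sketch and not part of the original script: the helper name `find_ragged_lines` is hypothetical, the sample data is made up, and it assumes ']' never appears inside a field.

```python
import csv
import io


def find_ragged_lines(lines, delimiter=']'):
    """Return (line_number, field_count) for rows whose field count differs from the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to compare against.
        return []
    expected = len(header)
    ragged = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far.
            ragged.append((reader.line_num, len(row)))
    return ragged


# Small made-up sample standing in for CBE_North_ind.csv.
sample = io.StringIO("Name]Address]Product\nA]Street 1, Town]Soap\nB]Street 2\n")
print(find_ragged_lines(sample))
```

For the real file, the same function could be called on a file object opened with `open('CBE_North_ind.csv', newline='')`. The reported line numbers can then be checked by hand in a text editor.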

To check for null values in the data frame, the following was used:

da = pd.isnull(db['Name and Factory address'])
db['Nullys'] = pd.isnull(db['Products manufactured with Quantity and Intermediate / By-Products manufactured with Quantity'])

To sort by the boolean null flags and see where the nulls are located:

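# DataFrame.sort was removed from later pandas releases; sort_values is the replacement.
rec = db.sort_values(['Nullys'], ascending=[False])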
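The null check and the sort can be tried out on a small frame before running them on the real table. The sketch below uses made-up data and placeholder column names ('Name', 'Products'), not the actual columns. It also shows two optional extras: listing the row labels of the nulls directly, and counting nulls per column.

```python
import pandas as pd

# Made-up data standing in for the real table; the column names are placeholders.
db = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Products': ['Soap', None, 'Oil'],
})

# Flag rows where the Products column is missing.
db['Nullys'] = db['Products'].isnull()

# Row labels of the missing entries, and the number of nulls in each column.
null_rows = db.index[db['Nullys']].tolist()
null_counts = db.isnull().sum()

# Sort so that the flagged rows come first.
rec = db.sort_values('Nullys', ascending=False)

print(null_rows)
print(null_counts)
print(rec)
```

Listing the row labels avoids having to scroll through the sorted frame when only a few rows are affected.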

To search for a string in a particular column:

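# na=False treats null cells as non-matches; without it, nulls in the column break the boolean mask.
db2 = db[db['columns'].str.contains("Search string", na=False)]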
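A few optional arguments of `str.contains` can make the search more forgiving. The following is a small sketch on made-up data, with a placeholder column name 'Name': `case=False` ignores letter case, `regex=False` matches the text literally, and `na=False` skips null cells.

```python
import pandas as pd

# Made-up data; 'Name' is a placeholder for the real column name.
db = pd.DataFrame({'Name': ['Acme Soap Works', None, 'Best Oil Mills']})

# na=False treats missing values as non-matches, so the mask is purely boolean.
# regex=False matches the text literally; case=False ignores letter case.
mask = db['Name'].str.contains('soap', case=False, na=False, regex=False)
db2 = db[mask]

print(db2)
```

Literal matching is the safer choice when the search text may contain characters such as '(' or '/', which have a special meaning in regular expressions.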