A
A
a73xsh2019-05-03 10:37:55
Python
a73xsh, 2019-05-03 10:37:55

How to parse a table with multiple headers?

There is an HTML table

<table border="1" width="100%" bordercolor="#AAAAAA" bgcolor="#F2F2F2" cellspacing="0" cellpadding="1" style="border-collapse:collapse"><tbody><tr></tr></tbody></table><table width="90%" align="center" border="1" bordercolor="#AAAAAA" bgcolor="#F2F2F2" cellspacing="0" cellpadding="1" style="border-collapse:collapse">
<tbody>
<td align="center" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>
Pod 0</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Response Score</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Retries</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Clear Retries</b></font></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 1</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.21</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr0" value="1"></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 2</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.01</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr1" value="1"></td>
</tr>

<td align="center" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>
Pod 1</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Response Score</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Retries</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Clear Retries</b></font></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 1</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>1.89</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr16" value="1"></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 2</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>1.00</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr17" value="1"></td>
</tr>

<td align="center" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>
Pod 2</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Response Score</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Retries</b></font></td>
<td align="center" width="20%" bgcolor="#565A5C"><font face="Arial, Helvetica, sans-serif" color="#ffffff" size="2"><b>Clear Retries</b></font></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 1</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.08</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr32" value="1"></td>
</tr>
<tr>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>Disk 2</b></font></td>
<td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>2.15</b></font></td><td align="center"><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><b>0</b></font></td><td align="center"><input type="checkbox" name="clr33" value="1"></td>
</tr>
</td>
</tr>
</tbody></table><br>

I use BeautifulSoup and export to JSON I get
[{'Disk 1': 2.21}, {'Disk 2': 2.01}, {'Disk 3': 2.08}, {'Disk 4': 2.15}, {'Disk 5': 2.27}, {'Disk 6': 2.07}, {'Disk 7': 1.98} ...

How do I add a header name to Disk x so that there is an output
[{'Pod1-Disk 1': 2.21}, {'Pod1-Disk 2': 2.01},  ...{'Pod2-Disk 1': 2.21}, {'Pod2-Disk 2': 2.01},

My code
def parse(html):
    table_data = []
    soup = BeautifulSoup(html, 'lxml')
    table = soup.find(
        'table', attrs={"border": "1", "width": "90%"})
    rows = table.find_all('tr')[2:]

    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        table_data.append([ele for ele in cols if ele])
    final_data = []

    for data in table_data:
        d = dict([(k, v) for k, v in zip(data[::2], data[1::2])])
        final_data.append(d)
    formated_json = remove_quotes(json.dumps(final_data))

Answer the question

In order to leave comments, you need to log in

1 answer(s)
V
VeryLongAgoDid, 2019-05-03
@VeryLongAgoDid

So what's stopping you from checking if 'Pod' is contained or not? If it is, then write all the text from this cell into the variable insert_at_beginning . Keep going and if you hit 'Pod' again, your variable will be replaced.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question