bolster.utils.datatables
Generic utility for extracting DataTables data from HTML pages.
Many Northern Ireland government statistics pages use R’s flexdashboard and DT packages
to embed DataTables widgets. The data is stored as column-transposed JSON inside
<script type="application/json"> blocks with a {"x": {"data": [...], ...}}
structure, where x["data"] is a list of column arrays (not row arrays) and
x["container"] holds the HTML table header with column names.
Example
>>> from bolster.utils.datatables import datatables_to_dataframe
>>> payload = {
... "data": [["A", "B"], [1, 2]],
... "container": "<table><thead><tr><th>Name</th><th>Value</th></tr></thead></table>",
... }
>>> df = datatables_to_dataframe(payload)
>>> list(df.columns)
['Name', 'Value']
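The column-transposed layout described above can be illustrated with the standard library alone. The sketch below is not part of bolster: the widget JSON string is made-up sample data, and `columns_to_rows` is a hypothetical helper name used only for illustration.

```python
import json

# Made-up widget JSON in the {"x": {"data": [...], ...}} shape described above.
widget_json = (
    '{"x": {"data": [["A", "B"], [1, 2]],'
    ' "container": "<table><thead><tr><th>Name</th><th>Value</th></tr></thead></table>"}}'
)


def columns_to_rows(columns):
    """Transpose a list of column arrays into a list of row tuples (hypothetical helper)."""
    # zip(*columns) pairs up the i-th element of every column.
    return list(zip(*columns))


x = json.loads(widget_json)["x"]
columns = x["data"]  # one inner list per column, not per row
rows = columns_to_rows(columns)
print(rows)  # [('A', 1), ('B', 2)]
```

Each inner list of `x["data"]` holds every value of a single column, so the first record is assembled from the first element of each column.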
Exceptions
| DataTablesError | Raised when DataTables extraction fails. |
Functions
| fetch_datatables_json(url, timeout=30) | Fetch an HTML page and extract the embedded DT widget JSON payload. |
| datatables_to_dataframe(payload) | Convert a DT widget payload into a row-oriented DataFrame. |
| | Fetch a DataTables page and return its column header names. |
Module Contents
- exception bolster.utils.datatables.DataTablesError[source]
Bases: Exception
Raised when DataTables extraction fails.
- bolster.utils.datatables.fetch_datatables_json(url, timeout=30)[source]
Fetch an HTML page and extract the embedded DT widget JSON payload.
The payload is the parsed content of the largest <script type="application/json"> block whose x.data key is a column-transposed list (i.e. a list of lists).
- Parameters:
url – URL of the HTML page to fetch.
timeout – Request timeout; defaults to 30.
- Returns:
The x sub-dict from the DT widget payload, containing at minimum "data" (list of column arrays) and "container" (HTML header).
- Raises:
DataTablesError – If the page cannot be fetched or no DT payload is found.
- Return type:
dict
Example
>>> from bolster.utils.datatables import DataTablesError
>>> try:
...     fetch_datatables_json("https://example.com/data.html")
... except DataTablesError:
...     print("DataTablesError raised for invalid page")
DataTablesError raised for invalid page
A successful call returns a dict extracted from the page’s DT widget. The shape mirrors what _extract_datatables_payload returns:
>>> sample_html = (
...     '<script type="application/json">'
...     '{"x": {"data": [["Belfast", "Derry"], [1200, 800]],'
...     ' "container": "<thead><tr><th>City</th><th>Count</th></tr></thead>"}}'
...     "</script>"
... )
>>> payload = _extract_datatables_payload(sample_html)
>>> sorted(payload.keys())
['container', 'data']
>>> len(payload["data"])
2
>>> payload["data"][0]
['Belfast', 'Derry']
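The selection rule described for this function (the largest JSON script block whose x.data is a list of lists) could be sketched roughly as follows. This is an illustration under assumptions, not bolster's actual implementation: `find_largest_dt_payload` is a hypothetical name, "largest" is assumed to mean the longest raw script text, a regular expression stands in for a real HTML parser, and `ValueError` stands in for `DataTablesError`.

```python
import json
import re

# Matches the body of each <script type="application/json"> block.
SCRIPT_RE = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def find_largest_dt_payload(html):
    """Return the x sub-dict of the largest qualifying block (hypothetical helper)."""
    candidates = []
    for match in SCRIPT_RE.finditer(html):
        raw = match.group(1)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON, so not a widget payload
        x = parsed.get("x") if isinstance(parsed, dict) else None
        data = x.get("data") if isinstance(x, dict) else None
        # Keep only blocks whose data is a non-empty list of lists (column arrays).
        if isinstance(data, list) and data and all(isinstance(c, list) for c in data):
            candidates.append((len(raw), x))
    if not candidates:
        # Assumption: the real function raises DataTablesError here instead.
        raise ValueError("no DT payload found")
    # Assumption: "largest" means the longest raw script text.
    return max(candidates, key=lambda item: item[0])[1]


sample_html = (
    '<script type="application/json">{"x": {"data": [["a"], [1]]}}</script>'
    '<script type="application/json">'
    '{"x": {"data": [["Belfast", "Derry"], [1200, 800]]}}</script>'
    '<script type="application/json">'
    '{"x": {"data": "a long string that is not a list of column arrays at all"}}</script>'
)
payload = find_largest_dt_payload(sample_html)
print(payload["data"][0])  # ['Belfast', 'Derry']
```

The third block is skipped because its data is a string, so the longer of the two remaining candidates is returned.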
- bolster.utils.datatables.datatables_to_dataframe(payload)[source]
Convert a DT widget payload into a row-oriented DataFrame.
The payload["data"] field is a list of column arrays (column-transposed). This function transposes it into a normal row-oriented DataFrame and uses column names from payload["container"] if available.
- Parameters:
payload (dict) – The x sub-dict from a DT widget JSON block, as returned by fetch_datatables_json().
- Returns:
DataFrame with one row per record and columns named from the HTML header.
- Raises:
DataTablesError – If payload["data"] is missing or malformed.
- Return type:
pandas.DataFrame
Example
>>> payload = {
...     "data": [["a", "b"], [1, 2]],
...     "container": "<table><thead><tr><th>Name</th><th>Value</th></tr></thead></table>",
... }
>>> df = datatables_to_dataframe(payload)
>>> list(df.columns)
['Name', 'Value']
>>> len(df)
2
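How header names from payload["container"] might be combined with the transposed data can be sketched without pandas. Everything named here is hypothetical and not taken from bolster: `HeaderParser`, `payload_to_records`, and in particular the positional `col_0`, `col_1` fallback used when no matching header is found, which is an assumption about what "if available" could mean.

```python
from html.parser import HTMLParser


class HeaderParser(HTMLParser):
    """Collect the text of every <th> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:
            self.headers[-1] += data


def payload_to_records(payload):
    """Turn a column-transposed payload into a list of row dicts."""
    columns = payload["data"]
    parser = HeaderParser()
    parser.feed(payload.get("container", ""))
    parser.close()
    names = [h.strip() for h in parser.headers]
    if len(names) != len(columns):
        # Assumed fallback: positional names when the header is missing or mismatched.
        names = [f"col_{i}" for i in range(len(columns))]
    # zip(*columns) transposes column arrays into rows.
    return [dict(zip(names, row)) for row in zip(*columns)]


payload = {
    "data": [["a", "b"], [1, 2]],
    "container": "<table><thead><tr><th>Name</th><th>Value</th></tr></thead></table>",
}
records = payload_to_records(payload)
print(records)  # [{'Name': 'a', 'Value': 1}, {'Name': 'b', 'Value': 2}]
```

A list of row dicts like this can be passed to `pandas.DataFrame(records)` to obtain a row-oriented DataFrame with the same column names.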