bolster

Bolster - A personal collection of Python utilities and data sources.

A grab bag of handy functions for working with Northern Ireland data, basic stats operations, and general data science tasks. Built for personal projects and exploration.

What’s in here:
  • data_sources: NI water quality, house prices, cinema listings, etc.

  • stats: Basic data frame operations and distribution fitting

  • utils: Web scraping helpers, decorators, AWS/Azure bits

  • cli: Command line tools for the data sources

Quick examples:

>>> from bolster.data_sources import ni_water
>>> quality_data = ni_water.get_water_quality()
>>> 'NI Hardness Classification' in quality_data.columns
True
>>> from bolster.stats import add_totals
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]])
>>> add_totals(df, inplace=False)
       0  1  total
0      1  2      3
1      3  4      7
total  4  6     10

Author: Andrew Bolster

Attributes

__author__

__email__

__version__

logger

Exceptions

DataNotFoundError

Raised when expected data publications or URLs are not found.

DataSourceError

Base class for all data source errors.

NetworkError

Raised when network operations fail beyond retry limits.

ParseError

Raised when file or data parsing fails.

ValidationError

Raised when data fails integrity validation checks.

MultipleErrors

Exception Class to enable the capturing of multiple exceptions without interrupting control flow.

Classes

memoize

Cache the return value of a method.

Functions

always(x, **kwargs)

Pointless passthrough replacement for 'always true' filtering.

poolmap(f, iterable[, max_workers, progress])

Helper function to encapsulate a ThreadPoolExecutor mapped function workflow.

batch(seq[, n])

Split a sequence into n-length batches (is still iterable, not list).

chunks(iterable[, size])

Outputs <list> chunks of size N from an iterable (generator).

arg_exception_logger(func)

Helper Decorator to provide info on the arguments that cause the exception of a wrapped function.

backoff([exception_to_check, tries, delay, backoff, ...])

Retry calling the decorated function using an exponential backoff.

tag_gen(seq, **kwargs)

Generator stream that adds kwargs to each entry yielded.

exceptional_executor(futures[, exception_handler, timeout])

Generator for concurrent.Futures handling.

working_directory(path)

Contextmanager that changes working directory and returns to previous on exit.

compress_for_relay(obj)

Compress json-serializable object to a gzipped base64 string.

decompress_from_relay(msg)

Uncompress gzipped base64 string to a json-serializable object.

pretty_print_request(req[, expose_auth, ...])

Pretty-print a request that is completely built and ready to be fired, i.e. "prepared".

get_recursively(search_dict, field)

Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.

transform_(r, rule_keys)

Generic Item-wise transformation function.

diff(new, old[, excluded_fields])

Perform a one-depth diff of a pair of dictionaries.

aggregate(base, group_key, item_key[, condition])

Abstracted groupby-sum for lists of dicts.

breadth(d)

Get the total 'width' of a tree.

depth(d)

Get the maximum depth of a tree.

set_keys(d)

Extract the set of all keys of a nested dict/tree.

keys_at(d, n[, i])

Extract the keys of a tree at a given depth.

items_at(d, n[, i])

Extract the elements from a tree at a given depth.

leaves(d)

Iterate on the leaves of a tree.

leaf_paths(d[, path])

Get all leaf paths in a nested dictionary structure.

flatten_dict(d[, head, sep])

Flatten a nested dictionary using separator for key names.

uncollect_object(d)

Convert flat dictionary back to nested structure using path tuples.

dict_concat_safe(d, keys[, default])

Really Lazy Func because dict.get('key',default) is a pain in the ass for lists.

build_default_mapping_dict_from_keys(keys)

Constructs a mapping dictionary from (presumably) snake_case keys to 'human-readable' title case.

Package Contents

bolster.__author__ = 'Andrew Bolster'[source]
bolster.__email__ = 'andrew.bolster@gmail.com'[source]
bolster.__version__[source]
exception bolster.DataNotFoundError(message, url=None, source=None)[source]

Bases: DataSourceError

Raised when expected data publications or URLs are not found.

Examples

  • Publication page returns 404

  • Expected Excel file link missing from page

  • RSS feed returns no entries

  • API endpoint returns empty response

Parameters:
  • message (str) – Description of what data was not found

  • url (str) – Optional URL that was being accessed

  • source (str) – Optional data source identifier

Initialize DataSourceError with message and optional context.

url = None
source = None
__str__()[source]

Return str(self).

exception bolster.DataSourceError[source]

Bases: Exception

Base class for all data source errors.

This is the root exception for all domain-specific errors in Bolster. All other exceptions should inherit from this base class.

Initialize self. See help(type(self)) for accurate signature.
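Because every data source exception derives from DataSourceError, a caller can handle any of them with a single except clause. The sketch below illustrates that pattern with local stand-in classes that only mirror the documented names and signature; they are not the library's implementation, and `fetch_report` and `safe_fetch` are hypothetical helpers.

```python
class DataSourceError(Exception):
    """Stand-in for the documented base class (assumption, not the real one)."""


class DataNotFoundError(DataSourceError):
    """Stand-in mirroring the documented (message, url, source) signature."""

    def __init__(self, message, url=None, source=None):
        super().__init__(message)
        self.url = url
        self.source = source


def fetch_report(url):
    # Hypothetical fetcher that always fails, to exercise the handler.
    raise DataNotFoundError("publication page returned 404", url=url, source="example")


def safe_fetch(url):
    try:
        return fetch_report(url)
    except DataSourceError as err:
        # One handler covers every subclass of the base class.
        return f"{type(err).__name__}: {err} ({getattr(err, 'url', None)})"


result = safe_fetch("https://example.invalid/report")
```

Catching the base class keeps calling code short, while the subclass attributes (here `url`) remain available for logging.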

exception bolster.NetworkError(message, url=None, status_code=None, retry_count=None)[source]

Bases: DataSourceError

Raised when network operations fail beyond retry limits.

Examples

  • Timeout errors after retries

  • Connection refused

  • DNS resolution failures

  • Server returning persistent errors (500, 503)

Parameters:
  • message (str) – Description of network failure

  • url (str) – Optional URL that failed

  • status_code (int) – Optional HTTP status code

  • retry_count (int) – Optional number of retries attempted

Initialize NetworkError with message and optional network context.

url = None
status_code = None
retry_count = None
__str__()[source]

Return str(self).

exception bolster.ParseError(message, file_path=None, parser_type=None)[source]

Bases: DataSourceError

Raised when file or data parsing fails.

Examples

  • Malformed Excel file structure

  • Unexpected CSV format

  • HTML parsing issues

  • JSON decode errors

Parameters:
  • message (str) – Description of parsing failure

  • file_path (str) – Optional path to file that failed to parse

  • parser_type (str) – Optional type of parser (excel, csv, html, json)

Initialize ParseError with message and optional parsing context.

file_path = None
parser_type = None
__str__()[source]

Return str(self).

exception bolster.ValidationError(message, data_info=None, validation_type=None)[source]

Bases: DataSourceError

Raised when data fails integrity validation checks.

Examples

  • Required columns missing from DataFrame

  • Data values outside expected ranges

  • Inconsistent data relationships

  • Empty datasets when data expected

Parameters:
  • message (str) – Description of validation failure

  • data_info (str) – Optional info about the problematic data

  • validation_type (str) – Optional type of validation that failed

Initialize ValidationError with message and optional validation context.

data_info = None
validation_type = None
__str__()[source]

Return str(self).

bolster.logger[source]
bolster.always(x, **kwargs)[source]

Pointless passthrough replacement for ‘always true’ filtering.

>>> always('false')
True
>>> always(False)
True
>>> always(True)
True
bolster.poolmap(f, iterable, max_workers=None, progress=None, **kwargs)[source]

Helper function to encapsulate a ThreadPoolExecutor mapped function workflow.

Accepts (assumed to be tqdm style) progress monitor callback.

kwargs are passed identically to all f(i) calls for each i in iterable

Returns:

Dictionary mapping from input items to their results

Return type:

dict
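The workflow described above can be sketched roughly as follows. This is a minimal stand-in, not the library's code: `poolmap_sketch` and `scale` are hypothetical names, and the way the progress callback is invoked (wrapping the iterable of completed futures, tqdm style) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def poolmap_sketch(f, iterable, max_workers=None, progress=None, **kwargs):
    """Map f over iterable in a thread pool; return {item: f(item, **kwargs)}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input each future belongs to.
        futures = {executor.submit(f, item, **kwargs): item for item in iterable}
        completed = as_completed(futures)
        if progress is not None:
            # Assumes a tqdm-style callable that wraps an iterable.
            completed = progress(completed, total=len(futures))
        for future in completed:
            results[futures[future]] = future.result()
    return results


def scale(x, factor=1):
    return x * factor


scaled = poolmap_sketch(scale, [1, 2, 3], max_workers=2, factor=10)
```

Because results are keyed by input item, the inputs need to be hashable in this sketch, and duplicate inputs collapse into one entry.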

bolster.batch(seq, n=1)[source]

Split a sequence into n-length batches (is still iterable, not list).

Returns:

Generator yielding batches of the sequence

Return type:

collections.abc.Generator[collections.abc.Iterable, None, None]

Examples

>>> next((b for b in batch(range(10), 2)))
range(0, 2)
>>> [b for b in batch(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
bolster.chunks(iterable, size=10)[source]

Outputs <list> chunks of size N from an iterable (generator).

Examples

>>> next((b for b in chunks(range(10), 2)))
[0, 1]
>>> [b for b in chunks(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
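The difference between batch (slices of a sequence) and chunks (lists drawn from any iterable) might be sketched like this. Both functions are hypothetical stand-ins written to match the documented examples; in particular, yielding a shorter final chunk is an assumption.

```python
from itertools import islice


def batch_sketch(seq, n=1):
    """Yield n-length slices of a sliceable sequence (type is preserved)."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


def chunks_sketch(iterable, size=10):
    """Yield lists of up to `size` items from any iterable, including generators."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


first_batch = next(batch_sketch(range(10), 2))
lazy_chunks = list(chunks_sketch((i for i in range(5)), 2))
```

Slicing requires `len()` and indexing, so the batch-style helper cannot consume a generator, whereas the `islice` approach can.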

bolster.arg_exception_logger(func)[source]

Helper Decorator to provide info on the arguments that cause the exception of a wrapped function.

Parameters:

func (collections.abc.Callable) – Function to wrap with exception logging

Returns:

Wrapped function with exception argument logging

Return type:

Callable

bolster.backoff(exception_to_check=BaseException, tries=5, delay=0.2, backoff=2, logger=logger)[source]

Retry calling the decorated function using an exponential backoff.

Adapted from http://www.saltycrane.com/blog/2009/11/trying-out-retry-decorator-python/, originally from http://wiki.python.org/moin/PythonDecoratorLibrary#Retry

Exceptions can't be type-annotated because it's verboten (https://peps.python.org/pep-0484/#exceptions)

Parameters:

  • exception_to_check (Any | collections.abc.Sequence[Any]) – the exception to check; may be a tuple of exceptions to check

  • tries – number of times to try (not retry) before giving up (Default value = 5)

  • delay – initial delay between retries in seconds (Default value = 0.2)

  • backoff – backoff multiplier, e.g. a value of 2 will double the delay each retry (Default value = 2)

  • logger – logger to use. If None, print (Default value = local utils logger)
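A minimal sketch of the exponential backoff pattern, assuming the semantics described by the parameters above. `backoff_sketch` is a hypothetical stand-in, and its injectable `sleep` argument is an addition for illustration only (it is not part of the documented signature).

```python
import functools
import time


def backoff_sketch(exception_to_check=Exception, tries=5, delay=0.2, backoff=2,
                   sleep=time.sleep):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            remaining, wait = tries, delay
            while remaining > 1:
                try:
                    return func(*args, **kwargs)
                except exception_to_check:
                    sleep(wait)      # pause before the next attempt
                    remaining -= 1
                    wait *= backoff  # grow the delay each time
            # Final attempt: let any exception propagate to the caller.
            return func(*args, **kwargs)
        return wrapper
    return decorator


attempts = []
waits = []


@backoff_sketch(ValueError, tries=4, delay=0.2, backoff=2, sleep=waits.append)
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("not yet")
    return "ok"


outcome = flaky()
```

Recording the waits instead of sleeping makes the doubling visible: the two failed attempts are followed by delays of 0.2 and 0.4 seconds.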

exception bolster.MultipleErrors(errors=None)[source]

Bases: BaseException

Exception Class to enable the capturing of multiple exceptions without interrupting control flow.

I.e. catch the exception, but carry on and report the exceptions at the end.

E.g.

exceptions = MultipleErrors()
try:
    do_risky_thing_with(this) #raises ValueError
except:
    exceptions.capture_current_exception()
try:
    do_other_thing_with(this) #raises AttributeError
except:
    exceptions.capture_current_exception()
exceptions.do_raise()
Traceback (most recent call last):
    ...
ValueError

Traceback (most recent call last):
    ...
AttributeError

Initialize MultipleErrors with optional list of existing errors.

errors = [][source]
__str__()[source]

Return formatted string representation of all captured exceptions.

capture_current_exception()[source]

Gathers exception info from the current context and retains it.

do_raise()[source]

Raises itself if it contains any errors.

bolster.tag_gen(seq, **kwargs)[source]

Generator stream that adds kwargs to each entry yielded.

Parameters:
  • seq (collections.abc.Iterator[dict]) – Iterator of dictionaries to tag

  • **kwargs – Additional key-value pairs to add to each dictionary

Examples

The below example shows the creation of an empty dict generator where tag_gen is used to insert a new key/value (k=1) in each item on the fly

>>> all([i['k'] == 1 for i in tag_gen(({} for _ in range(4)), k=1)])
True
bolster.exceptional_executor(futures, exception_handler=None, timeout=None)[source]

Generator for concurrent.Futures handling.

When an exception is raised in an executing Future, f.result() called on its own will raise that exception in the parent thread, killing execution and causing loss of ‘future local’ scope.

Instead, query the future for its exception state first, and handle that separately, by default by logging it as an exception.
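The idea of checking a future's exception state before asking for its result could look roughly like this. `exceptional_executor_sketch` and `risky` are hypothetical; whether the real generator yields results (as here) or something else is an assumption.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger(__name__)


def exceptional_executor_sketch(futures, exception_handler=None, timeout=None):
    for future in as_completed(futures, timeout=timeout):
        error = future.exception()  # does not raise the stored exception
        if error is None:
            yield future.result()
        elif exception_handler is not None:
            exception_handler(error)
        else:
            # Default: log the failure and keep consuming other futures.
            log.error("future failed", exc_info=error)


def risky(x):
    if x == 2:
        raise ValueError("bad input")
    return x * 10


failures = []
with ThreadPoolExecutor(max_workers=2) as pool:
    submitted = [pool.submit(risky, x) for x in (1, 2, 3)]
    results = sorted(
        exceptional_executor_sketch(submitted, exception_handler=failures.append)
    )
```

The failing future is routed to the handler while the other two results are still delivered.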

bolster.working_directory(path)[source]

Contextmanager that changes working directory and returns to previous on exit.

Parameters:

path (str | pathlib.Path) – Directory to change into for the duration of the context
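A context manager with this behaviour can be sketched with the standard library alone. `working_directory_sketch` is a hypothetical stand-in, not the library's implementation.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory_sketch(path):
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raised.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_sketch(tmp):
        inside = os.path.realpath(os.getcwd())
    expected = os.path.realpath(tmp)
after = os.getcwd()
```

The `try`/`finally` is what guarantees the return to the previous directory on exit.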

bolster.compress_for_relay(obj)[source]

Compress json-serializable object to a gzipped base64 string.

Parameters:
  • obj (list | dict) – JSON-serializable object to compress

>>> decompress_from_relay(compress_for_relay(['test']))
['test']
>>> decompress_from_relay(compress_for_relay({'test':'test'}))
{'test': 'test'}
bolster.decompress_from_relay(msg)[source]

Uncompress gzipped base64 string to a json-serializable object.

Parameters:

msg (AnyStr) – Gzipped base64 string to decompress
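The round trip between a JSON-serializable object and a gzipped base64 string might be sketched as below. Both helpers are hypothetical stand-ins; the exact encoding details of the real functions (compression level, str versus bytes output) are assumptions.

```python
import base64
import gzip
import json


def compress_sketch(obj):
    """JSON-encode, gzip, then base64-encode into an ASCII string."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_sketch(msg):
    """Reverse compress_sketch; accepts str or bytes."""
    if isinstance(msg, str):
        msg = msg.encode("ascii")
    return json.loads(gzip.decompress(base64.b64decode(msg)).decode("utf-8"))


payload = {"test": "test", "values": [1, 2, 3]}
restored = decompress_sketch(compress_sketch(payload))
```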

class bolster.memoize(func)[source]

Cache the return value of a method.

This class is meant to be used as a decorator of methods. The return value from a given method invocation will be cached on the instance whose method was invoked. All arguments passed to a method decorated with memoize must be hashable.

If a memoized method is invoked directly on its class the result will not be cached. Instead the method will be invoked like a static method:

class Obj(object):
    @memoize
    def add_to(self, arg):
        return self + arg

Obj.add_to(1)  # not enough arguments
Obj.add_to(1, 2)  # returns 3, result is not cached

Source: http://code.activestate.com/recipes/577452-a-memoize-decorator-for-instance-methods/

Augmented with cache hit/miss population Counters

Initialize the LRU cache decorator with a function.

func[source]
__get__(obj, objtype=None)[source]
__call__(*args, **kw)[source]

Execute the cached function with LRU behavior and hit/miss tracking.
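How a descriptor-based decorator can cache results on the instance might be sketched as follows. This is a simplified, hypothetical stand-in in the spirit of the referenced recipe: the `stats` counter and the `_memo_cache` attribute name are assumptions, and no LRU eviction is attempted.

```python
import functools
from collections import Counter


class memoize_sketch:
    def __init__(self, func):
        self.func = func
        self.stats = Counter()  # hypothetical hit/miss counters

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: hand back the plain function, no caching.
            return self.func
        return functools.partial(self, obj)

    def __call__(self, *args, **kw):
        obj = args[0]
        # The cache lives on the instance, so it is discarded with it.
        cache = obj.__dict__.setdefault("_memo_cache", {})
        key = (self.func, args[1:], frozenset(kw.items()))
        if key in cache:
            self.stats["hit"] += 1
        else:
            self.stats["miss"] += 1
            cache[key] = self.func(*args, **kw)
        return cache[key]


class Adder:
    def __init__(self, base):
        self.base = base
        self.calls = 0

    @memoize_sketch
    def add_to(self, arg):
        self.calls += 1
        return self.base + arg


adder = Adder(1)
first = adder.add_to(2)
second = adder.add_to(2)  # served from the instance cache
```

The cache key is built from the arguments, which is why they must be hashable.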

bolster.pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None)[source]

Pretty-print a request that is completely built and ready to be fired, i.e. “prepared”.

Note that the output is formatted for readability and may differ from the actual request.

Parameters:
  • req – HTTP request object to pretty print

  • expose_auth – Whether to expose authentication headers (Default value = False)

  • authentication_header_blacklist (collections.abc.Sequence | None) – List of header names to redact when expose_auth is False

bolster.get_recursively(search_dict, field)[source]

Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.

Originally taken from https://stackoverflow.com/a/20254842

Parameters:
  • search_dict (dict) – Dict (possibly containing nested lists and dicts) to search

  • field (str) – Key to search for

Examples

>>> get_recursively({'id': 5, 'children': {'id': 6, 'children': {'id': 7, 'children': {}}}}, 'id')
[5, 6, 7]
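A recursive search of this kind could be written roughly as follows. `get_recursively_sketch` is a hypothetical stand-in modelled on the documented example; details such as not descending into a value whose key already matched are assumptions.

```python
def get_recursively_sketch(search_dict, field):
    found = []
    for key, value in search_dict.items():
        if key == field:
            found.append(value)
        elif isinstance(value, dict):
            # Descend into nested dicts.
            found.extend(get_recursively_sketch(value, field))
        elif isinstance(value, list):
            # Descend into any dicts held inside lists.
            for item in value:
                if isinstance(item, dict):
                    found.extend(get_recursively_sketch(item, field))
    return found


nested = {"id": 5, "children": {"id": 6, "children": {"id": 7, "children": {}}}}
ids = get_recursively_sketch(nested, "id")
```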

bolster.transform_(r, rule_keys)[source]

Generic Item-wise transformation function.

The values in r are updated based on key-matching in rule_keys, i.e. -> out[k] = rule_keys[k] (r[k]).

However, this can do more than straight callable mapping; it can also update the key. For a given rule R = rule_keys[k]:

R can be used to select the field to appear in the output

>>> r = {'a': '1', 'b': '2', 'c': '3'}
>>> transform_(r, {'a': None})
{'a': '1'}

Rename a key

>>> transform_(r, {'a': ('A', None)})
{'A': '1'}

Apply a function to a key’s value

>>> transform_(r, {'a': ('a', int)})
{'a': 1}

Or a combination of these

>>> transform_(r, {'a': ('A', int), 'b': None})
{'A': 1, 'b': '2'}
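The rule forms described above (None to select, a tuple to rename and optionally convert, a bare callable to convert) can be sketched in a few lines. `transform_sketch` is a hypothetical stand-in; skipping rules whose key is absent from the input is an assumption.

```python
def transform_sketch(r, rule_keys):
    out = {}
    for key, rule in rule_keys.items():
        if key not in r:
            continue  # assumption: rules for missing keys are ignored
        if rule is None:
            # Select the field unchanged.
            out[key] = r[key]
        elif isinstance(rule, tuple):
            # (new_key, func): rename, and convert if func is given.
            new_key, func = rule
            out[new_key] = r[key] if func is None else func(r[key])
        else:
            # Bare callable: convert the value, keep the key.
            out[key] = rule(r[key])
    return out


record = {"a": "1", "b": "2", "c": "3"}
combined = transform_sketch(record, {"a": ("A", int), "b": None})
```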

bolster.diff(new, old, excluded_fields=None)[source]

Perform a one-depth diff of a pair of dictionaries.

#TODO diff needs tests

bolster.aggregate(base, group_key, item_key, condition=None)[source]

Abstracted groupby-sum for lists of dicts.

Operationally equivalent to:

df = pd.DataFrame(base)
df.where(condition).groupby(group_key)[item_key].sum()

# TODO aggregate needs tests

Parameters:
  • base (list[dict]) – List of dictionaries to group and sum

  • group_key (AnyStr | tuple[AnyStr] | list[AnyStr]) – Key(s) to group by - can be string, tuple, or list of strings

  • item_key (AnyStr) – Key to sum values for within each group

  • condition (collections.abc.Callable | None) – Optional function to filter records before grouping
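A dependency-free groupby-sum over a list of dicts might look like the sketch below. `aggregate_sketch` is a hypothetical stand-in; returning a plain dict keyed by group (a tuple when several keys are given) is an assumption about the output shape.

```python
from collections import defaultdict


def aggregate_sketch(base, group_key, item_key, condition=None):
    totals = defaultdict(int)
    for record in base:
        if condition is not None and not condition(record):
            continue  # filtered out before grouping
        if isinstance(group_key, (tuple, list)):
            group = tuple(record[k] for k in group_key)
        else:
            group = record[group_key]
        totals[group] += record[item_key]
    return dict(totals)


sales = [
    {"region": "north", "year": 2023, "amount": 10},
    {"region": "north", "year": 2024, "amount": 5},
    {"region": "south", "year": 2024, "amount": 7},
]
by_region = aggregate_sketch(sales, "region", "amount")
recent = aggregate_sketch(
    sales, ("region", "year"), "amount", condition=lambda rec: rec["year"] == 2024
)
```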

bolster.breadth(d)[source]

Get the total ‘width’ of a tree.

> Why was this a thing? No idea

bolster.depth(d)[source]

Get the maximum depth of a tree.

bolster.set_keys(d)[source]

Extract the set of all keys of a nested dict/tree.

bolster.keys_at(d, n, i=0)[source]

Extract the keys of a tree at a given depth.

bolster.items_at(d, n, i=0)[source]

Extract the elements from a tree at a given depth.

bolster.leaves(d)[source]

Iterate on the leaves of a tree.
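For nested dictionaries, tree helpers such as depth and leaves can be sketched recursively. Both functions below are hypothetical stand-ins; the counting convention (a flat dict has depth 1, a non-dict value has depth 0) is an assumption.

```python
def depth_sketch(d):
    """Maximum nesting depth; a non-dict or empty dict counts as 0."""
    if not isinstance(d, dict) or not d:
        return 0
    return 1 + max(depth_sketch(v) for v in d.values())


def leaves_sketch(d):
    """Yield every non-dict value, depth first."""
    for value in d.values():
        if isinstance(value, dict):
            yield from leaves_sketch(value)
        else:
            yield value


tree = {"a": {"b": {"c": 1}}, "d": 2}
tree_depth = depth_sketch(tree)
tree_leaves = list(leaves_sketch(tree))
```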

bolster.leaf_paths(d, path=None)[source]

Get all leaf paths in a nested dictionary structure.

bolster.flatten_dict(d, head='', sep=':')[source]

Flatten a nested dictionary using separator for key names.
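Flattening with a separator might be sketched as follows. `flatten_dict_sketch` is a hypothetical stand-in that reuses the documented parameter names; how the real function joins `head` with nested keys is an assumption.

```python
def flatten_dict_sketch(d, head="", sep=":"):
    flat = {}
    for key, value in d.items():
        # Prefix nested keys with the path so far.
        name = f"{head}{sep}{key}" if head else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict_sketch(value, head=name, sep=sep))
        else:
            flat[name] = value
    return flat


flattened = flatten_dict_sketch({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
```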

bolster.uncollect_object(d)[source]

Convert flat dictionary back to nested structure using path tuples.

bolster.dict_concat_safe(d, keys, default=None)[source]

Really Lazy Func because dict.get(‘key’,default) is a pain in the ass for lists.

bolster.build_default_mapping_dict_from_keys(keys)[source]

Constructs a mapping dictionary from (presumably) snake_case keys to ‘human-readable’ title case.

Intended for easy construction of presentable graphs/tables etc.

>>> build_default_mapping_dict_from_keys(['a_b','b_c','c_d'])
{'a_b': 'A B', 'b_c': 'B C', 'c_d': 'C D'}