bolster
Bolster - A personal collection of Python utilities and data sources.
A grab bag of handy functions for working with Northern Ireland data, basic stats operations, and general data science tasks. Built for personal projects and exploration.
What’s in here:
- data_sources: NI water quality, house prices, cinema listings, etc.
- stats: Basic data frame operations and distribution fitting
- utils: Web scraping helpers, decorators, AWS/Azure bits
- cli: Command line tools for the data sources
Quick examples:
>>> from bolster.data_sources import ni_water
>>> quality_data = ni_water.get_water_quality()
>>> 'NI Hardness Classification' in quality_data.columns
True
>>> from bolster.stats import add_totals
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]])
>>> add_totals(df, inplace=False)
       0  1  total
0      1  2      3
1      3  4      7
total  4  6     10
Author: Andrew Bolster
Submodules
Attributes
Exceptions
DataNotFoundError: Raised when expected data publications or URLs are not found.
DataSourceError: Base class for all data source errors.
NetworkError: Raised when network operations fail beyond retry limits.
ParseError: Raised when file or data parsing fails.
ValidationError: Raised when data fails integrity validation checks.
MultipleErrors: Exception class to enable the capturing of multiple exceptions without interrupting control flow.
Classes
memoize: Cache the return value of a method.
Functions
always: Pointless passthrough replacement for 'always true' filtering.
poolmap: Helper function to encapsulate a ThreadPoolExecutor mapped function workflow.
batch: Split a sequence into n-length batches (is still iterable, not list).
chunks: Outputs <list> chunks of size N from an iterable (generator).
arg_exception_logger: Helper Decorator to provide info on the arguments that cause the exception of a wrapped function.
backoff: Retry calling the decorated function using an exponential backoff.
tag_gen: Generator stream that adds kwargs to each entry yielded.
exceptional_executor: Generator for concurrent.futures handling.
working_directory: Contextmanager that changes working directory and returns to previous on exit.
compress_for_relay: Compress json-serializable object to a gzipped base64 string.
decompress_from_relay: Uncompress gzipped base64 string to a json-serializable object.
pretty_print_request: At this point it is completely built and ready to be fired; it is "prepared".
get_recursively: Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
transform_: Generic Item-wise transformation function.
diff: Perform a one-depth diff of a pair of dictionaries.
aggregate: Abstracted groupby-sum for lists of dicts.
Get the total 'width' of a tree.
Get the maximum depth of a tree.
Extract the set of all keys of a nested dict/tree.
Extract the keys of a tree at a given depth.
Extract the elements from a tree at a given depth.
Iterate on the leaves of a tree.
Get all leaf paths in a nested dictionary structure.
flatten_dict: Flatten a nested dictionary using separator for key names.
uncollect_object: Convert flat dictionary back to nested structure using path tuples.
dict_concat_safe: Really Lazy Func because dict.get('key',default) is a pain in the ass for lists.
build_default_mapping_dict_from_keys: Constructs a mapping dictionary between (presumably) snakecase keys to 'human-readable' title case.
Package Contents
- exception bolster.DataNotFoundError(message, url=None, source=None)
Bases: DataSourceError
Raised when expected data publications or URLs are not found.
Examples
Publication page returns 404
Expected Excel file link missing from page
RSS feed returns no entries
API endpoint returns empty response
- Parameters:
Initialize DataSourceError with message and optional context.
- url = None
- source = None
- exception bolster.DataSourceError
Bases: Exception
Base class for all data source errors.
This is the root exception for all domain-specific errors in Bolster. All other exceptions should inherit from this base class.
- exception bolster.NetworkError(message, url=None, status_code=None, retry_count=None)
Bases: DataSourceError
Raised when network operations fail beyond retry limits.
Examples
Timeout errors after retries
Connection refused
DNS resolution failures
Server returning persistent errors (500, 503)
- Parameters:
Initialize NetworkError with message and optional network context.
- url = None
- status_code = None
- retry_count = None
- exception bolster.ParseError(message, file_path=None, parser_type=None)
Bases: DataSourceError
Raised when file or data parsing fails.
Examples
Malformed Excel file structure
Unexpected CSV format
HTML parsing issues
JSON decode errors
- Parameters:
Initialize ParseError with message and optional parsing context.
- file_path = None
- parser_type = None
- exception bolster.ValidationError(message, data_info=None, validation_type=None)
Bases: DataSourceError
Raised when data fails integrity validation checks.
Examples
Required columns missing from DataFrame
Data values outside expected ranges
Inconsistent data relationships
Empty datasets when data expected
- Parameters:
Initialize ValidationError with message and optional validation context.
- data_info = None
- validation_type = None
- bolster.always(x, **kwargs)
Pointless passthrough replacement for ‘always true’ filtering.
>>> always('false')
True
>>> always(False)
True
>>> always(True)
True
- bolster.poolmap(f, iterable, max_workers=None, progress=None, **kwargs)
Helper function to encapsulate a ThreadPoolExecutor mapped function workflow.
Accepts a progress monitor callback (assumed to be tqdm-style).
kwargs are passed identically to every f(i) call, for each i in iterable.
- Parameters:
f (collections.abc.Callable) – function to map across
iterable (collections.abc.Iterable) – Sequence of items to process
max_workers (int | None) – Maximum number of worker threads (Default value = None)
progress (collections.abc.Callable) – Progress callback function (Default value = None)
**kwargs – passed as arguments to f
- Returns:
Dictionary mapping from input items to their results
- Return type:
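A minimal standard-library sketch of the workflow described above, not bolster's actual implementation. `poolmap_sketch` and `scale` are hypothetical names. The sketch assumes the input items are hashable (they become dictionary keys) and that `progress` wraps an iterable the way tqdm does.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def poolmap_sketch(f, iterable, max_workers=None, progress=None, **kwargs):
    """Map f over iterable in a thread pool; return {item: f(item, **kwargs)}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input produced each future so results can be keyed by input.
        future_to_item = {executor.submit(f, item, **kwargs): item for item in iterable}
        completed = as_completed(future_to_item)
        if progress is not None:
            # Assumption: progress is tqdm-style, i.e. progress(iterable, total=...).
            completed = progress(completed, total=len(future_to_item))
        for future in completed:
            results[future_to_item[future]] = future.result()
    return results


def scale(x, factor=1):
    """Hypothetical worker function used only for this illustration."""
    return x * factor


scaled = poolmap_sketch(scale, [1, 2, 3], max_workers=2, factor=10)
```

Keying the result by input item means completion order does not matter, at the cost of collapsing duplicate inputs into one entry.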
- bolster.batch(seq, n=1)
Split a sequence into n-length batches (is still iterable, not list).
- Parameters:
seq (collections.abc.Sequence) – Sequence to split into batches
n (int) – Size of each batch (default: 1)
- Returns:
Generator yielding batches of the sequence
- Return type:
collections.abc.Generator[collections.abc.Iterable, None, None]
Examples
>>> next((b for b in batch(range(10), 2)))
range(0, 2)
>>> [b for b in batch(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
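The doctest output above (a `range` in, a `range` out) is consistent with a slicing-based generator. The following is a sketch under that assumption, with the hypothetical name `batch_sketch`. It is not taken from bolster's source.

```python
def batch_sketch(seq, n=1):
    """Yield consecutive slices of seq, each at most n items long."""
    for start in range(0, len(seq), n):
        # Slicing preserves the input's type: a range yields ranges, a list yields lists.
        yield seq[start:start + n]


first = next(batch_sketch(range(10), 2))
as_lists = list(batch_sketch(list(range(10)), 2))
ragged = list(batch_sketch("abcde", 2))  # the last batch may be shorter than n
```

Because it relies on `len` and slicing, this approach needs a real sequence and would not work on a bare generator.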
- bolster.chunks(iterable, size=10)
Outputs <list> chunks of size N from an iterable (generator).
- Parameters:
iterable (collections.abc.Iterable) – Iterable to split into chunks
size – Number of items per chunk (Default value = 10)
Examples
>>> next((b for b in chunks(range(10), 2)))
[0, 1]
>>> [b for b in chunks(list(range(10)), 2)]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
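One way to get list chunks from any iterable, including a generator with no length, is `itertools.islice`. This is a sketch with the hypothetical name `chunks_sketch`, offered as an illustration rather than as the library's code.

```python
from itertools import islice


def chunks_sketch(iterable, size=10):
    """Yield lists of up to `size` items drawn from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # iterator exhausted
        yield chunk


first_chunk = next(chunks_sketch(range(10), 2))
from_generator = list(chunks_sketch((i for i in range(5)), 2))
```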
- bolster.arg_exception_logger(func)
Helper Decorator to provide info on the arguments that cause the exception of a wrapped function.
- Parameters:
func (collections.abc.Callable) – Function to wrap with exception logging
- Returns:
Wrapped function with exception argument logging
- Return type:
Callable
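A decorator of this kind can be sketched with `functools.wraps` and the standard `logging` module. The name `arg_exception_logger_sketch`, the log message format, and the choice to re-raise the original exception are all assumptions made for illustration.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def arg_exception_logger_sketch(func):
    """Log the arguments of any call that raises, then re-raise the exception."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Record the offending arguments alongside the traceback.
            logger.exception("%s failed with args=%r kwargs=%r", func.__name__, args, kwargs)
            raise

    return wrapper


@arg_exception_logger_sketch
def reciprocal(x):
    """Hypothetical function that fails for x == 0."""
    return 1 / x
```

Calling `reciprocal(0)` would log the argument tuple `(0,)` and still raise `ZeroDivisionError` to the caller.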
- bolster.backoff(exception_to_check=BaseException, tries=5, delay=0.2, backoff=2, logger=logger)
Retry calling the decorated function using an exponential backoff.
Source: http://www.saltycrane.com/blog/2009/11/trying-out-retry-decorator-python/
Original from: http://wiki.python.org/moin/PythonDecoratorLibrary#Retry
Can’t type-annotate exceptions because it’s verboten (https://peps.python.org/pep-0484/#exceptions)
- Parameters:
exception_to_check (Any | collections.abc.Sequence[Any]) – the exception to check; may be a tuple of exceptions to check
tries – number of times to try (not retry) before giving up (Default value = 5)
delay – initial delay between retries in seconds (Default value = 0.2)
backoff – backoff multiplier, e.g. a value of 2 will double the delay each retry (Default value = 2)
logger – logger to use. If None, print (Default value = local utils logger)
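The retry loop behind such a decorator can be sketched as follows. `backoff_sketch` and `flaky` are hypothetical names, the `logger` parameter is omitted, and the default exception is narrowed to `Exception` for the illustration. Here `tries` counts total attempts, matching the parameter description above.

```python
import functools
import time


def backoff_sketch(exception_to_check=Exception, tries=5, delay=0.2, backoff=2):
    """Retry the wrapped function, multiplying the pause by `backoff` after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for _ in range(tries - 1):
                try:
                    return func(*args, **kwargs)
                except exception_to_check:
                    time.sleep(wait)
                    wait *= backoff  # grow the pause geometrically
            # Final attempt: any exception now propagates to the caller.
            return func(*args, **kwargs)

        return wrapper

    return decorator


attempts = []


@backoff_sketch(ValueError, tries=4, delay=0.001, backoff=2)
def flaky():
    """Hypothetical function that fails twice and then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("not yet")
    return "ok"


result = flaky()
```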
- exception bolster.MultipleErrors(errors=None)
Bases: BaseException
Exception class to enable the capturing of multiple exceptions without interrupting control flow.
I.e. catch the exception, but carry on and report the exceptions at the end.
E.g.
exceptions = MultipleErrors()
try:
    do_risky_thing_with(this)  # raises ValueError
except:
    exceptions.capture_current_exception()
try:
    do_other_thing_with(this)  # raises AttributeError
except:
    exceptions.capture_current_exception()
exceptions.do_raise()

Traceback (most recent call last):
....
ValueError
Traceback (most recent call last):
...
AttributeError
Initialize MultipleErrors with optional list of existing errors.
- bolster.tag_gen(seq, **kwargs)
Generator stream that adds kwargs to each entry yielded.
- Parameters:
seq (collections.abc.Iterator[dict]) – Iterator of dictionaries to tag
**kwargs – Additional key-value pairs to add to each dictionary
Examples
The example below creates a generator of empty dicts and uses tag_gen to insert a new key/value pair (k=1) into each item on the fly.
>>> all([i['k'] == 1 for i in tag_gen(({} for _ in range(4)), k=1)])
True
- bolster.exceptional_executor(futures, exception_handler=None, timeout=None)
Generator for concurrent.futures handling.
When an exception is raised in an executing Future, calling f.result() on its own re-raises that exception in the parent thread, killing execution and losing the ‘future local’ scope.
Instead, query the future for its exception state first and handle that separately; by default, it is logged as an exception.
- Parameters:
futures (collections.abc.Sequence[concurrent.futures.Future]) – Sequence of Future objects to process
exception_handler – Optional callable to handle exceptions from futures
timeout – Optional timeout for waiting for futures to complete
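The "check the exception state first" pattern described above can be sketched with `concurrent.futures.as_completed` and `Future.exception()`. `exceptional_executor_sketch` and `parse` are hypothetical names. Yielding results in completion order and logging by default are assumptions based on the description, not on the library's source.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def exceptional_executor_sketch(futures, exception_handler=None, timeout=None):
    """Yield results of successful futures; route failures to a handler instead of raising."""
    for future in as_completed(futures, timeout=timeout):
        error = future.exception()  # returns None when the future succeeded
        if error is None:
            yield future.result()
        elif exception_handler is not None:
            exception_handler(error)
        else:
            logger.error("Future failed", exc_info=error)


def parse(text):
    """Hypothetical task that fails on non-numeric input."""
    return int(text)


failures = []
with ThreadPoolExecutor(max_workers=2) as pool:
    jobs = [pool.submit(parse, text) for text in ["1", "two", "3"]]
    values = sorted(exceptional_executor_sketch(jobs, exception_handler=failures.append))
```

The one bad input ends up in `failures` while the other two results are still collected.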
- bolster.working_directory(path)
Context manager that changes the working directory and returns to the previous one on exit.
- Parameters:
path (str | pathlib.Path) – Directory to change into
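A context manager with this behaviour is commonly written with `contextlib.contextmanager` and `os.chdir`. The sketch below uses the hypothetical name `working_directory_sketch` and is not bolster's implementation. The `try`/`finally` restores the original directory even if the body raises.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory_sketch(path):
    """Temporarily change the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # always restore, even after an exception


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_sketch(tmp):
        inside_matches = os.path.samefile(os.getcwd(), tmp)
restored = os.getcwd() == start
```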
- bolster.compress_for_relay(obj)
Compress json-serializable object to a gzipped base64 string.
>>> decompress_from_relay(compress_for_relay(['test']))
['test']
>>> decompress_from_relay(compress_for_relay({'test':'test'}))
{'test': 'test'}
- bolster.decompress_from_relay(msg)
Uncompress gzipped base64 string to a json-serializable object.
- Parameters:
msg (AnyStr) – gzipped base64 string to decompress
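The round trip described by these two functions (JSON, then gzip, then base64, and back) can be sketched with the standard library. `compress_sketch` and `decompress_sketch` are hypothetical names. UTF-8 encoding and a `str` return value are assumptions.

```python
import base64
import gzip
import json


def compress_sketch(obj):
    """Serialize to JSON, gzip the bytes, and return an ASCII-safe base64 string."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_sketch(msg):
    """Reverse compress_sketch; accepts the base64 payload as str or bytes."""
    return json.loads(gzip.decompress(base64.b64decode(msg)).decode("utf-8"))


payload = {"test": "test", "values": [1, 2, 3]}
encoded = compress_sketch(payload)
decoded = decompress_sketch(encoded)
```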
- class bolster.memoize(func)
Cache the return value of a method.
This class is meant to be used as a decorator of methods. The return value from a given method invocation will be cached on the instance whose method was invoked. All arguments passed to a method decorated with memoize must be hashable.
If a memoized method is invoked directly on its class the result will not be cached. Instead the method will be invoked like a static method:
class Obj(object):
    @memoize
    def add_to(self, arg):
        return self + arg

Obj.add_to(1)     # not enough arguments
Obj.add_to(1, 2)  # returns 3, result is not cached
Source: http://code.activestate.com/recipes/577452-a-memoize-decorator-for-instance-methods/
Augmented with cache hit/miss population Counters
Initialize the LRU cache decorator with a function.
- bolster.pretty_print_request(req, expose_auth=False, authentication_header_blacklist=None)
Pretty-print a prepared HTTP request. At this point it is completely built and ready to be fired; it is “prepared”.
However, pay attention to the formatting used in this function: it is designed for pretty printing and may differ from the actual request.
- Parameters:
req – HTTP request object to pretty print
expose_auth – Whether to expose authentication headers (Default value = False)
authentication_header_blacklist (collections.abc.Sequence | None) – List of header names to redact when expose_auth is False
- bolster.get_recursively(search_dict, field)
Takes a dict with nested lists and dicts, and searches all dicts for a key of the field provided.
Originally taken from https://stackoverflow.com/a/20254842
Examples
>>> get_recursively({'id': 5, 'children': {'id': 6, 'children': {'id': 7, 'children': {}}}}, 'id')
[5, 6, 7]
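A recursive search of this kind can be sketched as below. `get_recursively_sketch` is a hypothetical name. The sketch assumes that a matched value is collected as-is and not searched further, which may differ from the library's behaviour.

```python
def get_recursively_sketch(search_dict, field):
    """Collect every value stored under `field` in nested dicts and lists of dicts."""
    found = []
    for key, value in search_dict.items():
        if key == field:
            found.append(value)
        elif isinstance(value, dict):
            found.extend(get_recursively_sketch(value, field))
        elif isinstance(value, list):
            # Only dict items inside lists are descended into.
            for item in value:
                if isinstance(item, dict):
                    found.extend(get_recursively_sketch(item, field))
    return found


nested = {"id": 5, "children": {"id": 6, "children": {"id": 7, "children": {}}}}
ids = get_recursively_sketch(nested, "id")
from_lists = get_recursively_sketch({"items": [{"id": 1}, {"x": {"id": 2}}]}, "id")
```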
- bolster.transform_(r, rule_keys)
Generic item-wise transformation function.
The values in r are updated based on key-matching in rule_keys, i.e. out[k] = rule_keys[k](r[k]).
However, this can do more than straight callable mapping; it can also update the key. For a given rule R = rule_keys[k]:
R can be used to select a field for inclusion in the output
>>> r = {'a':'1','b':'2','c':'3'}
>>> transform_(r, {'a':None})
{'a': '1'}
Rename a key
>>> transform_(r, {'a':('A',None)})
{'A': '1'}
Apply a function to a key’s value
>>> transform_(r, {'a':('a',int)})
{'a': 1}
Or a combination of these
>>> transform_(r, {'a':('A',int), 'b':None})
{'A': 1, 'b': '2'}
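The rule forms shown in the doctests (`None`, a bare callable, or a `(new_key, function)` pair) can be reproduced with a short dispatcher. `transform_sketch` is a hypothetical name, and skipping rules whose key is absent from `r` is an assumption.

```python
def transform_sketch(r, rule_keys):
    """Build a new dict from r, keeping only keys named in rule_keys."""
    out = {}
    for key, rule in rule_keys.items():
        if key not in r:
            continue  # assumption: rules for absent keys are ignored
        if rule is None:
            out[key] = r[key]  # select the field unchanged
        elif callable(rule):
            out[key] = rule(r[key])  # plain callable mapping
        else:
            new_key, func = rule  # (new_key, function-or-None) pair
            out[new_key] = r[key] if func is None else func(r[key])
    return out


record = {"a": "1", "b": "2", "c": "3"}
selected = transform_sketch(record, {"a": None})
renamed = transform_sketch(record, {"a": ("A", None)})
combined = transform_sketch(record, {"a": ("A", int), "b": None})
mapped = transform_sketch(record, {"c": int})
```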
- bolster.diff(new, old, excluded_fields=None)
Perform a one-depth diff of a pair of dictionaries.
#TODO diff needs tests
- bolster.aggregate(base, group_key, item_key, condition=None)
Abstracted groupby-sum for lists of dicts.
Operationally equivalent to:
    df = pd.DataFrame(base)
    df.where(condition).groupby(group_key)[item_key].sum()
# TODO aggregate needs tests
- Parameters:
group_key (AnyStr | tuple[AnyStr] | list[AnyStr]) – Key(s) to group by - can be string, tuple, or list of strings
item_key (AnyStr) – Key to sum values for within each group
condition (collections.abc.Callable | None) – Optional function to filter records before grouping
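A pure-Python group-by-sum over a list of dicts can be sketched with `collections.defaultdict`. `aggregate_sketch` and the sample `sales` records are hypothetical. Returning a plain dict keyed by the group value (or by a tuple for multiple keys) is an assumption, since the source does not state the return shape.

```python
from collections import defaultdict


def aggregate_sketch(base, group_key, item_key, condition=None):
    """Sum record[item_key] per group, optionally filtering records first."""
    totals = defaultdict(int)
    for record in base:
        if condition is not None and not condition(record):
            continue
        if isinstance(group_key, (tuple, list)):
            group = tuple(record[k] for k in group_key)  # composite group key
        else:
            group = record[group_key]
        totals[group] += record[item_key]
    return dict(totals)


sales = [
    {"region": "north", "year": 2023, "units": 3},
    {"region": "north", "year": 2024, "units": 4},
    {"region": "south", "year": 2024, "units": 5},
]
by_region = aggregate_sketch(sales, "region", "units")
recent = aggregate_sketch(
    sales, ["region", "year"], "units", condition=lambda rec: rec["year"] == 2024
)
```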
- bolster.flatten_dict(d, head='', sep=':')
Flatten a nested dictionary using separator for key names.
- bolster.uncollect_object(d)
Convert flat dictionary back to nested structure using path tuples.
- bolster.dict_concat_safe(d, keys, default=None)
Really Lazy Func because dict.get(‘key’,default) is a pain in the ass for lists.
- bolster.build_default_mapping_dict_from_keys(keys)
Constructs a mapping dictionary from (presumably) snakecase keys to ‘human-readable’ title case.
Intended for easy construction of presentable graphs/tables etc.
>>> build_default_mapping_dict_from_keys(['a_b','b_c','c_d'])
{'a_b': 'A B', 'b_c': 'B C', 'c_d': 'C D'}
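The mapping shown in the doctest is consistent with replacing underscores by spaces and title-casing the result. This is a sketch under that assumption, using the hypothetical name `mapping_sketch`.

```python
def mapping_sketch(keys):
    """Map snake_case keys to title-cased, space-separated labels."""
    return {key: key.replace("_", " ").title() for key in keys}


labels = mapping_sketch(["a_b", "house_price_index"])
```

A mapping like this could be passed to `DataFrame.rename(columns=...)` to produce presentable column headers.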