Bolster

Bolster’s Brain, you’ve been warned 🧠

A comprehensive Python utility library for data science, web scraping, cloud services, and general development workflows. Originally designed as a personal toolkit, Bolster has evolved into a robust collection of utilities that enhance productivity across data analysis, system administration, and software development tasks.

🚀 Quick Start

Installation

pip install bolster

Basic Usage

import bolster

# Efficient data processing with built-in progress tracking
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
results = bolster.poolmap(lambda x: x**2, data)
print(results)  # {1: 1, 2: 4, 3: 9, 4: 16, ...}


# Smart retry logic with exponential backoff
@bolster.backoff(Exception, tries=3, delay=1, backoff=2)
def unreliable_api_call():
    # Your potentially failing code here
    return "Success!"


# Efficient tree/dict navigation
nested_data = {
    "users": {
        "active": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}],
        "inactive": [{"name": "Charlie", "age": 35}],
    }
}

# Find all ages recursively
ages = bolster.get_recursively(nested_data, "age")
print(ages)  # [25, 30, 35]

# Flatten nested structures
flat = bolster.flatten_dict(nested_data)
print(flat["users:active:0:name"])  # 'Alice'

🎯 Core Features

Concurrency & Performance

  • poolmap(): ThreadPoolExecutor wrapper with progress monitoring and robust error handling

  • exceptional_executor(): Graceful handling of failed futures in concurrent operations

  • backoff(): Exponential backoff retry decorator for unreliable operations

  • memoize(): Instance method caching with hit/miss tracking for performance optimization
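
As a rough illustration of the pattern behind poolmap() and exceptional_executor(), the sketch below fans work out over a thread pool and collects failures instead of letting one bad item abort the whole batch. It uses only the standard library; pool_map_sketch and its return shape are hypothetical names chosen for this example and are not Bolster's actual implementation or signature.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pool_map_sketch(func, items, max_workers=4):
    """Hypothetical stand-in: run func over items in threads.

    Returns two dicts keyed by input item: successful results and
    the exceptions raised by failed calls.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which input each future belongs to.
        futures = {pool.submit(func, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # keep going when one item fails
                errors[item] = exc
    return results, errors


def reciprocal(x):
    return 1 / x


results, errors = pool_map_sketch(reciprocal, [1, 2, 0, 4])
```

Here the call with 0 fails with ZeroDivisionError and lands in errors, while the other three inputs still produce results.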

Data Processing & Transformation

  • aggregate(): Pandas-like groupby operations for dictionaries and lists

  • transform_(): Flexible data transformation with key mapping and function application

  • batch() / chunks(): Efficient sequence partitioning for processing large datasets

  • Compression utilities: compress_for_relay() / decompress_from_relay() for data serialization
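
To make the idea of sequence partitioning concrete, here is a minimal standard-library sketch of what a batch()/chunks()-style helper does. The name batch_sketch and its exact behaviour (a short final batch, lists as output) are assumptions for illustration; Bolster's own functions may differ in signature and return type.

```python
from itertools import islice


def batch_sketch(iterable, size):
    """Hypothetical helper: yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:  # iterator exhausted
            return
        yield chunk


batches = list(batch_sketch(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because it consumes the input lazily, this approach also works for generators and other iterables whose length is not known up front.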

Tree & Dictionary Navigation

  • get_recursively(): Extract values from deeply nested structures by key

  • flatten_dict(): Convert nested dictionaries to flat key-value pairs

  • Tree analysis: breadth(), depth(), leaves(), leaf_paths() for structure inspection

  • Path navigation: keys_at(), items_at() for level-specific data access
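
The two core ideas here, recursive key lookup and flattening, can be sketched in a few lines of plain Python. The functions below are illustrative stand-ins, not Bolster's code: the names find_key_sketch and flatten_sketch are hypothetical, and the ":" separator is assumed from the flat["users:active:0:name"] example in Basic Usage.

```python
def find_key_sketch(node, key):
    """Hypothetical helper: collect every value stored under `key`, at any depth."""
    found = []
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                found.append(value)
            found.extend(find_key_sketch(value, key))
    elif isinstance(node, list):
        for child in node:
            found.extend(find_key_sketch(child, key))
    return found


def flatten_sketch(node, sep=":", prefix=""):
    """Hypothetical helper: map each leaf to a path built from keys and list indices."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return {prefix: node}  # leaf value
    flat = {}
    for k, value in children:
        path = f"{prefix}{sep}{k}" if prefix else str(k)
        flat.update(flatten_sketch(value, sep, path))
    return flat


nested = {
    "users": {
        "active": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}],
        "inactive": [{"name": "Charlie", "age": 35}],
    }
}
ages = find_key_sketch(nested, "age")
flat = flatten_sketch(nested)
print(ages)  # [25, 30, 35]
print(flat["users:active:0:name"])  # Alice
```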

Development & Debugging

  • arg_exception_logger(): Decorator for debugging function calls with automatic argument logging

  • MultipleErrors: Accumulate and handle multiple exceptions in complex workflows

  • working_directory(): Context manager for safe directory operations

  • pretty_print_request(): HTTP request debugging with automatic auth redaction
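
For readers unfamiliar with the pattern, a directory-switching context manager along the lines of working_directory() can be sketched as follows. This is a generic standard-library version under the assumption that the helper changes into a path and always restores the previous directory; working_directory_sketch is a hypothetical name, and Bolster's implementation may do more.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory_sketch(path):
    """Hypothetical helper: chdir into `path`, then restore the old cwd."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the cwd never leaks.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.realpath(tmp)
    with working_directory_sketch(tmp):
        inside = os.path.realpath(os.getcwd())
after = os.getcwd()
```

The try/finally is the important part: the original directory is restored whether the block completes normally or raises.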

📊 Data Sources

Bolster includes specialized modules for working with Northern Ireland and UK data sources:

Northern Ireland Water Quality

from bolster.data_sources.ni_water import get_water_quality, get_water_quality_by_zone

# Get comprehensive water quality data for all NI supply zones
df = get_water_quality()
print(df.shape)  # Shows number of zones and parameters

# Get specific zone data
zone_data = get_water_quality_by_zone("BALM")  # Belfast Malone area
print(f"Hardness: {zone_data['NI Hardness Classification']}")

Electoral Office for Northern Ireland (EONI)

import bolster
from bolster.data_sources.eoni import get_election_results

# Get Assembly election results
results_2016 = get_election_results(2016)
results_2022 = get_election_results(2022)

# Compare party performance across elections
comparison = bolster.diff(results_2022, results_2016)

Companies House Data

from bolster.data_sources.companies_house import search_companies, get_company_details

# Search for companies
results = search_companies("Technology")

# Get detailed company information
company = get_company_details("12345678")  # Company number
print(f"{company['name']} - Status: {company['status']}")

UK Met Office

from bolster.data_sources.metoffice import get_precipitation_data

# Get weather data for a specific location
weather = get_precipitation_data("Belfast", start_date="2024-01-01", end_date="2024-01-31")

Northern Ireland House Price Index

from bolster.data_sources.ni_house_price_index import (
    get_hpi_trends,
    get_sales_volumes,
    get_average_prices,
)

# Get HPI index trends over time (Q1 2005 - present)
hpi = get_hpi_trends()
print(hpi[["Period", "NI House Price Index", "Annual Change"]].tail())

# Get property sales volumes by type
sales = get_sales_volumes()
print(f"Total sales in latest quarter: {sales.iloc[-1]['Total']:,}")

# Get average sale prices
prices = get_average_prices()
print(f"Current median price: £{prices.iloc[-1]['Simple Median']:,.0f}")

NISRA Statistics

Comprehensive access to Northern Ireland Statistics and Research Agency (NISRA) data:

from bolster.data_sources.nisra import population, births, deaths, migration

# Mid-year population estimates by geography and demographics
pop_df = population.get_latest_population()
print(f"NI Population: {pop_df['population'].sum():,}")

# Monthly birth registrations
births_df = births.get_latest_births()

# Weekly death registrations with excess deaths analysis
deaths_df = deaths.get_latest_deaths()

# Migration estimates derived from demographic components
migration_df = migration.get_latest_migration()

Additional NISRA modules: labour_market, index_of_production, index_of_services, construction_output, composite_index, marriages, ashe (earnings survey), quarterly_employment_survey, emergency_care_waiting_times, stillbirths.

See NISRA module documentation for full API reference.

NISRA RSS Feed Coverage

The GOV.UK NISRA statistics RSS feed tracks new NISRA publications. Current implementation status:

| Publication | Module | Status |
|---|---|---|
| Claimant Count (UC + JSA) | nisra.claimant_count | ✅ |
| Labour Market Statistics | nisra.labour_market | ✅ |
| Weekly/Monthly Deaths | nisra.deaths | ✅ |
| Monthly Births/Stillbirths | nisra.births | ✅ |
| Monthly Marriages & Civil Partnerships | nisra.marriages | ✅ |
| NI Composite Economic Index | nisra.composite_index | ✅ |
| Construction Bulletin | nisra.construction_output | ✅ |
| Index of Production | nisra.index_of_production | ✅ |
| Index of Services | nisra.index_of_services | ✅ |
| Quarterly Employment Survey | nisra.quarterly_employment_survey | ✅ |
| Emergency Care Waiting Times | nisra.emergency_care_waiting_times | ✅ |
| Elective/Outpatient Waiting Times | nisra.elective_waiting_times | ✅ |
| Monthly Stillbirths | nisra.stillbirths | ✅ |
| Population Estimates | nisra.population | ✅ |
| Migration Estimates (Derived + Official LTI) | nisra.migration | ✅ |
| Population Projections (NI-level, biennial vintage) | nisra.population_projections | ✅ |
| Population Projections - LGD sub-areas (2022-based, 2022–2047) | nisra.population_projections | ✅ |
| Annual Survey of Hours & Earnings | nisra.ashe | ✅ |
| DVA Monthly Tests Statistics | dva | ✅ |
| UK Gender Pay Gap Reporting | gender_pay_gap | ✅ |
| Individual Wellbeing | nisra.wellbeing | ✅ |
| Cancer Waiting Times | nisra.cancer_waiting_times | ✅ |
| Child Protection Statistics | nisra.child_protection | ✅ |
| NI Planning Activity Statistics (DfI) | nisra.planning_statistics | ✅ |
| Registrar General Quarterly Tables | nisra.registrar_general | ✅ |
| Tourism - Hotel Occupancy | nisra.tourism.occupancy | ✅ |
| Tourism - SSA Occupancy | nisra.tourism.occupancy | ✅ |
| Tourism - Visitor Statistics | nisra.tourism.visitor_statistics | ✅ |
| Baby Names NI (annual, 1997–present) | nisra.baby_names | ✅ |
| NI School Suspensions (DE) | education_suspensions | ✅ |
| Work Quality NI (NISRA) | nisra.work_quality | ✅ |
| NI LAC Municipal Waste Statistics (DAERA) | daera_waste | ✅ |
| NI Claimant Count (UC + JSA, DfC/ONS) | nisra.claimant_count | ✅ |
| PSNI Police Ombudsman Complaints | psni.police_ombudsman | ✅ |
| Public Confidence in Official Statistics (NISRA PCOS) | nisra.public_confidence | ✅ |
| Disease Prevalence Registers (PHA/DoH) | nisra.disease_prevalence | ✅ |
| PSNI Stop & Search (OpenDataNI) | psni.stop_and_search | ✅ |
| PSNI PACE Stop & Search / Arrests | psni.pace | ✅ |
| Security Situation Statistics | - | ❌ Cloudflare-blocked |
| Anti-social Behaviour | - | ❌ Cloudflare-blocked |
| Domestic Abuse Incidents/Crimes | - | ❌ Cloudflare-blocked |
| Drug Seizures & Arrests | - | ❌ Cloudflare-blocked |
| Hate Incidents & Crimes | - | ❌ Cloudflare-blocked |
| Road Traffic Collisions | psni.road_traffic_collisions | ✅ |
| PSNI Crime Statistics | psni.crime_statistics | ⚠️ historical only (Apr 2001–Dec 2021); get_latest raises PSNIDataStaleError |
| Police Ombudsman Complaints | psni.police_ombudsman | ✅ |
| Stop & Search | psni.stop_and_search | ✅ |
| PACE Stop & Search / Arrests | psni.pace | ✅ |

Infrastructure NI Publication Discovery

The Infrastructure NI publications portal offers filtering beyond basic publication types. Its sidebar filters expose additional organizational dimensions that could improve data source discovery.

Possible directions for further analysis:

  • Topic categorization: Publications span transport, environment, planning, and infrastructure domains

  • Geographic filtering: Regional breakdown capabilities for localized analysis

  • Date range analysis: Historical publication patterns and frequency tracking

  • Document format analysis: Structured data availability vs. narrative reports

  • Cross-departmental integration: Links with other NI government department publications

This systematic analysis could identify gaps in current DVA coverage and reveal additional structured datasets suitable for bolster integration.

☁️ Cloud Services

AWS Integration

from bolster.aws import get_session, S3Handler, DynamoHandler

# Get configured AWS session
session = get_session(profile="production")

# S3 operations with best practices
s3 = S3Handler(session)
s3.upload_file("local_file.txt", "bucket-name", "remote/path/file.txt")

# DynamoDB operations
dynamo = DynamoHandler(session)
items = dynamo.scan_table("user-data", filters={"status": "active"})

Azure Integration

from bolster.azure import AzureHandler

# Azure Blob Storage operations
azure = AzureHandler(connection_string="DefaultEndpointsProtocol=https;...")
data = b"example payload"  # Placeholder content to upload
azure.upload_blob("container", "blob_name", data)

🌐 Web Scraping & HTTP

from bolster.web import safe_request, parse_html_table

# Robust HTTP requests with automatic retries
response = safe_request("https://api.example.com/data", max_retries=3, timeout=30)

# Parse HTML tables into pandas DataFrames
tables = parse_html_table("https://example.com/tables")
print(tables[0].head())  # First table as DataFrame

🖥️ Command Line Interface

Bolster includes a CLI for common operations:

# Get precipitation data
bolster get-precipitation --location "Belfast" --start-date "2024-01-01"

# Get help on available commands
bolster --help

🔧 Advanced Examples

Concurrent Data Processing

import bolster
from datetime import datetime


# Process large datasets with progress tracking
def process_user_data(user_id):
    # Simulate data processing
    return {"user_id": user_id, "processed_at": datetime.now()}


user_ids = range(1000)  # 1000 users to process

# Process with automatic progress bar and error handling
results = bolster.poolmap(
    process_user_data,
    user_ids,
    max_workers=10,
    progress=True,  # Shows progress bar
)

print(f"Processed {len(results)} users successfully")

Smart Caching and Memoization

import time

import bolster


class DataProcessor:
    @bolster.memoize
    def expensive_calculation(self, data_hash):
        # Expensive operation that we want to cache
        time.sleep(2)  # Simulate expensive operation
        return f"Processed: {data_hash}"


processor = DataProcessor()

# First call - takes 2 seconds
result1 = processor.expensive_calculation("abc123")

# Second call with same input - returns immediately from cache
result2 = processor.expensive_calculation("abc123")

# Check cache performance
print(f"Cache hits: {len(processor._memoize__hits)}")
print(f"Cache misses: {len(processor._memoize__misses)}")
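
To show what instance-level caching with hit/miss tracking involves, here is a small self-contained sketch. It is not Bolster's memoize: the decorator name memoize_sketch and the attribute names _cache, _hits and _misses are hypothetical, and the sketch assumes positional, hashable arguments only.

```python
import functools


def memoize_sketch(method):
    """Hypothetical decorator: cache results per instance and record hits/misses."""

    @functools.wraps(method)
    def wrapper(self, *args):
        # State lives on the instance, so separate objects never share a cache.
        cache = self.__dict__.setdefault("_cache", {})
        hits = self.__dict__.setdefault("_hits", [])
        misses = self.__dict__.setdefault("_misses", [])
        key = (method.__name__, args)
        if key in cache:
            hits.append(key)
        else:
            misses.append(key)
            cache[key] = method(self, *args)
        return cache[key]

    return wrapper


class Squarer:
    def __init__(self):
        self.calls = 0  # counts real (uncached) computations

    @memoize_sketch
    def square(self, x):
        self.calls += 1
        return x * x


squarer = Squarer()
values = [squarer.square(3), squarer.square(3), squarer.square(4)]
```

After these three calls the method body has run only twice: the repeated square(3) is served from the cache and recorded as a hit.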

Robust API Integration with Backoff

import requests
import bolster


@bolster.backoff((requests.RequestException, ConnectionError), tries=5, delay=1, backoff=2)
def fetch_api_data(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


# This will automatically retry with exponential backoff on failure
data = fetch_api_data("https://api.unreliable-service.com/data")
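
The retry schedule implied by delay=1 and backoff=2 can be made explicit with a small sketch. The decorator below is a generic illustration, not Bolster's backoff: backoff_sketch is a hypothetical name, the injectable sleep parameter exists only so the waits can be recorded, and it assumes tries counts total attempts, which may not match Bolster's semantics.

```python
import functools
import time


def backoff_sketch(exceptions, tries=3, delay=1, backoff=2, sleep=time.sleep):
    """Hypothetical decorator: retry on `exceptions`, multiplying the wait each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator


waits = []  # records the waits instead of actually sleeping
calls = {"count": 0}


@backoff_sketch(ConnectionError, tries=5, delay=1, backoff=2, sleep=waits.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 4:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()
print(result, waits)  # ok [1, 2, 4]
```

With these settings the waits grow as 1, 2, 4 and would reach 8 before the fifth and final attempt.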

Complex Data Transformation

import bolster

# Transform API response to database format
api_response = {
    "user_name": "john_doe",
    "user_email": "john@example.com",
    "account_type": "premium",
    "signup_timestamp": "2024-01-01T12:00:00Z",
}

# Define transformation rules
rules = {
    "user_name": ("username", str.upper),  # Rename and transform
    "user_email": ("email", None),  # Keep as-is but rename
    "account_type": ("tier", lambda x: x.title()),  # Transform value
    "signup_timestamp": ("created_at", bolster.parse_iso_datetime),
}

# Apply transformation
db_record = bolster.transform_(api_response, rules)
print(db_record)
# {'username': 'JOHN_DOE', 'email': 'john@example.com',
#  'tier': 'Premium', 'created_at': datetime(2024, 1, 1, 12, 0, 0)}
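
The rule format used above, where each source key maps to a (new_key, function) pair, can be illustrated with a short plain-Python sketch. transform_sketch is a hypothetical stand-in written for this example; it assumes that None means "copy the value unchanged" and that keys missing from the record are skipped, which may not match transform_() exactly.

```python
def transform_sketch(record, rules):
    """Hypothetical helper: rename keys and optionally transform their values."""
    output = {}
    for old_key, (new_key, func) in rules.items():
        if old_key not in record:
            continue  # assumption: silently skip keys the record lacks
        value = record[old_key]
        output[new_key] = func(value) if func is not None else value
    return output


record = {
    "user_name": "john_doe",
    "user_email": "john@example.com",
    "account_type": "premium",
}
rules = {
    "user_name": ("username", str.upper),  # rename and transform
    "user_email": ("email", None),  # rename only
    "account_type": ("tier", str.title),  # rename and transform
}
transformed = transform_sketch(record, rules)
print(transformed)
# {'username': 'JOHN_DOE', 'email': 'john@example.com', 'tier': 'Premium'}
```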

🏗️ Development Setup

Prerequisites

  • Python 3.9+ (3.10, 3.11, 3.12, 3.13 supported)

  • uv (fast Python package manager)

Installation for Development

# Clone the repository
git clone https://github.com/andrewbolster/bolster.git
cd bolster

# Install with development dependencies
uv sync --all-extras --dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=bolster --cov-report=html

# Build documentation
cd docs && uv run make html

Running Tests

# Run all tests
uv run pytest

# Run with verbose output and coverage
uv run pytest -v --cov=bolster --cov-report=term-missing

# Run specific test file
uv run pytest tests/test_core_utilities.py

# Skip network-dependent tests (useful if you hit SSL issues)
uv run pytest -m "not network"

📚 Documentation

  • Full Documentation: https://bolster.readthedocs.io

  • API Reference: Auto-generated from docstrings

  • Examples: See the /notebooks directory for Jupyter notebook examples

  • Data Sources: Detailed documentation for each data source module

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Guidelines

  1. Testing: Ensure all new features have comprehensive tests

  2. Documentation: Add docstrings and update README for new features

  3. Code Style: Follow the existing code style (enforced by ruff)

  4. Type Hints: Include type annotations for all public functions

  5. Performance: Consider performance implications for data processing functions

📄 License

This project is licensed under the GNU General Public License v3 (GPLv3) - see the LICENSE file for details.

🐛 Bug Reports

If you encounter any bugs or issues, please file a bug report at: https://github.com/andrewbolster/bolster/issues