Bolster
Bolster's Brain, you've been warned
A comprehensive Python utility library for data science, web scraping, cloud services, and general development workflows. Originally designed as a personal toolkit, Bolster has evolved into a robust collection of utilities that enhance productivity across data analysis, system administration, and software development tasks.
Quick Start
Installation
pip install bolster
Basic Usage
import bolster
# Efficient data processing with built-in progress tracking
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
results = bolster.poolmap(lambda x: x**2, data)
print(results) # {1: 1, 2: 4, 3: 9, 4: 16, ...}
# Smart retry logic with exponential backoff
@bolster.backoff(Exception, tries=3, delay=1, backoff=2)
def unreliable_api_call():
    # Your potentially failing code here
    return "Success!"
# Efficient tree/dict navigation
nested_data = {
    "users": {
        "active": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}],
        "inactive": [{"name": "Charlie", "age": 35}],
    }
}
# Find all ages recursively
ages = bolster.get_recursively(nested_data, "age")
print(ages) # [25, 30, 35]
# Flatten nested structures
flat = bolster.flatten_dict(nested_data)
print(flat["users:active:0:name"]) # 'Alice'
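To make the two navigation helpers above more concrete, here is a minimal sketch of how colon-joined flattening and recursive key lookup could work. The names `flatten_sketch` and `find_key_sketch` are hypothetical and are not Bolster's implementation; the `:` separator is assumed from the `users:active:0:name` key shown above.

```python
from typing import Any, Iterator


def flatten_sketch(obj: Any, prefix: str = "", sep: str = ":") -> dict:
    """Flatten nested dicts/lists into one dict with joined keys (hypothetical helper)."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Leaf value: store it under the accumulated key path.
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, path, sep))
    return flat


def find_key_sketch(obj: Any, wanted: str) -> Iterator[Any]:
    """Yield every value stored under `wanted`, at any depth (hypothetical helper)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == wanted:
                yield value
            else:
                yield from find_key_sketch(value, wanted)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key_sketch(item, wanted)


nested = {"users": {"active": [{"name": "Alice", "age": 25}], "inactive": []}}
flat = flatten_sketch(nested)
ages = list(find_key_sketch(nested, "age"))
```

In this sketch, list positions become path segments, which is why an index such as `0` appears in the flattened key.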
Core Features
Concurrency & Performance
poolmap(): ThreadPoolExecutor wrapper with progress monitoring and robust error handling
exceptional_executor(): Graceful handling of failed futures in concurrent operations
backoff(): Exponential backoff retry decorator for unreliable operations
memoize(): Instance method caching with hit/miss tracking for performance optimization
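As an illustration of what "graceful handling of failed futures" can mean in practice, the sketch below runs a function over several inputs in a thread pool and records failures instead of aborting. It uses only the standard library; `run_collecting_errors` is a hypothetical name and does not reflect how Bolster implements these helpers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_collecting_errors(func, items, max_workers=4):
    """Run func over items in threads; return (results, errors) keyed by item.

    Hypothetical helper, not part of Bolster. Items must be hashable.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_item = {pool.submit(func, item): item for item in items}
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # keep going; record the failure instead
                errors[item] = exc
    return results, errors


def reciprocal(x):
    return 1 / x


results, errors = run_collecting_errors(reciprocal, [1, 2, 0, 4])
```

One failing input (here `0`) ends up in `errors` while the remaining results are still returned.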
Data Processing & Transformation
aggregate(): Pandas-like groupby operations for dictionaries and lists
transform_(): Flexible data transformation with key mapping and function application
batch()/chunks(): Efficient sequence partitioning for processing large datasets
Compression utilities: compress_for_relay()/decompress_from_relay() for data serialization
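The partitioning and groupby ideas listed above can be sketched with the standard library alone. The helpers `chunked` and `group_and_reduce` below are hypothetical names chosen for illustration; Bolster's own signatures may differ.

```python
from collections import defaultdict
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def group_and_reduce(records, key, value, reducer=sum):
    """Group dict records by `key` and reduce each group's `value` field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {group: reducer(values) for group, values in groups.items()}


batches = list(chunked(range(7), 3))
sales = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 5},
    {"region": "north", "amount": 7},
]
totals = group_and_reduce(sales, key="region", value="amount")
```

The final chunk is allowed to be shorter than `size`, so no padding is needed when the input length is not a multiple of it.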
Development & Debugging
arg_exception_logger(): Decorator for debugging function calls with automatic argument logging
MultipleErrors: Accumulate and handle multiple exceptions in complex workflows
working_directory(): Context manager for safe directory operations
pretty_print_request(): HTTP request debugging with automatic auth redaction
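Two of the patterns above, a directory-switching context manager and an exception that accumulates several failures, can be sketched as follows. The names `in_directory`, `CollectedErrors` and `run_all` are hypothetical and stand in for the general technique, not for Bolster's actual classes.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def in_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


class CollectedErrors(Exception):
    """Carries several exceptions raised during one batch of work."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} error(s) occurred")
        self.errors = list(errors)


def run_all(tasks):
    """Run every task; raise CollectedErrors at the end if any failed."""
    errors = []
    for task in tasks:
        try:
            task()
        except Exception as exc:
            errors.append(exc)
    if errors:
        raise CollectedErrors(errors)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with in_directory(tmp):
        inside = os.getcwd()
restored = os.getcwd()

try:
    run_all([lambda: 1 / 0, lambda: int("not a number")])
except CollectedErrors as exc:
    failures = exc.errors
```

The `finally` clause is what makes the directory change safe: the original working directory is restored even if the body raises.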
Data Sources
Bolster includes specialized modules for working with Northern Ireland and UK data sources:
Northern Ireland Water Quality
from bolster.data_sources.ni_water import get_water_quality, get_water_quality_by_zone
# Get comprehensive water quality data for all NI supply zones
df = get_water_quality()
print(df.shape) # Shows number of zones and parameters
# Get specific zone data
zone_data = get_water_quality_by_zone("BALM") # Belfast Malone area
print(f"Hardness: {zone_data['NI Hardness Classification']}")
Electoral Office for Northern Ireland (EONI)
import bolster
from bolster.data_sources.eoni import get_election_results
# Get Assembly election results
results_2016 = get_election_results(2016)
results_2022 = get_election_results(2022)
# Compare party performance across elections
comparison = bolster.diff(results_2022, results_2016)
Companies House Data
from bolster.data_sources.companies_house import search_companies, get_company_details
# Search for companies
results = search_companies("Technology")
# Get detailed company information
company = get_company_details("12345678") # Company number
print(f"{company['name']} - Status: {company['status']}")
UK Met Office
from bolster.data_sources.metoffice import get_precipitation_data
# Get weather data for a specific location
weather = get_precipitation_data("Belfast", start_date="2024-01-01", end_date="2024-01-31")
Northern Ireland House Price Index
from bolster.data_sources.ni_house_price_index import (
    get_hpi_trends,
    get_sales_volumes,
    get_average_prices,
)
# Get HPI index trends over time (Q1 2005 - present)
hpi = get_hpi_trends()
print(hpi[["Period", "NI House Price Index", "Annual Change"]].tail())
# Get property sales volumes by type
sales = get_sales_volumes()
print(f"Total sales in latest quarter: {sales.iloc[-1]['Total']:,}")
# Get average sale prices
prices = get_average_prices()
print(f"Current median price: £{prices.iloc[-1]['Simple Median']:,.0f}")
NISRA Statistics
Comprehensive access to Northern Ireland Statistics and Research Agency (NISRA) data:
from bolster.data_sources.nisra import population, births, deaths, migration
# Mid-year population estimates by geography and demographics
pop_df = population.get_latest_population()
print(f"NI Population: {pop_df['population'].sum():,}")
# Monthly birth registrations
births_df = births.get_latest_births()
# Weekly death registrations with excess deaths analysis
deaths_df = deaths.get_latest_deaths()
# Migration estimates derived from demographic components
migration_df = migration.get_latest_migration()
Additional NISRA modules: labour_market, index_of_production, index_of_services, construction_output, composite_index, marriages, ashe (earnings survey), quarterly_employment_survey, emergency_care_waiting_times, stillbirths.
See NISRA module documentation for full API reference.
NISRA RSS Feed Coverage
The GOV.UK NISRA statistics RSS feed tracks new NISRA publications. Current implementation status:
| Publication | Module | Status |
|---|---|---|
| Claimant Count (UC + JSA) | | |
| Labour Market Statistics | | |
| Weekly/Monthly Deaths | | |
| Monthly Births/Stillbirths | | |
| Monthly Marriages & Civil Partnerships | | |
| NI Composite Economic Index | | |
| Construction Bulletin | | |
| Index of Production | | |
| Index of Services | | |
| Quarterly Employment Survey | | |
| Emergency Care Waiting Times | | |
| Elective/Outpatient Waiting Times | | |
| Monthly Stillbirths | | |
| Population Estimates | | |
| Migration Estimates (Derived + Official LTI) | | |
| Population Projections (NI-level, biennial vintage) | | |
| Population Projections – LGD sub-areas (2022-based, 2022–2047) | | |
| Annual Survey of Hours & Earnings | | |
| DVA Monthly Tests Statistics | | |
| UK Gender Pay Gap Reporting | | |
| Individual Wellbeing | | |
| Cancer Waiting Times | | |
| Child Protection Statistics | | |
| NI Planning Activity Statistics (DfI) | | |
| Registrar General Quarterly Tables | | |
| Tourism - Hotel Occupancy | | |
| Tourism - SSA Occupancy | | |
| Tourism - Visitor Statistics | | |
| Baby Names NI (annual, 1997–present) | | |
| NI School Suspensions (DE) | | |
| Work Quality NI (NISRA) | | |
| NI LAC Municipal Waste Statistics (DAERA) | | |
| NI Claimant Count (UC + JSA, DfC/ONS) | | |
| PSNI Police Ombudsman Complaints | | |
| Public Confidence in Official Statistics (NISRA PCOS) | | |
| Disease Prevalence Registers (PHA/DoH) | | |
| PSNI Stop & Search (OpenDataNI) | | |
| PSNI PACE Stop & Search / Arrests | | |
| Security Situation Statistics | - | Cloudflare-blocked |
| Anti-social Behaviour | - | Cloudflare-blocked |
| Domestic Abuse Incidents/Crimes | - | Cloudflare-blocked |
| Drug Seizures & Arrests | - | Cloudflare-blocked |
| Hate Incidents & Crimes | - | Cloudflare-blocked |
| Road Traffic Collisions | | |
| PSNI Crime Statistics | | historical only (Apr 2001–Dec 2021); |
| Police Ombudsman Complaints | | |
| Stop & Search | | |
| PACE Stop & Search / Arrests | | |
Infrastructure NI Publication Discovery
The Infrastructure NI publications portal provides advanced filtering capabilities beyond basic publication types. Analysis of the sidebar filtering system reveals additional organizational dimensions that could enhance data source discovery:
Directions for further analysis:
Topic categorization: Publications span transport, environment, planning, and infrastructure domains
Geographic filtering: Regional breakdown capabilities for localized analysis
Date range analysis: Historical publication patterns and frequency tracking
Document format analysis: Structured data availability vs. narrative reports
Cross-departmental integration: Links with other NI government department publications
This systematic analysis could identify gaps in current DVA coverage and reveal additional structured datasets suitable for bolster integration.
Cloud Services
AWS Integration
from bolster.aws import get_session, S3Handler, DynamoHandler
# Get configured AWS session
session = get_session(profile="production")
# S3 operations with best practices
s3 = S3Handler(session)
s3.upload_file("local_file.txt", "bucket-name", "remote/path/file.txt")
# DynamoDB operations
dynamo = DynamoHandler(session)
items = dynamo.scan_table("user-data", filters={"status": "active"})
Azure Integration
from bolster.azure import AzureHandler
# Azure Blob Storage operations
azure = AzureHandler(connection_string="DefaultEndpointsProtocol=https;...")
azure.upload_blob("container", "blob_name", data)
Web Scraping & HTTP
from bolster.web import safe_request, parse_html_table
# Robust HTTP requests with automatic retries
response = safe_request("https://api.example.com/data", max_retries=3, timeout=30)
# Parse HTML tables into pandas DataFrames
tables = parse_html_table("https://example.com/tables")
print(tables[0].head()) # First table as DataFrame
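For readers curious about what table extraction involves underneath a helper like this, the following is a minimal standard-library sketch that collects cell text row by row from an HTML string. The `TableRows` class is hypothetical and much simpler than a DataFrame-producing parser: it ignores nested tables, `colspan` and `rowspan`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


html = (
    "<table><tr><th>Zone</th><th>Hardness</th></tr>"
    "<tr><td>A</td><td>Soft</td></tr></table>"
)
parser = TableRows()
parser.feed(html)
rows = parser.rows
```

The first row holds the header cells, so a caller could use `rows[0]` as column names and `rows[1:]` as data.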
Command Line Interface
Bolster includes a CLI for common operations:
# Get precipitation data
bolster get-precipitation --location "Belfast" --start-date "2024-01-01"
# Get help on available commands
bolster --help
Advanced Examples
Concurrent Data Processing
import bolster
from datetime import datetime
# Process large datasets with progress tracking
def process_user_data(user_id):
    # Simulate data processing
    return {"user_id": user_id, "processed_at": datetime.now()}
user_ids = range(1000)  # 1000 users to process
# Process with automatic progress bar and error handling
results = bolster.poolmap(
    process_user_data,
    user_ids,
    max_workers=10,
    progress=True,  # Shows progress bar
)
print(f"Processed {len(results)} users successfully")
Smart Caching and Memoization
import time
import bolster
class DataProcessor:
    @bolster.memoize
    def expensive_calculation(self, data_hash):
        # Expensive operation that we want to cache
        time.sleep(2)  # Simulate expensive operation
        return f"Processed: {data_hash}"
processor = DataProcessor()
# First call - takes 2 seconds
result1 = processor.expensive_calculation("abc123")
# Second call with same input - returns immediately from cache
result2 = processor.expensive_calculation("abc123")
# Check cache performance
print(f"Cache hits: {len(processor._memoize__hits)}")
print(f"Cache misses: {len(processor._memoize__misses)}")
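The per-instance caching with hit and miss tracking shown above can be approximated in a few lines. This is a sketch under stated assumptions: `memoize_method`, `_cache` and `_cache_stats` are hypothetical names, and Bolster's decorator stores its bookkeeping differently (see the `_memoize__hits` attribute used above).

```python
import functools


def memoize_method(method):
    """Cache a method's results per instance and count hits and misses.

    Hypothetical decorator; positional arguments must be hashable.
    """

    @functools.wraps(method)
    def wrapper(self, *args):
        # Each instance gets its own cache and counters, created on first use.
        cache = self.__dict__.setdefault("_cache", {})
        stats = self.__dict__.setdefault("_cache_stats", {"hits": 0, "misses": 0})
        key = (method.__name__, args)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = method(self, *args)
        return cache[key]

    return wrapper


class Squarer:
    def __init__(self):
        self.calls = 0

    @memoize_method
    def square(self, x):
        self.calls += 1  # counts real computations only
        return x * x


s = Squarer()
first = s.square(4)
second = s.square(4)
```

Keeping the cache on the instance rather than on the function means cached values are released together with the object that produced them.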
Robust API Integration with Backoff
import requests
import bolster
@bolster.backoff((requests.RequestException, ConnectionError), tries=5, delay=1, backoff=2)
def fetch_api_data(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()
# This will automatically retry with exponential backoff on failure
data = fetch_api_data("https://api.unreliable-service.com/data")
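To show how the `tries`, `delay` and `backoff` parameters typically interact, here is a minimal sketch of an exponential backoff decorator. It assumes the common convention that the wait starts at `delay` and is multiplied by `backoff` after each failure; whether Bolster follows exactly this schedule is an assumption. `retry_with_backoff` and its injectable `sleep` parameter are hypothetical.

```python
import functools
import time


def retry_with_backoff(exceptions, tries=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry the wrapped function, multiplying the wait by `backoff` each time.

    Hypothetical decorator; `sleep` is injectable so the waits can be observed.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator


waits = []
attempts = {"count": 0}


@retry_with_backoff(ConnectionError, tries=5, delay=1, backoff=2, sleep=waits.append)
def flaky():
    attempts["count"] += 1
    if attempts["count"] < 4:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()
```

Under this convention, a call that fails on every attempt with `tries=5, delay=1, backoff=2` would wait 1, 2, 4 and 8 seconds (15 seconds in total) before the final exception is raised.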
Complex Data Transformation
import bolster
# Transform API response to database format
api_response = {
    "user_name": "john_doe",
    "user_email": "john@example.com",
    "account_type": "premium",
    "signup_timestamp": "2024-01-01T12:00:00Z",
}
# Define transformation rules
rules = {
    "user_name": ("username", str.upper),  # Rename and transform
    "user_email": ("email", None),  # Keep as-is but rename
    "account_type": ("tier", lambda x: x.title()),  # Transform value
    "signup_timestamp": ("created_at", bolster.parse_iso_datetime),
}
# Apply transformation
db_record = bolster.transform_(api_response, rules)
print(db_record)
# {'username': 'JOHN_DOE', 'email': 'john@example.com',
# 'tier': 'Premium', 'created_at': datetime(2024, 1, 1, 12, 0, 0)}
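The rule format used above, a mapping from source key to a `(new_key, converter)` pair, can be applied with a short loop. The sketch below is a hypothetical stand-in named `apply_rules`; how Bolster treats keys that are missing from the input is not stated here, so skipping them is an assumption.

```python
def apply_rules(record, rules):
    """Rename and convert fields of `record` according to `rules`.

    `rules` maps a source key to (new_key, converter); a converter of None
    copies the value unchanged. Hypothetical helper, not Bolster's API.
    """
    output = {}
    for source_key, (new_key, converter) in rules.items():
        if source_key not in record:
            continue  # assumption: silently skip fields absent from the input
        value = record[source_key]
        output[new_key] = converter(value) if converter else value
    return output


record = {"user_name": "john_doe", "account_type": "premium"}
rules = {
    "user_name": ("username", str.upper),
    "account_type": ("tier", None),
    "signup_timestamp": ("created_at", None),  # absent from record, so skipped
}
converted = apply_rules(record, rules)
```

Fields that have no rule are dropped from the output in this sketch, which keeps the result limited to the columns the target schema expects.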
Development Setup
Prerequisites
Python 3.9+ (3.10, 3.11, 3.12, 3.13 supported)
uv (fast Python package manager)
Installation for Development
# Clone the repository
git clone https://github.com/andrewbolster/bolster.git
cd bolster
# Install with development dependencies
uv sync --all-extras --dev
# Install pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=bolster --cov-report=html
# Build documentation
cd docs && uv run make html
Running Tests
# Run all tests
uv run pytest
# Run with verbose output and coverage
uv run pytest -v --cov=bolster --cov-report=term-missing
# Run specific test file
uv run pytest tests/test_core_utilities.py
# Skip network-dependent tests (useful if SSL issues)
uv run pytest -m "not network"
Documentation
Full Documentation: https://bolster.readthedocs.io
API Reference: Auto-generated from docstrings
Examples: See the /notebooks directory for Jupyter notebook examples
Data Sources: Detailed documentation for each data source module
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Guidelines
Testing: Ensure all new features have comprehensive tests
Documentation: Add docstrings and update README for new features
Code Style: Follow the existing code style (enforced by ruff)
Type Hints: Include type annotations for all public functions
Performance: Consider performance implications for data processing functions
License
This project is licensed under the GNU General Public License v3 (GPLv3) - see the LICENSE file for details.
Bug Reports
If you encounter any bugs or issues, please file a bug report at: https://github.com/andrewbolster/bolster/issues
Links
Documentation: https://bolster.readthedocs.io
Author: Andrew Bolster
Built with ❤️ for data science, automation, and general productivity enhancement.