bolster.utils.cache

File caching utilities for data sources.

Provides disk-based caching for downloaded files with configurable TTL. Used by NISRA, PSNI, and other data source modules to avoid repeated downloads of the same resources.

Cache Location: Files are cached in ~/.cache/bolster/<namespace>/, with filenames derived from URL hashes. Each data source uses its own namespace.
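
The layout described above can be illustrated with a short sketch. It only builds paths and does not touch the real cache; `ASSUMED_CACHE_BASE` and `namespace_dir` are hypothetical names used for illustration, and the module's own `CACHE_BASE` may be defined differently.

```python
from pathlib import Path

# Base directory as described above (assumption: the module's real
# CACHE_BASE resolves to the same location).
ASSUMED_CACHE_BASE = Path.home() / ".cache" / "bolster"


def namespace_dir(namespace: str) -> Path:
    """Return the directory a given namespace would cache into."""
    return ASSUMED_CACHE_BASE / namespace


nisra_dir = namespace_dir("nisra")
psni_dir = namespace_dir("psni")

# Different namespaces map to different directories, so the same URL
# cached by two data sources does not collide on disk.
print(nisra_dir.parts[-2:])  # ('bolster', 'nisra')
print(psni_dir.parts[-2:])   # ('bolster', 'psni')
```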

Example

>>> from bolster.utils.cache import CachedDownloader, hash_url
>>> hash_url("https://example.com/data.csv")
'2a01ab0de708440185cbb6473893860c'
>>> downloader = CachedDownloader("my_source")
>>> downloader.namespace
'my_source'

Attributes

`logger`
`CACHE_BASE`
`hits`
`misses`

Exceptions

`CacheError`	Base exception for cache operations.
`DownloadError`	Raised when a file download fails.

Classes

CachedDownloader

Disk-based file cache with TTL support.

Functions

hash_url(url)

Generate a cache-safe filename from a URL using an MD5 hash.

Module Contents

bolster.utils.cache.logger

bolster.utils.cache.CACHE_BASE

bolster.utils.cache.hits = 0

bolster.utils.cache.misses = 0

exception bolster.utils.cache.CacheError

Bases: Exception

Base exception for cache operations.

exception bolster.utils.cache.DownloadError

Bases: CacheError

Raised when a file download fails.

bolster.utils.cache.hash_url(url)

Generate a cache-safe filename from a URL using an MD5 hash.

Parameters: url (str) – The URL to hash
Returns: 32-character hexadecimal MD5 hash string
Return type: str

Example

>>> hash_url("https://example.com/data.csv")
'2a01ab0de708440185cbb6473893860c'
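
For readers who want to see why the result is safe to use as a filename, here is a minimal sketch of the same idea built on the standard library. `md5_filename` is a hypothetical stand-in, not the library's implementation; in particular, UTF-8 encoding and the absence of any URL normalisation are assumptions.

```python
import hashlib


def md5_filename(url: str) -> str:
    """Hex MD5 digest of the URL (assumed UTF-8 encoded, not normalised)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Characters such as '/', '?', and '&' are awkward in filenames;
# the digest contains only lowercase hexadecimal characters.
name = md5_filename("https://example.com/data.csv?year=2024&fmt=csv")
print(len(name))                                   # 32
print(all(c in "0123456789abcdef" for c in name))  # True
```

The digest is deterministic, so the same URL always maps to the same cache entry. MD5 serves here as a naming scheme, not as a security measure.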

class bolster.utils.cache.CachedDownloader(namespace, timeout=60)

Disk-based file cache with TTL support.

Provides download-with-cache functionality for data source modules. Each instance uses a namespace subdirectory for isolation.

Parameters:

namespace (str) – Subdirectory name for this cache (e.g., “nisra”, “psni”)
timeout (int) – Request timeout in seconds (default: 60)

Example

>>> downloader = CachedDownloader("psni", timeout=60)
>>> downloader.namespace
'psni'
>>> downloader.timeout
60
>>> downloader.cache_dir.parts[-2:]
('bolster', 'psni')

Initialize CachedDownloader with namespace and timeout.

Parameters:

namespace (str) – Cache namespace for organizing files
timeout (int) – Timeout for HTTP requests in seconds

namespace

timeout = 60

cache_dir

get_cached_file(url, cache_ttl_hours=24)

Return the cached file if it exists and is fresh; otherwise return None.

Parameters:

url (str) – URL of the file (used to generate cache filename)
cache_ttl_hours (int) – Maximum age in hours before cache is stale

Returns:

Path to cached file if valid and fresh, None otherwise

Return type:

pathlib.Path | None
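
A freshness check of this kind could look like the sketch below. It assumes that age is measured from the file's modification time, which the documentation above does not state; `fresh_or_none` is a hypothetical helper, not part of the module.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def fresh_or_none(path: Path, cache_ttl_hours: float = 24) -> Optional[Path]:
    """Return path if it exists and is younger than the TTL, else None.

    Assumption: age is derived from the file's modification time.
    """
    if not path.exists():
        return None
    age_seconds = time.time() - path.stat().st_mtime
    return path if age_seconds < cache_ttl_hours * 3600 else None


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "example.bin"
    cached.write_bytes(b"data")
    fresh = fresh_or_none(cached, cache_ttl_hours=24)    # just written: fresh

    # Backdate the file by 48 hours to simulate a stale entry.
    two_days_ago = time.time() - 48 * 3600
    os.utime(cached, (two_days_ago, two_days_ago))
    stale = fresh_or_none(cached, cache_ttl_hours=24)    # older than TTL

    missing = fresh_or_none(Path(tmp) / "absent.bin")    # never cached
```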

download(url, cache_ttl_hours=24, force_refresh=False, headers=None)

Download a file with caching support.

Downloads a file from the given URL and caches it locally. If a valid cached version exists, returns that instead.

Parameters:

url (str) – URL to download
cache_ttl_hours (int) – Cache validity in hours (default: 24)
force_refresh (bool) – If True, bypass cache and re-download
headers (dict | None) – Optional extra HTTP headers to include in the request (e.g. {"Referer": "...", "User-Agent": "..."})

Returns:

Path to the downloaded (or cached) file

Raises:

DownloadError – If download fails due to network or HTTP errors

Return type:

pathlib.Path
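
The control flow described for download (use the cache when valid, bypass it on force_refresh, raise DownloadError on failure) can be sketched without any network access. Everything in this block is illustrative: the exception classes are local stand-ins that mirror the documented hierarchy, and `download_with_cache`, `fetch`, and `is_fresh` are hypothetical names rather than the library's internals.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


class CacheError(Exception):
    """Local stand-in mirroring the documented base exception."""


class DownloadError(CacheError):
    """Local stand-in mirroring the documented download exception."""


def download_with_cache(
    url: str,
    target: Path,
    fetch: Callable[[str], bytes],
    is_fresh: Callable[[Path], bool],
    force_refresh: bool = False,
) -> Path:
    """Return target, calling fetch only when the cached copy is unusable."""
    if not force_refresh and is_fresh(target):
        return target  # cache hit: no fetch performed
    try:
        payload = fetch(url)
    except OSError as exc:  # covers ConnectionError, TimeoutError, etc.
        raise DownloadError(f"Failed to download {url}: {exc}") from exc
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Record the call and return a fixed payload instead of using HTTP."""
    calls.append(url)
    return b"col\n1\n"


def failing_fetch(url: str) -> bytes:
    """Simulate a network outage."""
    raise ConnectionError("simulated outage")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "ns" / "file"
    url = "https://example.com/data.csv"
    # Path.exists stands in for a real TTL-based freshness check.
    download_with_cache(url, target, fake_fetch, Path.exists)
    download_with_cache(url, target, fake_fetch, Path.exists)
    calls_after_cache_hit = len(calls)   # second call was served from disk
    download_with_cache(url, target, fake_fetch, Path.exists, force_refresh=True)
    calls_after_refresh = len(calls)     # force_refresh triggered a new fetch

    try:
        download_with_cache(url, Path(tmp) / "ns" / "other", failing_fetch, Path.exists)
        caught = False
    except CacheError:  # catching the base class also catches DownloadError
        caught = True
```

Because DownloadError derives from CacheError, callers that do not need to distinguish failure modes can catch the base class alone.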

clear(pattern=None)

Clear cached files.

Parameters: pattern (str | None) – Optional glob pattern (e.g., *.csv). If None, all cached files are cleared.
Returns: Number of files deleted
Return type: int
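
The pattern semantics can be tried out safely on a temporary directory. The sketch below assumes that the pattern is applied as a non-recursive glob within the cache directory and that only regular files are counted; `clear_dir` is a hypothetical helper, not the method itself.

```python
import tempfile
from pathlib import Path
from typing import Optional


def clear_dir(cache_dir: Path, pattern: Optional[str] = None) -> int:
    """Delete files in cache_dir matching pattern (all files if None)."""
    deleted = 0
    # Materialise the matches first so deletion does not disturb iteration.
    for entry in list(cache_dir.glob(pattern or "*")):
        if entry.is_file():
            entry.unlink()
            deleted += 1
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    for name in ("a.csv", "b.csv", "c.xlsx"):
        (cache_dir / name).write_bytes(b"x")

    removed_csv = clear_dir(cache_dir, "*.csv")  # removes a.csv and b.csv
    removed_rest = clear_dir(cache_dir)          # removes the remaining c.xlsx
    remaining = list(cache_dir.iterdir())
```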