bolster.utils.cache

File caching utilities for data sources.

Provides disk-based caching for downloaded files with configurable TTL. Used by NISRA, PSNI, and other data source modules to avoid repeated downloads of the same resources.

Cache Location

Files are cached in ~/.cache/bolster/<namespace>/ with filenames based on URL hashes. Each data source uses its own namespace.
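
The layout described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `cache_path_for` is a hypothetical helper, the UTF-8 encoding of the URL is an assumption, and the real module may append a file extension or name files differently.

```python
import hashlib
from pathlib import Path

# Assumed base directory, taken from the "Cache Location" description.
CACHE_BASE = Path.home() / ".cache" / "bolster"


def cache_path_for(namespace: str, url: str) -> Path:
    """Hypothetical helper: where a URL's file might live for a namespace."""
    # Assumption: the URL is UTF-8 encoded before hashing.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return CACHE_BASE / namespace / digest


url = "https://example.com/data.csv"
nisra_path = cache_path_for("nisra", url)
psni_path = cache_path_for("psni", url)

# Same URL, same filename, but a separate directory per namespace.
print(nisra_path.parent.parts[-2:])  # ('bolster', 'nisra')
print(psni_path.parent.parts[-2:])   # ('bolster', 'psni')
```

Because the filename depends only on the URL, two data sources that fetch the same URL keep separate copies under their own namespaces.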

Example

>>> from bolster.utils.cache import CachedDownloader, hash_url
>>> hash_url("https://example.com/data.csv")
'2a01ab0de708440185cbb6473893860c'
>>> downloader = CachedDownloader("my_source")
>>> downloader.namespace
'my_source'

Attributes

logger

CACHE_BASE

Exceptions

CacheError

Base exception for cache operations.

DownloadError

Raised when a file download fails.

Classes

CachedDownloader

Disk-based file cache with TTL support.

Functions

hash_url(url)

Generate a cache-safe filename from a URL using MD5 hash.

Module Contents

bolster.utils.cache.logger
bolster.utils.cache.CACHE_BASE
exception bolster.utils.cache.CacheError

Bases: Exception

Base exception for cache operations.

exception bolster.utils.cache.DownloadError

Bases: CacheError

Raised when a file download fails.
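
Since DownloadError derives from CacheError, a handler for the base class also catches download failures. The sketch below uses local stand-in classes that mirror the documented hierarchy so it runs without the package installed; `fetch_or_none` and `failing_fetch` are hypothetical names used only for illustration.

```python
class CacheError(Exception):
    """Stand-in for bolster.utils.cache.CacheError."""


class DownloadError(CacheError):
    """Stand-in for bolster.utils.cache.DownloadError."""


def fetch_or_none(fetch):
    """Hypothetical helper: run a fetch callable, returning None on any cache failure."""
    try:
        return fetch()
    except CacheError as exc:
        # DownloadError is a CacheError, so it is handled here too.
        print(f"cache operation failed: {exc}")
        return None


def failing_fetch():
    # Simulates a failed download without touching the network.
    raise DownloadError("HTTP 503 for https://example.com/data.csv")


result = fetch_or_none(failing_fetch)
```

Catching DownloadError specifically remains possible when only network failures should be handled.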

bolster.utils.cache.hash_url(url)

Generate a cache-safe filename from a URL using MD5 hash.

Parameters:

url (str) – The URL to hash

Returns:

32-character hexadecimal MD5 hash string

Return type:

str

Example

>>> hash_url("https://example.com/data.csv")
'2a01ab0de708440185cbb6473893860c'

class bolster.utils.cache.CachedDownloader(namespace, timeout=60)

Disk-based file cache with TTL support.

Provides download-with-cache functionality for data source modules. Each instance uses a namespace subdirectory for isolation.

Parameters:
  • namespace (str) – Subdirectory name for this cache (e.g., "nisra", "psni")

  • timeout (int) – Request timeout in seconds (default: 60)

Example

>>> downloader = CachedDownloader("psni", timeout=60)
>>> downloader.namespace
'psni'
>>> downloader.timeout
60
>>> downloader.cache_dir.parts[-2:]
('bolster', 'psni')

Initialize CachedDownloader with namespace and timeout.

Parameters:
  • namespace (str) – Cache namespace for organizing files

  • timeout (int) – Timeout for HTTP requests in seconds

namespace
timeout = 60
cache_dir

get_cached_file(url, cache_ttl_hours=24)

Return cached file if it exists and is fresh, else None.

Parameters:
  • url (str) – URL of the file (used to generate cache filename)

  • cache_ttl_hours (int) – Maximum age in hours before cache is stale

Returns:

Path to cached file if valid and fresh, None otherwise

Return type:

pathlib.Path | None
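
One plausible way to implement the freshness check is to compare the file's modification time against the TTL. The sketch below assumes that approach; the source does not state how age is measured, and `fresh_or_none` is a hypothetical function rather than part of the module.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def fresh_or_none(path: Path, cache_ttl_hours: float = 24) -> Optional[Path]:
    """Return path if it exists and is younger than the TTL, else None."""
    if not path.exists():
        return None
    # Assumption: age is measured from the file's last modification time.
    age_seconds = time.time() - path.stat().st_mtime
    if age_seconds > cache_ttl_hours * 3600:
        return None
    return path


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "example.csv"
    cached.write_text("a,b\n1,2\n")
    fresh_result = fresh_or_none(cached, cache_ttl_hours=24)

    # Backdate the file by two days to simulate a stale entry.
    two_days_ago = time.time() - 48 * 3600
    os.utime(cached, (two_days_ago, two_days_ago))
    stale_result = fresh_or_none(cached, cache_ttl_hours=24)

    missing_result = fresh_or_none(Path(tmp) / "missing.csv")
```

Here `fresh_result` is the path itself, while `stale_result` and `missing_result` are both None, matching the documented return contract.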

download(url, cache_ttl_hours=24, force_refresh=False, headers=None)

Download a file with caching support.

Downloads a file from the given URL and caches it locally. If a valid cached version exists, returns that instead.

Parameters:
  • url (str) – URL to download

  • cache_ttl_hours (int) – Cache validity in hours (default: 24)

  • force_refresh (bool) – If True, bypass cache and re-download

  • headers (dict | None) – Optional extra HTTP headers to include in the request (e.g. {"Referer": "...", "User-Agent": "..."})

Returns:

Path to the downloaded (or cached) file

Raises:

DownloadError – If download fails due to network or HTTP errors

Return type:

pathlib.Path
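
The interaction between the cache, `cache_ttl_hours`, and `force_refresh` can be sketched as below. This is an illustration under assumptions, not the library's implementation: `download_with_cache` and `fake_fetch` are hypothetical, the network call is replaced by an injected callable so the example runs offline, and freshness is assumed to be based on modification time.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable


def download_with_cache(
    url: str,
    target: Path,
    fetch: Callable[[str], bytes],
    cache_ttl_hours: float = 24,
    force_refresh: bool = False,
) -> Path:
    """Return target, calling fetch(url) only on a miss, a stale copy, or a forced refresh."""
    if not force_refresh and target.exists():
        age_seconds = time.time() - target.stat().st_mtime
        if age_seconds <= cache_ttl_hours * 3600:
            return target  # cache hit: no fetch
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(url))
    return target


calls = []


def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTP request; records each call."""
    calls.append(url)
    return b"a,b\n1,2\n"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "psni" / "data.csv"
    url = "https://example.com/data.csv"
    download_with_cache(url, target, fake_fetch)                      # miss: fetches
    download_with_cache(url, target, fake_fetch)                      # hit: served from disk
    download_with_cache(url, target, fake_fetch, force_refresh=True)  # bypass: fetches again
    fetch_count = len(calls)

print(fetch_count)  # 2
```

Three calls result in two fetches: the second call is served from disk, and `force_refresh=True` skips the cache check entirely.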

clear(pattern=None)

Clear cached files.

Parameters:

pattern (str | None) – Optional glob pattern (e.g., *.csv). If None, clears all.

Returns:

Number of files deleted

Return type:

int
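
Pattern-based clearing can be sketched with `pathlib.Path.glob`. The function below is a hypothetical stand-in that assumes only regular files directly inside the cache directory are removed; the real method may behave differently for subdirectories.

```python
import tempfile
from pathlib import Path
from typing import Optional


def clear_files(cache_dir: Path, pattern: Optional[str] = None) -> int:
    """Delete files in cache_dir matching pattern (all files if None); return the count."""
    deleted = 0
    # Materialize the matches first so deletion does not disturb iteration.
    for path in list(cache_dir.glob(pattern or "*")):
        if path.is_file():
            path.unlink()
            deleted += 1
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    for name in ("a.csv", "b.csv", "c.xlsx"):
        (cache_dir / name).write_text("x")

    csv_deleted = clear_files(cache_dir, "*.csv")  # removes a.csv and b.csv
    rest_deleted = clear_files(cache_dir)          # removes the remaining c.xlsx

print(csv_deleted, rest_deleted)  # 2 1
```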