bolster.utils.cache
File caching utilities for data sources.
Provides disk-based caching for downloaded files with configurable TTL. Used by NISRA, PSNI, and other data source modules to avoid repeated downloads of the same resources.
- Cache Location:
Files are cached in ~/.cache/bolster/<namespace>/ with filenames based on URL hashes. Each data source uses its own namespace.
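The layout described above can be sketched as follows. This is an illustration only: `sketch_cache_path` is a hypothetical helper, not part of bolster, and the filename passed in stands for whatever hashed name the library actually generates.

```python
from pathlib import Path


def sketch_cache_path(namespace: str, filename: str) -> Path:
    """Hypothetical helper: place a file under ~/.cache/bolster/<namespace>/."""
    return Path.home() / ".cache" / "bolster" / namespace / filename


# The same filename under two namespaces resolves to two separate locations,
# which is what keeps data sources isolated from one another.
nisra_path = sketch_cache_path("nisra", "2a01ab0de708440185cbb6473893860c")
psni_path = sketch_cache_path("psni", "2a01ab0de708440185cbb6473893860c")

print(nisra_path.parts[-3:-1])  # ('bolster', 'nisra')
print(nisra_path == psni_path)  # False
```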
Example
>>> from bolster.utils.cache import CachedDownloader, hash_url
>>> hash_url("https://example.com/data.csv")
'2a01ab0de708440185cbb6473893860c'
>>> downloader = CachedDownloader("my_source")
>>> downloader.namespace
'my_source'
Attributes
Exceptions
CacheError | Base exception for cache operations. |
DownloadError | Raised when a file download fails. |
Classes
CachedDownloader | Disk-based file cache with TTL support. |
Functions
hash_url(url) | Generate a cache-safe filename from a URL using MD5 hash. |
Module Contents
- exception bolster.utils.cache.CacheError[source]
Bases: Exception
Base exception for cache operations.
Initialize self. See help(type(self)) for accurate signature.
- exception bolster.utils.cache.DownloadError[source]
Bases: CacheError
Raised when a file download fails.
Initialize self. See help(type(self)) for accurate signature.
- bolster.utils.cache.hash_url(url)[source]
Generate a cache-safe filename from a URL using MD5 hash.
- Parameters:
url (str) – The URL to hash
- Returns:
32-character hexadecimal MD5 hash string
- Return type:
str
Example
>>> hash_url("https://example.com/data.csv")
'2a01ab0de708440185cbb6473893860c'
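A minimal sketch of how a URL can be turned into a filename of this kind with the standard library. `sketch_hash_url` is an illustrative stand-in, and the UTF-8 encoding step is an assumption; the actual `hash_url` implementation may differ in detail.

```python
import hashlib


def sketch_hash_url(url: str) -> str:
    """Illustrative stand-in: MD5 hex digest of the UTF-8 encoded URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Query strings and slashes in the URL never reach the filesystem;
# the result is always 32 lowercase hexadecimal characters.
name = sketch_hash_url("https://example.com/data.csv?year=2024&format=csv")
print(len(name))  # 32
print(all(c in "0123456789abcdef" for c in name))  # True
```

MD5 serves here only to derive a stable, filesystem-safe name. It is not a security measure.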
- class bolster.utils.cache.CachedDownloader(namespace, timeout=60)[source]
Disk-based file cache with TTL support.
Provides download-with-cache functionality for data source modules. Each instance uses a namespace subdirectory for isolation.
- Parameters:
Example
>>> downloader = CachedDownloader("psni", timeout=60)
>>> downloader.namespace
'psni'
>>> downloader.timeout
60
>>> downloader.cache_dir.parts[-2:]
('bolster', 'psni')
Initialize CachedDownloader with namespace and timeout.
- Parameters:
- get_cached_file(url, cache_ttl_hours=24)[source]
Return cached file if it exists and is fresh, else None.
- Parameters:
- Returns:
Path to cached file if valid and fresh, None otherwise
- Return type:
pathlib.Path | None
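One common way to implement a freshness check like this is to compare the file's modification time against the TTL. The sketch below assumes that approach; `sketch_get_fresh` is a hypothetical function, and the library may determine freshness differently.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def sketch_get_fresh(path: Path, cache_ttl_hours: float = 24) -> Optional[Path]:
    """Return path if it exists and is younger than the TTL, else None."""
    if not path.exists():
        return None
    age_seconds = time.time() - path.stat().st_mtime
    if age_seconds > cache_ttl_hours * 3600:
        return None
    return path


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "example.bin"
    cached.write_bytes(b"data")

    # Just written, so well within a 24-hour TTL.
    fresh = sketch_get_fresh(cached, cache_ttl_hours=24)

    # Backdate the modification time by 48 hours to simulate a stale entry.
    two_days_ago = time.time() - 48 * 3600
    os.utime(cached, (two_days_ago, two_days_ago))
    stale = sketch_get_fresh(cached, cache_ttl_hours=24)

    # A file that was never cached.
    missing = sketch_get_fresh(Path(tmp) / "absent.bin")

print(fresh is not None, stale, missing)  # True None None
```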
- download(url, cache_ttl_hours=24, force_refresh=False, headers=None)[source]
Download a file with caching support.
Downloads a file from the given URL and caches it locally. If a valid cached version exists, returns that instead.
- Parameters:
- Returns:
Path to the downloaded (or cached) file
- Raises:
DownloadError – If download fails due to network or HTTP errors
- Return type:
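The download-or-reuse behaviour described for `download` can be sketched without any network access by injecting the fetch step. Everything named here (`sketch_download`, `SketchDownloadError`, `fake_fetch`) is hypothetical, and the `headers` parameter is omitted. The sketch only illustrates how `cache_ttl_hours` and `force_refresh` would interact under a modification-time freshness check, which is itself an assumption.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Callable, List


class SketchDownloadError(Exception):
    """Stand-in for DownloadError in this sketch."""


def sketch_download(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes],
    cache_ttl_hours: float = 24,
    force_refresh: bool = False,
) -> Path:
    """Return a fresh cached copy if one exists, otherwise fetch and store it."""
    target = cache_dir / hashlib.md5(url.encode("utf-8")).hexdigest()
    if not force_refresh and target.exists():
        age_seconds = time.time() - target.stat().st_mtime
        if age_seconds <= cache_ttl_hours * 3600:
            return target  # cache hit: no fetch performed
    try:
        payload = fetch(url)
    except OSError as exc:
        raise SketchDownloadError(f"Failed to download {url}") from exc
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Hypothetical fetcher that records each call instead of using the network."""
    calls.append(url)
    return b"col\n1\n"


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "bolster" / "my_source"
    url = "https://example.com/data.csv"
    first = sketch_download(url, cache_dir, fake_fetch)   # fetches
    second = sketch_download(url, cache_dir, fake_fetch)  # served from cache
    third = sketch_download(url, cache_dir, fake_fetch, force_refresh=True)  # fetches again

print(len(calls))  # 2
```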