bolster.utils.cache
===================

.. py:module:: bolster.utils.cache

.. autoapi-nested-parse::

   File caching utilities for data sources.

   Provides disk-based caching for downloaded files with configurable TTL.
   Used by NISRA, PSNI, and other data source modules to avoid repeated
   downloads of the same resources.

   Cache Location:
       Files are cached in ``~/.cache/bolster//`` with filenames based on URL hashes.
       Each data source uses its own namespace.

   .. rubric:: Example

   >>> from bolster.utils.cache import CachedDownloader, hash_url
   >>> hash_url("https://example.com/data.csv")
   '2a01ab0de708440185cbb6473893860c'
   >>> downloader = CachedDownloader("my_source")
   >>> downloader.namespace
   'my_source'


Attributes
----------

.. autoapisummary::

   bolster.utils.cache.logger
   bolster.utils.cache.CACHE_BASE


Exceptions
----------

.. autoapisummary::

   bolster.utils.cache.CacheError
   bolster.utils.cache.DownloadError


Classes
-------

.. autoapisummary::

   bolster.utils.cache.CachedDownloader


Functions
---------

.. autoapisummary::

   bolster.utils.cache.hash_url


Module Contents
---------------

.. py:data:: logger

.. py:data:: CACHE_BASE

.. py:exception:: CacheError

   Bases: :py:obj:`Exception`

   Base exception for cache operations.

   Initialize self. See help(type(self)) for accurate signature.


.. py:exception:: DownloadError

   Bases: :py:obj:`CacheError`

   Raised when a file download fails.

   Initialize self. See help(type(self)) for accurate signature.


.. py:function:: hash_url(url)

   Generate a cache-safe filename from a URL using MD5 hash.

   :param url: The URL to hash

   :returns: 32-character hexadecimal MD5 hash string

   .. rubric:: Example

   >>> hash_url("https://example.com/data.csv")
   '2a01ab0de708440185cbb6473893860c'


.. py:class:: CachedDownloader(namespace, timeout = 60)

   Disk-based file cache with TTL support.

   Provides download-with-cache functionality for data source modules.
   Each instance uses a namespace subdirectory for isolation.


   :param namespace: Subdirectory name for this cache (e.g., "nisra", "psni")
   :param timeout: Request timeout in seconds (default: 60)

   .. rubric:: Example

   >>> downloader = CachedDownloader("psni", timeout=60)
   >>> downloader.namespace
   'psni'
   >>> downloader.timeout
   60
   >>> downloader.cache_dir.parts[-2:]
   ('bolster', 'psni')

   Initialize CachedDownloader with namespace and timeout.

   :param namespace: Cache namespace for organizing files
   :param timeout: Timeout for HTTP requests in seconds


   .. py:attribute:: namespace


   .. py:attribute:: timeout
      :value: 60


   .. py:attribute:: cache_dir


   .. py:method:: get_cached_file(url, cache_ttl_hours = 24)

      Return cached file if it exists and is fresh, else None.

      :param url: URL of the file (used to generate cache filename)
      :param cache_ttl_hours: Maximum age in hours before cache is stale

      :returns: Path to cached file if valid and fresh, None otherwise


   .. py:method:: download(url, cache_ttl_hours = 24, force_refresh = False, headers = None)

      Download a file with caching support.

      Downloads a file from the given URL and caches it locally.
      If a valid cached version exists, returns that instead.

      :param url: URL to download
      :param cache_ttl_hours: Cache validity in hours (default: 24)
      :param force_refresh: If True, bypass cache and re-download
      :param headers: Optional extra HTTP headers to include in the request
                      (e.g. ``{"Referer": "...", "User-Agent": "..."}``)

      :returns: Path to the downloaded (or cached) file

      :raises DownloadError: If download fails due to network or HTTP errors


   .. py:method:: clear(pattern = None)

      Clear cached files.

      :param pattern: Optional glob pattern (e.g., ``*.csv``). If None, clears all.

      :returns: Number of files deleted
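
The caching behaviour described above (a filename derived from an MD5 hash of
the URL, plus a TTL freshness check) can be sketched with the standard library
alone. This is an illustrative sketch, not the module's implementation:
``sketch_hash_url`` and ``sketch_get_cached_file`` are hypothetical stand-ins,
and both the bare-hash filename and the use of the file's modification time as
its age are assumptions.

.. code-block:: python

   import hashlib
   import os
   import tempfile
   import time
   from pathlib import Path
   from typing import Optional


   def sketch_hash_url(url: str) -> str:
       """Hypothetical stand-in for hash_url: MD5 hex digest of the URL."""
       return hashlib.md5(url.encode("utf-8")).hexdigest()


   def sketch_get_cached_file(
       cache_dir: Path, url: str, cache_ttl_hours: float = 24
   ) -> Optional[Path]:
       """Return the cached path if it exists and is younger than the TTL."""
       path = cache_dir / sketch_hash_url(url)  # assumption: bare hash as filename
       if not path.exists():
           return None
       # Assumption: age is measured from the file's modification time.
       age_hours = (time.time() - path.stat().st_mtime) / 3600
       return path if age_hours < cache_ttl_hours else None


   with tempfile.TemporaryDirectory() as tmp:
       cache_dir = Path(tmp)
       url = "https://example.com/data.csv"

       # Simulate a completed download by writing the cache entry directly.
       cached = cache_dir / sketch_hash_url(url)
       cached.write_text("a,b\n1,2\n")
       fresh = sketch_get_cached_file(cache_dir, url)  # just written: a hit

       # Backdate the file by 48 hours so it exceeds the 24-hour default TTL.
       two_days_ago = time.time() - 48 * 3600
       os.utime(cached, (two_days_ago, two_days_ago))
       stale = sketch_get_cached_file(cache_dir, url)  # too old: a miss

Code that uses the module would call ``CachedDownloader.download`` or
``CachedDownloader.get_cached_file`` rather than these helpers. The sketch only
illustrates why a repeated request within ``cache_ttl_hours`` can be served
from disk, while an older entry is treated as missing.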