bolster.utils.cache
===================

.. py:module:: bolster.utils.cache

.. autoapi-nested-parse::

   File caching utilities for data sources.

   Provides disk-based caching for downloaded files with a configurable
   time-to-live (TTL). Used by NISRA, PSNI, and other data source modules
   to avoid repeated downloads of the same resources.

   Cache Location:
       Files are cached in ``~/.cache/bolster/<namespace>/`` with filenames
       based on URL hashes. Each data source uses its own namespace.
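
   The layout above can be sketched with the standard library. This is an
   illustration rather than the module's own code: ``cache_path_for`` and
   ``ASSUMED_CACHE_BASE`` are hypothetical names, and the sketch assumes the
   cached filename is the bare MD5 hex digest of the UTF-8 encoded URL with
   no extension, which this page does not confirm.

   .. code-block:: python

      import hashlib
      from pathlib import Path

      # Assumed base directory, taken from the description above.
      ASSUMED_CACHE_BASE = Path.home() / ".cache" / "bolster"


      def cache_path_for(namespace: str, url: str) -> Path:
          """Return the path a URL would map to inside a namespace (hypothetical)."""
          filename = hashlib.md5(url.encode("utf-8")).hexdigest()
          return ASSUMED_CACHE_BASE / namespace / filename


      path = cache_path_for("nisra", "https://example.com/data.csv")
      print(path.parent.name)  # nisra

   Because the namespace is a path component, two data sources that fetch the
   same URL would keep separate copies under this scheme.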

   .. rubric:: Example

   >>> from bolster.utils.cache import CachedDownloader, hash_url
   >>> hash_url("https://example.com/data.csv")
   '2a01ab0de708440185cbb6473893860c'
   >>> downloader = CachedDownloader("my_source")
   >>> downloader.namespace
   'my_source'


Attributes
----------

.. autoapisummary::

   bolster.utils.cache.logger
   bolster.utils.cache.CACHE_BASE


Exceptions
----------

.. autoapisummary::

   bolster.utils.cache.CacheError
   bolster.utils.cache.DownloadError


Classes
-------

.. autoapisummary::

   bolster.utils.cache.CachedDownloader


Functions
---------

.. autoapisummary::

   bolster.utils.cache.hash_url


Module Contents
---------------

.. py:data:: logger

.. py:data:: CACHE_BASE

.. py:exception:: CacheError

   Bases: :py:obj:`Exception`


   Base exception for cache operations.



.. py:exception:: DownloadError

   Bases: :py:obj:`CacheError`


   Raised when a file download fails.



.. py:function:: hash_url(url)

   Generate a cache-safe filename from a URL using its MD5 hash.

   :param url: The URL to hash

   :returns: 32-character hexadecimal MD5 hash string

   .. rubric:: Example

   >>> hash_url("https://example.com/data.csv")
   '2a01ab0de708440185cbb6473893860c'
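
   For readers who want to reproduce the naming scheme outside the library,
   an equivalent can be sketched with ``hashlib``. The name ``md5_filename``
   is hypothetical, and encoding the URL as UTF-8 is an assumption. The
   digest shown in the example above is not re-derived here.

   .. code-block:: python

      import hashlib


      def md5_filename(url: str) -> str:
          """Map a URL to a fixed-length, filesystem-safe name (hypothetical)."""
          # Assumption: the URL is encoded as UTF-8 before hashing.
          return hashlib.md5(url.encode("utf-8")).hexdigest()


      # The same URL always yields the same name; a changed query string
      # yields a different one.
      name = md5_filename("https://example.com/data.csv?year=2024&format=csv")
      same = md5_filename("https://example.com/data.csv?year=2024&format=csv")
      other = md5_filename("https://example.com/data.csv?year=2023&format=csv")

   In this sketch MD5 serves only as a compact, deterministic naming scheme.
   It sidesteps characters such as ``/``, ``?`` and ``&`` that are awkward in
   filenames, and it is not used for security.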


.. py:class:: CachedDownloader(namespace, timeout = 60)

   Disk-based file cache with TTL support.

   Provides download-with-cache functionality for data source modules.
   Each instance uses a namespace subdirectory for isolation.

   :param namespace: Subdirectory name for this cache (e.g., "nisra", "psni")
   :param timeout: Timeout for HTTP requests in seconds (default: 60)

   .. rubric:: Example

   >>> downloader = CachedDownloader("psni", timeout=60)
   >>> downloader.namespace
   'psni'
   >>> downloader.timeout
   60
   >>> downloader.cache_dir.parts[-2:]
   ('bolster', 'psni')



   .. py:attribute:: namespace


   .. py:attribute:: timeout
      :value: 60


   .. py:attribute:: cache_dir


   .. py:method:: get_cached_file(url, cache_ttl_hours = 24)

      Return the cached file if it exists and is fresh, otherwise ``None``.

      :param url: URL of the file (used to generate cache filename)
      :param cache_ttl_hours: Maximum age in hours before cache is stale

      :returns: Path to the cached file if it exists and is fresh, ``None`` otherwise
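
      A freshness check of this kind can be sketched as follows. The helper
      ``fresh_or_none`` is hypothetical, and measuring age from the file's
      modification time is an assumption. This page does not say how the
      library determines staleness.

      .. code-block:: python

         import os
         import tempfile
         import time
         from pathlib import Path
         from typing import Optional


         def fresh_or_none(path: Path, cache_ttl_hours: float = 24) -> Optional[Path]:
             """Return path if it exists and is younger than the TTL, else None."""
             if not path.is_file():
                 return None
             # Assumption: age is measured from the last modification time.
             age_seconds = time.time() - path.stat().st_mtime
             return path if age_seconds <= cache_ttl_hours * 3600 else None


         with tempfile.TemporaryDirectory() as tmp:
             cached = Path(tmp) / "example"
             missing_result = fresh_or_none(cached)  # file not written yet
             cached.write_text("payload")
             fresh_result = fresh_or_none(cached, cache_ttl_hours=24)
             # Backdate the modification time by two days to simulate a stale entry.
             two_days_ago = time.time() - 48 * 3600
             os.utime(cached, (two_days_ago, two_days_ago))
             stale_result = fresh_or_none(cached, cache_ttl_hours=24)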


   .. py:method:: download(url, cache_ttl_hours = 24, force_refresh = False, headers = None)

      Download a file with caching support.

      Downloads a file from the given URL and caches it locally. If a cached
      copy younger than ``cache_ttl_hours`` exists and ``force_refresh`` is
      false, that copy is returned instead.

      :param url: URL to download
      :param cache_ttl_hours: Cache validity in hours (default: 24)
      :param force_refresh: If True, bypass cache and re-download
      :param headers: Optional extra HTTP headers to include in the request
                      (e.g. ``{"Referer": "...", "User-Agent": "..."}``)

      :returns: Path to the downloaded (or cached) file

      :raises DownloadError: If download fails due to network or HTTP errors
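
      The control flow described above (return the cached copy unless a
      refresh is forced, and wrap transport failures in ``DownloadError``)
      can be sketched without touching the network. Everything here is a
      stand-in: ``download_with_cache``, the injected ``fetch`` callable, and
      the local ``DownloadError`` class are hypothetical, and the TTL check
      is omitted for brevity.

      .. code-block:: python

         import tempfile
         from pathlib import Path
         from typing import Callable


         class DownloadError(Exception):
             """Local stand-in for bolster.utils.cache.DownloadError."""


         def download_with_cache(
             url: str,
             target: Path,
             fetch: Callable[[str], bytes],
             force_refresh: bool = False,
         ) -> Path:
             """Return target, calling fetch only on a miss or forced refresh."""
             if target.exists() and not force_refresh:
                 return target  # cache hit (TTL check omitted in this sketch)
             try:
                 data = fetch(url)
             except OSError as exc:
                 # Wrap transport failures in the cache-specific exception.
                 raise DownloadError(f"Failed to download {url}") from exc
             target.parent.mkdir(parents=True, exist_ok=True)
             target.write_bytes(data)
             return target


         calls = []


         def fake_fetch(url: str) -> bytes:
             calls.append(url)  # record each simulated network request
             return b"a,b\n1,2\n"


         def failing_fetch(url: str) -> bytes:
             raise OSError("connection refused")


         with tempfile.TemporaryDirectory() as tmp:
             target = Path(tmp) / "ns" / "file"
             url = "https://example.com/data.csv"
             download_with_cache(url, target, fake_fetch)
             download_with_cache(url, target, fake_fetch)  # served from disk
             calls_after_cached = len(calls)
             download_with_cache(url, target, fake_fetch, force_refresh=True)
             calls_after_forced = len(calls)
             try:
                 download_with_cache(url, Path(tmp) / "ns" / "other", failing_fetch)
                 wrapped = False
             except DownloadError:
                 wrapped = True

      Callers of the real method would typically catch ``DownloadError`` in
      the same way, since it is the only exception this page documents for
      ``download``.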


   .. py:method:: clear(pattern = None)

      Clear cached files.

      :param pattern: Optional glob pattern (e.g., ``*.csv``). If ``None``, all cached files are cleared.

      :returns: Number of files deleted
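
      Pattern-based clearing can be sketched with ``pathlib``. The helper
      ``clear_dir`` is hypothetical. It assumes that only regular files
      directly inside the cache directory are removed and that ``None`` is
      treated as ``*``, neither of which this page confirms.

      .. code-block:: python

         import tempfile
         from pathlib import Path
         from typing import Optional


         def clear_dir(cache_dir: Path, pattern: Optional[str] = None) -> int:
             """Delete files matching pattern in cache_dir and return the count."""
             deleted = 0
             # Materialise the matches first so deletion does not disturb iteration.
             for path in list(cache_dir.glob(pattern or "*")):
                 if path.is_file():
                     path.unlink()
                     deleted += 1
             return deleted


         with tempfile.TemporaryDirectory() as tmp:
             cache_dir = Path(tmp)
             for filename in ("a.csv", "b.csv", "c.json"):
                 (cache_dir / filename).write_text("x")
             csv_deleted = clear_dir(cache_dir, "*.csv")  # removes the two CSV files
             rest_deleted = clear_dir(cache_dir)          # removes what is left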