bolster.data_sources.nisra.claimant_count
=========================================

.. py:module:: bolster.data_sources.nisra.claimant_count

.. autoapi-nested-parse::

   NISRA Claimant Count Statistics Module.

   This module provides access to the Northern Ireland Statistics and Research Agency (NISRA)
   monthly Claimant Count statistics, covering Universal Credit (UC) and Jobseeker's Allowance
   (JSA) claimants.

   The Claimant Count is an experimental statistic measuring the number of people claiming
   benefits principally because they are unemployed. Data is published monthly and
   covers Northern Ireland with multiple geographic breakdowns including Local Government
   Districts, Parliamentary Constituency Areas, Travel-to-Work Areas, and Super Output Areas.

   Data Source:
       **Publication page pattern**:
       https://www.nisra.gov.uk/publications/labour-market-report-{month_name}-{year}

       The module scrapes the monthly Labour Market Report publication page to find the
       ``lmr-claimant-count-tables-*.xlsx`` Excel file link, falling back to direct URL
       construction if scraping fails.
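
   A minimal sketch of filling in the publication page pattern. It assumes the
   ``{month_name}`` placeholder is the lowercase English month name (for example
   ``march``); ``publication_page_url`` is a hypothetical helper, not part of
   this module:

   .. code-block:: python

      # Hypothetical helper for the documented page pattern. Assumes the
      # {month_name} placeholder is the lowercase English month name.
      MONTH_NAMES = (
          "january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december",
      )
      PAGE_PATTERN = (
          "https://www.nisra.gov.uk/publications/"
          "labour-market-report-{month_name}-{year}"
      )


      def publication_page_url(year: int, month: int) -> str:
          """Return the Labour Market Report page URL for a year and month (1-12)."""
          if not 1 <= month <= 12:
              raise ValueError(f"month must be between 1 and 12, got {month}")
          return PAGE_PATTERN.format(month_name=MONTH_NAMES[month - 1], year=year)

   Under that assumption, ``publication_page_url(2025, 3)`` returns
   ``https://www.nisra.gov.uk/publications/labour-market-report-march-2025``.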

   Update Frequency: Monthly, approximately 2–3 weeks after the reference month.

   Sheets parsed:
       - ``Headline``: NI total by sex, seasonally adjusted and non-seasonally adjusted,
         full time series from April 1997.
       - ``Age``: NI total by age band (16–24, 25–49, 50+), from January 2013.
       - ``LGD_11``: Current-month snapshot for 11 Local Government Districts.
       - ``PCA``: Current-month snapshot for 18 Westminster Parliamentary Constituency Areas.
       - ``TTWA``: Current-month snapshot for 10 Travel-to-Work Areas.
       - ``SOA``: 889 Super Output Areas, wide-format time series from October 2017,
         melted to long format.

   .. rubric:: Notes

   Claimant Count is an experimental statistic. The claimant rate denominator
   is the claimant count plus workforce jobs. Five-week months are annotated
   ``[2]``, revised data ``(r)``, and provisional data ``(p)``; these
   annotation markers are stripped before date parsing.
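
   The marker stripping can be sketched as follows. This assumes date labels
   look like ``"March 2025 [2]"`` or ``"March 2025 (p)"``, that is, a full
   month name and four-digit year followed by optional markers; the real
   sheets may lay dates out differently, and ``parse_date_label`` is a
   hypothetical name:

   .. code-block:: python

      import re
      from datetime import datetime

      # Matches footnote markers such as "[2]" plus the "(r)" and "(p)" flags.
      MARKER_RE = re.compile(r"\s*(\[\d+\]|\(r\)|\(p\))", flags=re.IGNORECASE)


      def parse_date_label(label: str) -> datetime:
          """Strip annotation markers, then parse "Month YYYY" as the 1st of that month."""
          cleaned = MARKER_RE.sub("", label).strip()
          return datetime.strptime(cleaned, "%B %Y")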

   SOA data has a methodology break at January 2026 (transition from COA2011
   to DZ2021 geographies).

   Usage:
       >>> from bolster.data_sources.nisra import claimant_count
       >>> df = claimant_count.get_latest_claimant_count("headline")
       >>> "claimants_000s" in df.columns
       True

       >>> lgd_df = claimant_count.get_latest_claimant_count("lgd")
       >>> "claimants_total" in lgd_df.columns
       True

   .. rubric:: Example

   >>> from bolster.data_sources.nisra import claimant_count
   >>> df = claimant_count.get_latest_claimant_count("headline")
   >>> df[df["sex"] == "all_people"].sort_values("date").tail(1)["claimants_000s"].values[0] > 0
   True

   Author: Claude Code


Attributes
----------

.. autoapisummary::

   bolster.data_sources.nisra.claimant_count.logger


Functions
---------

.. autoapisummary::

   bolster.data_sources.nisra.claimant_count.get_latest_publication_url
   bolster.data_sources.nisra.claimant_count.parse_headline
   bolster.data_sources.nisra.claimant_count.parse_age
   bolster.data_sources.nisra.claimant_count.parse_geography
   bolster.data_sources.nisra.claimant_count.parse_soa
   bolster.data_sources.nisra.claimant_count.get_latest_claimant_count
   bolster.data_sources.nisra.claimant_count.validate_claimant_count


Module Contents
---------------

.. py:data:: logger

.. py:function:: get_latest_publication_url()

   Discover the URL of the most recent claimant count Excel file.

   Scrapes the NISRA Labour Market Report publication page for the current
   month, working back through earlier months if no file is found there, and
   falls back to constructing the file URL directly if scraping fails.

   :returns: Full URL to the latest claimant count Excel file.

   :raises NISRADataNotFoundError: If no publication can be found.

   .. rubric:: Example

   >>> url = get_latest_publication_url()
   >>> url.endswith(".xlsx")
   True
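
   The "current month, then earlier months" search order can be sketched as a
   generator of ``(year, month)`` pairs stepping backwards. How many months the
   real function tries is not documented here, so ``max_months`` is an assumed
   parameter of this hypothetical ``candidate_months`` helper:

   .. code-block:: python

      from datetime import date
      from typing import Iterator, Tuple


      def candidate_months(today: date, max_months: int = 3) -> Iterator[Tuple[int, int]]:
          """Yield (year, month) pairs starting at today's month and going backwards."""
          year, month = today.year, today.month
          for _ in range(max_months):
              yield year, month
              # Step back one month, wrapping from January to December.
              month -= 1
              if month == 0:
                  year, month = year - 1, 12

   Each pair would then be used to build a publication page URL to try in turn.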


.. py:function:: parse_headline(file_path)

   Parse the Headline sheet: NI total claimant count by sex.

   The Headline sheet contains two side-by-side tables:

   - Table 1a: Seasonally adjusted claimant count by sex
   - Table 1b: Non-seasonally adjusted claimant count by sex

   Both tables share the same date column structure with men, women and
   all people counts (thousands) and rates.
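
   Assuming the headline rates follow the definition in the module notes
   (claimant count plus workforce jobs as the denominator), the rate
   :math:`R`, in percent, for a given month and sex can be written as:

   .. math::

      R = 100 \times \frac{C}{C + W}

   where :math:`C` is the claimant count and :math:`W` is the number of
   workforce jobs, both expressed in the same units.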

   :param file_path: Path to the claimant count Excel file.

   :returns: DataFrame with columns:

             - ``date``: pandas Timestamp (monthly, day=1)
             - ``adjusted``: ``"seasonally_adjusted"`` or ``"non_seasonally_adjusted"``
             - ``sex``: ``"men"``, ``"women"``, or ``"all_people"``
             - ``claimants_000s``: Claimant count in thousands (float)
             - ``claimant_rate``: Claimant rate as a percentage (float)

   :rtype: pandas.DataFrame

   :raises NISRADataNotFoundError: If the Headline sheet is not found.

   .. rubric:: Example

   >>> df = parse_headline("/tmp/claimant_count.xlsx")
   >>> sorted(df["sex"].unique())
   ['all_people', 'men', 'women']
   >>> sorted(df["adjusted"].unique())
   ['non_seasonally_adjusted', 'seasonally_adjusted']
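
   A minimal sketch of reshaping this long output into one column per sex,
   relying only on the columns listed above. ``headline_wide`` is a
   hypothetical helper, not part of the module:

   .. code-block:: python

      import pandas as pd


      def headline_wide(df: pd.DataFrame, adjusted: str = "seasonally_adjusted") -> pd.DataFrame:
          """Pivot claimants_000s into a date-indexed frame with one column per sex."""
          subset = df[df["adjusted"] == adjusted]
          # Within one adjustment type, each (date, sex) pair should be unique.
          return subset.pivot(index="date", columns="sex", values="claimants_000s").sort_index()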


.. py:function:: parse_age(file_path)

   Parse the Age sheet: NI claimant count by age band.

   Contains a single table of non-seasonally adjusted claimant counts
   broken down into three age bands: 16–24, 25–49, 50+. Data runs from
   January 2013.

   :param file_path: Path to the claimant count Excel file.

   :returns: DataFrame with columns:

             - ``date``: pandas Timestamp (monthly, day=1)
             - ``age_group``: One of ``"16-24"``, ``"25-49"``, ``"50+"``
             - ``claimants``: Claimant count (int, rounded to nearest 5)

   :rtype: pandas.DataFrame

   :raises NISRADataNotFoundError: If the Age sheet is not found.

   .. rubric:: Example

   >>> df = parse_age("/tmp/claimant_count.xlsx")
   >>> sorted(df["age_group"].unique())
   ['16-24', '25-49', '50+']
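
   Each band's share of the monthly total can be derived from this output with
   a grouped transform. This sketch assumes the three bands together cover all
   claimants in a month; because counts are rounded to the nearest 5, the
   shares are approximate. ``age_shares`` is a hypothetical name:

   .. code-block:: python

      import pandas as pd


      def age_shares(df: pd.DataFrame) -> pd.DataFrame:
          """Return a copy with ``share_pct``: each band's percentage of that month's total."""
          out = df.copy()
          monthly_total = out.groupby("date")["claimants"].transform("sum")
          out["share_pct"] = 100 * out["claimants"] / monthly_total
          return out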


.. py:function:: parse_geography(file_path, sheet)

   Parse a geographic breakdown sheet (LGD_11, PCA, or TTWA).

   Each sheet contains a current-month snapshot with columns for male,
   female and total claimant numbers, working-age claimant rates, and
   changes over the month and over the year.

   :param file_path: Path to the claimant count Excel file.
   :param sheet: Sheet name — one of ``"LGD_11"``, ``"PCA"``, or ``"TTWA"``.

   :returns: DataFrame with columns:

             - ``date``: pandas Timestamp (extracted from the Excel filename)
             - ``geography``: Area name (e.g., ``"Belfast"``)
             - ``geography_type``: Sheet type identifier (e.g., ``"LGD_11"``)
             - ``claimants_male``: Number of male claimants (int)
             - ``claimants_female``: Number of female claimants (int)
             - ``claimants_total``: Total claimants (int)
             - ``claimant_rate_male_pct``: Male working-age claimant rate (float)
             - ``claimant_rate_female_pct``: Female working-age claimant rate (float)
             - ``claimant_rate_total_pct``: Total working-age claimant rate (float)
             - ``change_over_month_number``: Change vs previous month (int)
             - ``change_over_year_number``: Change vs same month last year (int)

   :rtype: pandas.DataFrame

   :raises NISRADataNotFoundError: If the requested sheet is not found.
   :raises ValueError: If sheet is not one of the supported values.

   .. rubric:: Example

   >>> df = parse_geography("/tmp/claimant_count.xlsx", "LGD_11")
   >>> len(df["geography"].unique()) >= 11
   True
   >>> "claimants_total" in df.columns
   True
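
   For example, the areas with the highest total working-age claimant rate in
   the snapshot could be listed as below. This is a sketch using only the
   documented columns; ``highest_rate_areas`` is a hypothetical helper:

   .. code-block:: python

      import pandas as pd


      def highest_rate_areas(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
          """Return the n areas with the highest claimant_rate_total_pct."""
          cols = ["geography", "claimants_total", "claimant_rate_total_pct"]
          return df.nlargest(n, "claimant_rate_total_pct")[cols].reset_index(drop=True)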


.. py:function:: parse_soa(file_path)

   Parse the SOA sheet: Super Output Area time series.

   The SOA sheet is wide-format with 889 Super Output Areas as rows and
   monthly dates as columns from October 2017. This function melts it to
   long format.
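
   The wide-to-long reshape described above can be sketched with
   ``pandas.melt``. The layout assumed here, a single ``soa_code`` column
   followed by one column per month, simplifies the real sheet, and
   ``melt_soa`` is a hypothetical name:

   .. code-block:: python

      import pandas as pd


      def melt_soa(wide: pd.DataFrame) -> pd.DataFrame:
          """Reshape one-row-per-SOA, one-column-per-month data into long format."""
          long = wide.melt(id_vars="soa_code", var_name="date", value_name="claimants")
          # Column headers become values in "date"; convert them to Timestamps.
          long["date"] = pd.to_datetime(long["date"])
          return long.sort_values(["soa_code", "date"]).reset_index(drop=True)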

   .. note::

      There is a methodology break at January 2026 where geography codes
      transition from COA2011 to DZ2021. Both series are included in the
      output.

   :param file_path: Path to the claimant count Excel file.

   :returns: DataFrame with columns:

             - ``soa_code``: Super Output Area code and name (e.g., ``"95AA01S1 : Aldergrove_1"``)
             - ``date``: pandas Timestamp (monthly, day=1)
             - ``claimants``: Claimant count (int, rounded to nearest 5)

   :rtype: pandas.DataFrame

   :raises NISRADataNotFoundError: If the SOA sheet is not found.

   .. rubric:: Example

   >>> df = parse_soa("/tmp/claimant_count.xlsx")
   >>> "soa_code" in df.columns
   True
   >>> df["date"].min().year <= 2018
   True
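
   Because both geography series are returned, comparisons across the
   January 2026 methodology break need care. A minimal sketch that separates
   the two periods by date, assuming the DZ2021-based series starts exactly on
   1 January 2026 (``split_at_break`` is a hypothetical helper):

   .. code-block:: python

      from typing import Tuple

      import pandas as pd

      # Assumed first month of the DZ2021-based series.
      BREAK_DATE = pd.Timestamp("2026-01-01")


      def split_at_break(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
          """Return (COA2011-era rows, DZ2021-era rows), split on the ``date`` column."""
          before = df[df["date"] < BREAK_DATE]
          after = df[df["date"] >= BREAK_DATE]
          return before, after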


.. py:function:: get_latest_claimant_count(breakdown = 'headline', force_refresh = False)

   Download and parse the latest NISRA claimant count data.

   Automatically discovers and downloads the most recent monthly publication,
   then returns the requested breakdown.

   :param breakdown: One of:

                     - ``"headline"``: NI total by sex, seasonally adjusted and
                       non-seasonally adjusted (default)
                     - ``"age"``: NI total by age band (16–24, 25–49, 50+)
                     - ``"lgd"``: 11 Local Government Districts (current month)
                     - ``"pca"``: 18 Parliamentary Constituency Areas (current month)
                     - ``"ttwa"``: 10 Travel-to-Work Areas (current month)
                     - ``"soa"``: 889 Super Output Areas, long-format time series

   :param force_refresh: If ``True``, bypass cache and download fresh data.

   :returns: DataFrame for the requested breakdown. See individual ``parse_*``
             functions for column documentation.

   :raises ValueError: If ``breakdown`` is not a supported value.
   :raises NISRADataNotFoundError: If the data cannot be downloaded.

   .. rubric:: Example

   >>> df = get_latest_claimant_count("headline")
   >>> "claimants_000s" in df.columns
   True
   >>> df_lgd = get_latest_claimant_count("lgd")
   >>> len(df_lgd) >= 11
   True
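
   The mapping from ``breakdown`` to sheet implied by the sheet list in the
   module description can be sketched as a lookup table that raises
   ``ValueError`` for unknown values. ``BREAKDOWN_SHEETS`` and ``sheet_for``
   are hypothetical names; the real function may dispatch differently:

   .. code-block:: python

      # Breakdown keyword -> sheet name, as described for this module.
      BREAKDOWN_SHEETS = {
          "headline": "Headline",
          "age": "Age",
          "lgd": "LGD_11",
          "pca": "PCA",
          "ttwa": "TTWA",
          "soa": "SOA",
      }


      def sheet_for(breakdown: str) -> str:
          """Return the sheet name for a breakdown, rejecting unsupported values."""
          try:
              return BREAKDOWN_SHEETS[breakdown]
          except KeyError:
              valid = ", ".join(sorted(BREAKDOWN_SHEETS))
              raise ValueError(f"breakdown must be one of: {valid}; got {breakdown!r}") from None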


.. py:function:: validate_claimant_count(df, breakdown)

   Validate the integrity of a claimant count DataFrame.

   Checks that required columns are present, values are in plausible
   ranges, and the DataFrame is non-empty.

   :param df: DataFrame returned by ``get_latest_claimant_count`` or a
              ``parse_*`` function.
   :param breakdown: The breakdown type that produced the DataFrame.
                     One of ``"headline"``, ``"age"``, ``"lgd"``, ``"pca"``,
                     ``"ttwa"``, ``"soa"``.

   :returns: ``True`` if validation passes, ``False`` otherwise.

   .. rubric:: Example

   >>> import pandas as pd
   >>> validate_claimant_count(pd.DataFrame(), "headline")
   False
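
   A minimal sketch of the column, emptiness and range checks for two
   breakdowns, using the column names documented for ``parse_headline`` and
   ``parse_age``. The plausibility bounds (non-negative counts, rates between
   0 and 100 percent) are assumptions, and ``REQUIRED`` and
   ``basic_validate`` are hypothetical names:

   .. code-block:: python

      import pandas as pd

      REQUIRED = {
          "headline": {"date", "adjusted", "sex", "claimants_000s", "claimant_rate"},
          "age": {"date", "age_group", "claimants"},
      }


      def basic_validate(df: pd.DataFrame, breakdown: str) -> bool:
          """Return False for empty frames, missing columns or implausible values."""
          if df.empty or not REQUIRED[breakdown].issubset(df.columns):
              return False
          # Assumed bound: a rate expressed as a percentage lies in [0, 100].
          if "claimant_rate" in df.columns and not df["claimant_rate"].between(0, 100).all():
              return False
          count_col = "claimants_000s" if breakdown == "headline" else "claimants"
          # Negative claimant counts are never plausible.
          return bool((df[count_col] >= 0).all())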