bolster.data_sources.nisra.claimant_count
=========================================

.. py:module:: bolster.data_sources.nisra.claimant_count

.. autoapi-nested-parse::

   NISRA Claimant Count Statistics Module.

   This module provides access to the Northern Ireland Statistics and Research Agency (NISRA)
   monthly Claimant Count statistics, covering Universal Credit (UC) and Jobseeker's Allowance
   (JSA) claimants.

   The Claimant Count is an experimental statistic measuring the number of people claiming
   benefits principally for the reason of being unemployed. Data is published monthly and covers
   Northern Ireland with multiple geographic breakdowns including Local Government Districts,
   Parliamentary Constituency Areas, Travel-to-Work Areas, and Super Output Areas.

   Data Source:
       **Publication page pattern**: https://www.nisra.gov.uk/publications/labour-market-report-{month_name}-{year}

       The module scrapes the monthly Labour Market Report publication page to find the
       ``lmr-claimant-count-tables-*.xlsx`` Excel file link, falling back to direct URL
       construction if scraping fails.

   Update Frequency: Monthly, approximately 2–3 weeks after the reference month.

   Sheets parsed:
       - ``Headline``: NI total by sex, seasonally adjusted and non-seasonally adjusted, full time series from April 1997.
       - ``Age``: NI total by age band (16–24, 25–49, 50+), from January 2013.
       - ``LGD_11``: Current-month snapshot for 11 Local Government Districts.
       - ``PCA``: Current-month snapshot for 18 Westminster Parliamentary Constituency Areas.
       - ``TTWA``: Current-month snapshot for 10 Travel-to-Work Areas.
       - ``SOA``: 889 Super Output Areas, wide-format time series from October 2017, melted to long format.

   .. rubric:: Notes

   Claimant Count is an experimental statistic. The rate denominator is
   claimant count + workforce jobs. Five-week months are annotated ``[2]``,
   revised data with ``(r)``, provisional with ``(p)``. Annotation markers are
   stripped before date parsing. SOA data has a methodology break at January 2026
   (transition from COA2011 to DZ2021 geographies).

   Usage:
       >>> from bolster.data_sources.nisra import claimant_count
       >>> df = claimant_count.get_latest_claimant_count("headline")
       >>> "claimants_000s" in df.columns
       True
       >>> lgd_df = claimant_count.get_latest_claimant_count("lgd")
       >>> "claimants_total" in lgd_df.columns
       True

   .. rubric:: Example

   >>> from bolster.data_sources.nisra import claimant_count
   >>> df = claimant_count.get_latest_claimant_count("headline")
   >>> df[df["sex"] == "all_people"].sort_values("date").tail(1)["claimants_000s"].values[0] > 0
   True

   Author: Claude Code


Attributes
----------

.. autoapisummary::

   bolster.data_sources.nisra.claimant_count.logger


Functions
---------

.. autoapisummary::

   bolster.data_sources.nisra.claimant_count.get_latest_publication_url
   bolster.data_sources.nisra.claimant_count.parse_headline
   bolster.data_sources.nisra.claimant_count.parse_age
   bolster.data_sources.nisra.claimant_count.parse_geography
   bolster.data_sources.nisra.claimant_count.parse_soa
   bolster.data_sources.nisra.claimant_count.get_latest_claimant_count
   bolster.data_sources.nisra.claimant_count.validate_claimant_count


Module Contents
---------------

.. py:data:: logger

.. py:function:: get_latest_publication_url()

   Discover the URL of the most recent claimant count Excel file.

   Scrapes the NISRA Labour Market Report publication page for the current
   month, falling back to previous months if needed, then falls back to
   direct URL construction.

   :returns: Full URL to the latest claimant count Excel file.

   :raises NISRADataNotFoundError: If no publication can be found.

   .. rubric:: Example

   >>> url = get_latest_publication_url()
   >>> url.endswith(".xlsx")
   True


.. py:function:: parse_headline(file_path)

   Parse the Headline sheet: NI total claimant count by sex.

   The Headline sheet contains two side-by-side tables:

   - Table 1a: Seasonally adjusted claimant count by sex
   - Table 1b: Non-seasonally adjusted claimant count by sex

   Both tables share the same date column structure with men, women and
   all people counts (thousands) and rates.

   :param file_path: Path to the claimant count Excel file.

   :returns: - ``date``: pandas Timestamp (monthly, day=1)
             - ``adjusted``: ``"seasonally_adjusted"`` or ``"non_seasonally_adjusted"``
             - ``sex``: ``"men"``, ``"women"``, or ``"all_people"``
             - ``claimants_000s``: Claimant count in thousands (float)
             - ``claimant_rate``: Claimant rate as percentage (float)
   :rtype: DataFrame with columns

   :raises NISRADataNotFoundError: If the Headline sheet is not found.

   .. rubric:: Example

   >>> df = parse_headline("/tmp/claimant_count.xlsx")
   >>> sorted(df["sex"].unique())
   ['all_people', 'men', 'women']
   >>> sorted(df["adjusted"].unique())
   ['non_seasonally_adjusted', 'seasonally_adjusted']


.. py:function:: parse_age(file_path)

   Parse the Age sheet: NI claimant count by age band.

   Contains a single table of non-seasonally adjusted claimant counts
   broken down into three age bands: 16–24, 25–49, 50+. Data runs from
   January 2013.

   :param file_path: Path to the claimant count Excel file.

   :returns: - ``date``: pandas Timestamp (monthly, day=1)
             - ``age_group``: One of ``"16-24"``, ``"25-49"``, ``"50+"``.
             - ``claimants``: Claimant count (integer, rounded to nearest 5).
   :rtype: DataFrame with columns

   :raises NISRADataNotFoundError: If the Age sheet is not found.

   .. rubric:: Example

   >>> df = parse_age("/tmp/claimant_count.xlsx")
   >>> sorted(df["age_group"].unique())
   ['16-24', '25-49', '50+']


.. py:function:: parse_geography(file_path, sheet)

   Parse a geographic breakdown sheet (LGD_11, PCA, or TTWA).

   Each sheet contains a current-month snapshot with columns for:
   male/female/total claimant numbers, working-age rates, month and year changes.

   :param file_path: Path to the claimant count Excel file.
   :param sheet: Sheet name, one of ``"LGD_11"``, ``"PCA"``, or ``"TTWA"``.

   :returns: - ``date``: pandas Timestamp (extracted from the Excel filename)
             - ``geography``: Area name (e.g., ``"Belfast"``)
             - ``geography_type``: Sheet type identifier (e.g., ``"LGD_11"``)
             - ``claimants_male``: Number of male claimants (int)
             - ``claimants_female``: Number of female claimants (int)
             - ``claimants_total``: Total claimants (int)
             - ``claimant_rate_male_pct``: Male working-age claimant rate (float)
             - ``claimant_rate_female_pct``: Female working-age claimant rate (float)
             - ``claimant_rate_total_pct``: Total working-age claimant rate (float)
             - ``change_over_month_number``: Change vs previous month (int)
             - ``change_over_year_number``: Change vs same month last year (int)
   :rtype: DataFrame with columns

   :raises NISRADataNotFoundError: If the requested sheet is not found.
   :raises ValueError: If sheet is not one of the supported values.

   .. rubric:: Example

   >>> df = parse_geography("/tmp/claimant_count.xlsx", "LGD_11")
   >>> len(df["geography"].unique()) >= 11
   True
   >>> "claimants_total" in df.columns
   True


.. py:function:: parse_soa(file_path)

   Parse the SOA sheet: Super Output Area time series.

   The SOA sheet is wide-format with 889 Super Output Areas as rows and
   monthly dates as columns from October 2017. This function melts it to
   long format.

   .. note::

      There is a methodology break at January 2026 where geography codes
      transition from COA2011 to DZ2021. Both series are included in the output.

   :param file_path: Path to the claimant count Excel file.

   :returns: - ``soa_code``: Super Output Area code and name (e.g., ``"95AA01S1 : Aldergrove_1"``)
             - ``date``: pandas Timestamp (monthly, day=1)
             - ``claimants``: Claimant count (int, rounded to nearest 5)
   :rtype: DataFrame with columns

   :raises NISRADataNotFoundError: If the SOA sheet is not found.

   .. rubric:: Example

   >>> df = parse_soa("/tmp/claimant_count.xlsx")
   >>> "soa_code" in df.columns
   True
   >>> df["date"].min().year <= 2018
   True


.. py:function:: get_latest_claimant_count(breakdown = 'headline', force_refresh = False)

   Download and parse the latest NISRA claimant count data.

   Automatically discovers and downloads the most recent monthly publication,
   then returns the requested breakdown.

   :param breakdown: One of:

                     - ``"headline"``: NI total by sex, SA and non-SA (default)
                     - ``"age"``: NI total by age band (16–24, 25–49, 50+)
                     - ``"lgd"``: 11 Local Government Districts (current month)
                     - ``"pca"``: 18 Parliamentary Constituency Areas (current month)
                     - ``"ttwa"``: 10 Travel-to-Work Areas (current month)
                     - ``"soa"``: 889 Super Output Areas, long-format time series
   :param force_refresh: If ``True``, bypass cache and download fresh data.

   :returns: DataFrame for the requested breakdown. See individual ``parse_*``
             functions for column documentation.

   :raises ValueError: If ``breakdown`` is not a supported value.
   :raises NISRADataNotFoundError: If the data cannot be downloaded.

   .. rubric:: Example

   >>> df = get_latest_claimant_count("headline")
   >>> "claimants_000s" in df.columns
   True
   >>> df_lgd = get_latest_claimant_count("lgd")
   >>> len(df_lgd) >= 11
   True


.. py:function:: validate_claimant_count(df, breakdown)

   Validate the integrity of a claimant count DataFrame.

   Checks that required columns are present, values are in plausible ranges,
   and the DataFrame is non-empty.

   :param df: DataFrame returned by ``get_latest_claimant_count`` or a
              ``parse_*`` function.
   :param breakdown: The breakdown type that produced the DataFrame.
                     One of ``"headline"``, ``"age"``, ``"lgd"``, ``"pca"``,
                     ``"ttwa"``, ``"soa"``.

   :returns: ``True`` if validation passes, ``False`` otherwise.

   .. rubric:: Example

   >>> import pandas as pd
   >>> validate_claimant_count(pd.DataFrame(), "headline")
   False
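
The module notes say that annotation markers (``[2]``, ``(r)``, ``(p)``) are stripped
before date parsing. A minimal sketch of that step, assuming date labels of the form
``"March 2024 (p)"``; the label format, the regular expression, and the helper name
``parse_month_label`` are illustrative assumptions, not the module's actual implementation:

.. code-block:: python

   import re
   from datetime import datetime

   # Assumed marker forms, taken from the notes: "[2]", "(r)" and "(p)".
   _MARKERS = re.compile(r"\s*(?:\[\d+\]|\([rp]\))")


   def parse_month_label(label: str) -> datetime:
       """Strip annotation markers, then parse a 'Month YYYY' label (hypothetical helper)."""
       cleaned = _MARKERS.sub("", label).strip()
       # strptime fills in day=1 when only month and year are given.
       return datetime.strptime(cleaned, "%B %Y")

Under these assumptions, ``parse_month_label("July 2023 [2] (r)")`` gives
``datetime(2023, 7, 1)``, consistent with the day=1 monthly timestamps documented for the
``date`` columns.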