bolster.data_sources.nisra.population
=====================================

.. py:module:: bolster.data_sources.nisra.population

.. autoapi-nested-parse::

   NISRA Mid-Year Population Estimates Data Source.

   Provides access to mid-year population estimates for Northern Ireland with breakdowns by:

   - Geography (Northern Ireland, Parliamentary Constituencies, Health and Social Care Trusts)
   - Sex (All persons, Males, Females)
   - Age (5-year age bands: 00-04, 05-09, ..., 85-89, 90+)
   - Year (1971-present for NI overall, 2021-present for sub-geographies)

   Mid-year estimates are referenced to June 30th of each year.

   Data Source:
       **Mother Page**: https://www.nisra.gov.uk/statistics/people-and-communities/population

       This page lists all population statistics publications in reverse chronological
       order (newest first). The module automatically scrapes this page to find the latest
       "Mid-Year Population Estimates for Small Geographical Areas" publication, then
       downloads the age bands Excel file from that publication's detail page.

       The files contain complete time series data in a pre-processed "Flat" format,
       making this one of the most analysis-ready NISRA datasets.

   Update Frequency: Annual (published ~6 months after reference date)

   Geographic Coverage: Northern Ireland

   Reference Date: June 30th of each year

   .. rubric:: Example

   >>> from bolster.data_sources.nisra import population
   >>> # Get latest population estimates for all geographies
   >>> df = population.get_latest_population()
   >>> 'population' in df.columns
   True

   >>> # Get only Northern Ireland overall
   >>> ni_df = population.get_latest_population(area='Northern Ireland')
   >>> len(ni_df) > 0
   True


Attributes
----------

.. autoapisummary::

   bolster.data_sources.nisra.population.logger
   bolster.data_sources.nisra.population.POPULATION_BASE_URL


Functions
---------

.. autoapisummary::

   bolster.data_sources.nisra.population.get_latest_population_publication_url
   bolster.data_sources.nisra.population.parse_population_file
   bolster.data_sources.nisra.population.get_latest_population
   bolster.data_sources.nisra.population.validate_population_totals
   bolster.data_sources.nisra.population.get_population_by_year
   bolster.data_sources.nisra.population.get_population_pyramid_data


Module Contents
---------------

.. py:data:: logger

.. py:data:: POPULATION_BASE_URL
   :value: 'https://www.nisra.gov.uk/statistics/population/mid-year-population-estimates'


.. py:function:: get_latest_population_publication_url()

   Scrape the NISRA population mother page to find the latest MYE age bands file.

   Navigates the publication structure:

   1. Scrapes the mother page for the latest "Mid-Year Population Estimates" publication
   2. Follows the link to the publication detail page
   3. Finds the age bands Excel file

   :returns: Tuple of (excel_file_url, year)

   :raises NISRADataNotFoundError: If the publication or file is not found


.. py:function:: parse_population_file(file_path, area = 'all')

   Parse a NISRA mid-year population estimates Excel file.

   The population file contains a "Flat" sheet with pre-processed long-format data,
   making this one of the easiest NISRA datasets to work with.

   :param file_path: Path to the population Excel file
   :param area: Which geographic area(s) to return:

                - "all": All geographic breakdowns
                - "Northern Ireland": NI overall only (1971-present)
                - "Parliamentary Constituencies (2024)": 2024 constituencies (2021-present)
                - "Health and Social Care Trusts": HSC Trusts (2021-present)
                - "Parliamentary Constituencies (2008)": 2008 constituencies (2021-present)

   :returns: DataFrame with columns:

             - area: str (e.g., "1. Northern Ireland")
             - area_code: str (ONS geography code)
             - area_name: str (full area name)
             - year: int (reference year)
             - sex: str ("All persons", "Males", "Females")
             - age_5: str (5-year age band: "00-04", "05-09", ..., "90+")
             - age_band: str (custom age band)
             - age_broad: str (broad age band: "00-15", "16-39", "40-64", "65+")
             - population: int (mid-year estimate)
   :rtype: DataFrame

   :raises NISRAValidationError: If the file structure is unexpected


.. py:function:: get_latest_population(area = 'all', force_refresh = False)

   Get the latest mid-year population estimates.

   Automatically discovers and downloads the most recent population estimates
   from the NISRA website.

   :param area: Which geographic area(s) to return (default: "all")
   :param force_refresh: If True, bypass the cache and download fresh data

   :returns: DataFrame with columns:

             - area, area_code, area_name: Geographic identifiers
             - year: Reference year
             - sex: "All persons", "Males", or "Females"
             - age_5: 5-year age band
             - age_band, age_broad: Alternative age groupings
             - population: Mid-year estimate
   :rtype: DataFrame

   :raises NISRADataNotFoundError: If the latest publication cannot be found
   :raises NISRAValidationError: If the file structure is unexpected

   .. rubric:: Example

   >>> df = get_latest_population()
   >>> 'population' in df.columns
   True
   >>> ni_df = get_latest_population(area='Northern Ireland')
   >>> sorted(df.columns.tolist())
   ['age_5', 'age_band', 'age_broad', 'area', 'area_code', 'area_name', 'population', 'sex', 'year']


.. py:function:: validate_population_totals(df)

   Validate that the Males + Females population equals All persons for each group.

   :param df: DataFrame from parse_population_file or get_latest_population

   :returns: True if validation passes

   :raises NISRAValidationError: If validation fails


.. py:function:: get_population_by_year(df, year, sex = 'All persons')

   Filter population data for a specific year and, optionally, a sex category.

   :param df: DataFrame from get_latest_population()
   :param year: Year to filter
   :param sex: Sex category to filter (default: "All persons")

   :returns: Filtered DataFrame

   .. rubric:: Example

   >>> df = get_latest_population(area='Northern Ireland')
   >>> pop_2024 = get_population_by_year(df, 2024)
   >>> total = pop_2024['population'].sum()
   >>> bool(total > 0)
   True


.. py:function:: get_population_pyramid_data(df, year, area_name = 'NORTHERN IRELAND')

   Prepare data for population pyramid visualization.

   Returns males and females by age band for a specific year and area,
   formatted for easy pyramid plotting.

   :param df: DataFrame from get_latest_population()
   :param year: Year to visualize
   :param area_name: Area name to filter (default: "NORTHERN IRELAND")

   :returns: DataFrame with columns:

             - age_5: Age band
             - males: Male population (positive values)
             - females: Female population (negative values for pyramid)
   :rtype: DataFrame

   .. rubric:: Example

   >>> df = get_latest_population(area='Northern Ireland')
   >>> pyramid = get_population_pyramid_data(df, 2024)
   >>> sorted(pyramid.columns.tolist())
   ['age_5', 'females', 'males']
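
The pyramid layout described for ``get_population_pyramid_data`` (males as positive values, females as negative values, one row per 5-year age band) can be illustrated with a small standard-library sketch. This is not the module's implementation, which works on DataFrames. The helper name ``pyramid_rows`` and the sample figures are hypothetical, and the rows only assume the documented ``area_name``, ``year``, ``sex``, ``age_5`` and ``population`` columns.

```python
from collections import defaultdict

# Hypothetical long-format rows mirroring the documented columns.
# The population figures are invented purely for illustration.
rows = [
    {"area_name": "NORTHERN IRELAND", "year": 2024, "sex": "Males", "age_5": "00-04", "population": 100},
    {"area_name": "NORTHERN IRELAND", "year": 2024, "sex": "Females", "age_5": "00-04", "population": 95},
    {"area_name": "NORTHERN IRELAND", "year": 2024, "sex": "All persons", "age_5": "00-04", "population": 195},
    {"area_name": "NORTHERN IRELAND", "year": 2024, "sex": "Males", "age_5": "90+", "population": 10},
    {"area_name": "NORTHERN IRELAND", "year": 2024, "sex": "Females", "age_5": "90+", "population": 20},
    {"area_name": "NORTHERN IRELAND", "year": 2023, "sex": "Males", "age_5": "00-04", "population": 99},
]


def pyramid_rows(rows, year, area_name="NORTHERN IRELAND"):
    """Return one dict per age band: males positive, females negative.

    Hypothetical helper; a plain-Python stand-in for the DataFrame-based
    function documented above.
    """
    totals = defaultdict(lambda: {"males": 0, "females": 0})
    for row in rows:
        # Keep only the requested year and area.
        if row["year"] != year or row["area_name"] != area_name:
            continue
        if row["sex"] == "Males":
            totals[row["age_5"]]["males"] += row["population"]
        elif row["sex"] == "Females":
            # Negate so female bars extend to the left of the axis.
            totals[row["age_5"]]["females"] -= row["population"]
        # "All persons" rows are ignored: a pyramid needs the two sexes only.
    # Zero-padded labels ("00-04" ... "90+") sort correctly as strings.
    return [{"age_5": band, **totals[band]} for band in sorted(totals)]


pyramid = pyramid_rows(rows, 2024)
```

Negating one sex lets a horizontal bar chart draw both sexes from a shared zero axis without any further reshaping.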