bolster.data_sources.nisra.migration
====================================

.. py:module:: bolster.data_sources.nisra.migration

.. autoapi-nested-parse::

   NISRA Migration Estimates - Official and Derived.

   This module provides access to NISRA migration data through two approaches:

   1. **Official Migration Statistics**: Published NISRA long-term international migration
      estimates from administrative data and the International Passenger Survey (IPS).

   2. **Derived Migration Estimates**: Calculated from demographic components using
      the demographic accounting equation::

          Net Migration = Population Change - Natural Change
          Net Migration = ΔPopulation - (Births - Deaths)

   Both approaches are useful:

   - Official statistics are authoritative but published with a lag
   - Derived estimates can be calculated for more recent periods
   - Comparing both validates the demographic equation approach

   Data Sources:
       **Official Migration**: https://www.nisra.gov.uk/statistics/population/long-term-international-migration-statistics

       **Derived Migration** (combines three NISRA sources):

       - **Population**: https://www.nisra.gov.uk/statistics/people-and-communities/population
       - **Births**: https://www.nisra.gov.uk/statistics/births-deaths-and-marriages/births
       - **Deaths**: https://www.nisra.gov.uk/statistics/births-deaths-and-marriages/deaths

   | Update Frequency: Annual (both official and derived)
   | Geographic Coverage: Northern Ireland
   | Reference Period: Mid-year (July to June) for official; Calendar year for derived

   .. rubric:: Example

   >>> from bolster.data_sources.nisra import migration
   >>>
   >>> # Get official NISRA migration statistics
   >>> official = migration.get_official_migration()
   >>> sorted(official.columns.tolist())
   ['date', 'net_migration', 'year']
   >>> # Get derived migration estimates (from demographic equation)
   >>> derived = migration.get_derived_migration()
   >>> 'net_migration' in derived.columns
   True
   >>> # Compare official vs derived for validation
   >>> comparison = migration.compare_official_vs_derived(official, derived)
   >>> 'absolute_difference' in comparison.columns
   True


Attributes
----------

.. autoapisummary::

   bolster.data_sources.nisra.migration.logger
   bolster.data_sources.nisra.migration.MIGRATION_MOTHER_PAGE
   bolster.data_sources.nisra.migration.get_derived_migration


Functions
---------

.. autoapisummary::

   bolster.data_sources.nisra.migration.calculate_annual_births
   bolster.data_sources.nisra.migration.calculate_annual_deaths
   bolster.data_sources.nisra.migration.calculate_annual_population
   bolster.data_sources.nisra.migration.derive_migration
   bolster.data_sources.nisra.migration.get_latest_migration
   bolster.data_sources.nisra.migration.validate_demographic_equation
   bolster.data_sources.nisra.migration.get_migration_by_year
   bolster.data_sources.nisra.migration.get_migration_summary_statistics
   bolster.data_sources.nisra.migration.get_official_migration_publication_url
   bolster.data_sources.nisra.migration.parse_official_migration_file
   bolster.data_sources.nisra.migration.validate_official_migration
   bolster.data_sources.nisra.migration.get_official_migration
   bolster.data_sources.nisra.migration.compare_official_vs_derived


Module Contents
---------------

.. py:data:: logger

.. py:function:: calculate_annual_births(births_df)

   Aggregate monthly births data to annual totals.

   :param births_df: DataFrame from births.get_latest_births(event_type='occurrence')

   :returns: DataFrame with columns:

             - year: int
             - births: int (total births in year)
   :rtype: pandas.DataFrame


.. py:function:: calculate_annual_deaths(deaths_df)

   Aggregate weekly deaths data to annual totals.

   :param deaths_df: DataFrame from deaths.get_historical_deaths()

   :returns: DataFrame with columns:

             - year: int
             - deaths: int (total deaths in year)
   :rtype: pandas.DataFrame


.. py:function:: calculate_annual_population(population_df)

   Aggregate population data to annual totals for Northern Ireland.

   :param population_df: DataFrame from population.get_latest_population(area='Northern Ireland')

   :returns: DataFrame with columns:

             - year: int
             - population: int (mid-year population estimate)
   :rtype: pandas.DataFrame


.. py:function:: derive_migration(population_df, births_df, deaths_df)

   Derive net migration from demographic components.

   Uses the demographic accounting equation::

       Net Migration = ΔPopulation - (Births - Deaths)

   :param population_df: DataFrame from population.get_latest_population()
   :param births_df: DataFrame from births.get_latest_births(event_type='occurrence')
   :param deaths_df: DataFrame from deaths.get_latest_deaths()

   :returns: DataFrame with columns:

             - year: int
             - population_start: int (population at start of year, June 30 t-1)
             - population_end: int (population at end of year, June 30 t)
             - births: int (births in calendar year)
             - deaths: int (deaths in calendar year)
             - natural_change: int (births - deaths)
             - population_change: int (population_end - population_start)
             - net_migration: int (derived migration estimate)
             - migration_rate: float (per 1,000 population)
   :rtype: pandas.DataFrame

   :raises NISRAValidationError: If data sources cannot be aligned


.. py:function:: get_latest_migration(force_refresh = False)

   Get the latest derived migration estimates for Northern Ireland.

   Automatically downloads the most recent population, births, and deaths data,
   then calculates net migration using the demographic accounting equation.

   :param force_refresh: If True, bypass cache and download fresh data for all sources

   :returns: DataFrame with columns:

             - year: int
             - population_start, population_end: int (mid-year estimates)
             - births, deaths: int (annual totals)
             - natural_change: int (births - deaths)
             - population_change: int (year-over-year change)
             - net_migration: int (derived estimate)
             - migration_rate: float (per 1,000 population)
   :rtype: pandas.DataFrame

   .. rubric:: Example

   >>> df = get_latest_migration()
   >>> 'net_migration' in df.columns
   True
   >>> len(df) > 0
   True


.. py:function:: validate_demographic_equation(df, tolerance = 100)

   Validate that the demographic accounting equation holds.

   Checks that::

       Population Change = Natural Change + Net Migration

   :param df: DataFrame from derive_migration() or get_latest_migration()
   :param tolerance: Allowable difference due to rounding/measurement error (default: 100)

   :returns: True if validation passes

   :raises NISRAValidationError: If equation doesn't hold within tolerance


.. py:function:: get_migration_by_year(df, year)

   Filter migration data for a specific year.

   :param df: DataFrame from get_latest_migration()
   :param year: Year to filter

   :returns: Filtered DataFrame

   .. rubric:: Example

   >>> df = get_latest_migration()
   >>> df_2024 = get_migration_by_year(df, 2024)
   >>> 'net_migration' in df_2024.columns
   True


.. py:function:: get_migration_summary_statistics(df, start_year = None, end_year = None)

   Calculate summary statistics for migration data.

   :param df: DataFrame from get_latest_migration()
   :param start_year: Optional start year for analysis period
   :param end_year: Optional end year for analysis period

   :returns: Dictionary with summary statistics:

             - total_years: Number of years analyzed
             - avg_net_migration: Average annual net migration
             - avg_migration_rate: Average migration rate per 1,000
             - positive_years: Number of years with net immigration
             - negative_years: Number of years with net emigration
             - max_immigration_year: Year with highest immigration
             - max_immigration: Highest immigration value
             - max_emigration_year: Year with highest emigration
             - max_emigration: Highest emigration value (as negative)
   :rtype: dict

   .. rubric:: Example

   >>> df = get_latest_migration()
   >>> stats = get_migration_summary_statistics(df, start_year=2010)
   >>> 'avg_net_migration' in stats
   True


.. py:data:: MIGRATION_MOTHER_PAGE
   :value: 'https://www.nisra.gov.uk/statistics/population/long-term-international-migration-statistics'


.. py:function:: get_official_migration_publication_url()

   Scrape NISRA migration mother page to find latest Official estimates file.

   Navigates the publication structure:

   1. Scrapes mother page for latest "Long-Term International Migration" publication
   2. Follows link to publication detail page
   3. Finds "Official" Excel file (Mig[YY][YY]-Official_1.xlsx)

   :returns: Tuple of (excel_file_url, publication_year)

   :raises NISRADataNotFoundError: If publication or file not found


.. py:function:: parse_official_migration_file(file_path)

   Parse downloaded official migration Excel file into DataFrame.

   Extracts Table 1.1 (Net International Migration time series) from the
   Official estimates file and transforms it into a long-format DataFrame.

   :param file_path: Path to downloaded Mig[YY][YY]-Official_1.xlsx file

   :returns: DataFrame with columns:

             - year: int (mid-year)
             - net_migration: int (net international migration)
             - date: pd.Timestamp (reference date, June 30 of end year)
   :rtype: pandas.DataFrame

   :raises NISRAValidationError: If file format is unexpected or parsing fails


.. py:function:: validate_official_migration(df)

   Validate official migration data quality.

   :param df: DataFrame from parse_official_migration_file() or get_official_migration()

   :returns: True if validation passes

   :raises NISRAValidationError: If validation fails


.. py:function:: get_official_migration(force_refresh = False)

   Get the latest official NISRA migration statistics.

   Automatically downloads the most recent official migration estimates from NISRA
   and parses them into a structured DataFrame.

   :param force_refresh: If True, bypass cache and download fresh data

   :returns: DataFrame with columns:

             - year: int (mid-year)
             - net_migration: int (net international migration)
             - date: pd.Timestamp (reference date)
   :rtype: pandas.DataFrame

   :raises NISRADataNotFoundError: If publication cannot be found
   :raises NISRAValidationError: If data fails validation

   .. rubric:: Example

   >>> official = get_official_migration()
   >>> sorted(official.columns.tolist())
   ['date', 'net_migration', 'year']
   >>> len(official) > 0
   True


.. py:data:: get_derived_migration

.. py:function:: compare_official_vs_derived(official_df, derived_df, threshold = 1000)

   Compare official migration data with derived estimates for validation.

   :param official_df: DataFrame from get_official_migration()
   :param derived_df: DataFrame from get_derived_migration() / get_latest_migration()
   :param threshold: Absolute difference threshold for flagging discrepancies (default: 1000)

   :returns: DataFrame with columns:

             - year: int
             - official_net_migration: int
             - derived_net_migration: int
             - absolute_difference: int
             - percent_difference: float
             - exceeds_threshold: bool
   :rtype: pandas.DataFrame

   .. rubric:: Example

   >>> official = get_official_migration()
   >>> derived = get_derived_migration()
   >>> comparison = compare_official_vs_derived(official, derived)
   >>> sorted(comparison.columns.tolist())
   ['absolute_difference', 'derived_net_migration', 'exceeds_threshold', 'official_net_migration', 'percent_difference', 'year']
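
The demographic accounting equation and the threshold comparison documented on this page can be illustrated with a small offline sketch. This is not the module's implementation: ``sketch_derive_migration`` and ``sketch_compare`` are hypothetical stand-ins, all numbers are made up rather than taken from NISRA, and the choice of start-of-year population as the denominator for ``migration_rate`` is an assumption, since the documentation only says "per 1,000 population". Plain dictionaries are used so the sketch runs without any downloads.

```python
# Toy inputs keyed by year (made-up numbers, NOT NISRA data).
population = {2020: 1_000_000, 2021: 1_004_000, 2022: 1_009_500}  # mid-year estimates
births = {2021: 22_000, 2022: 21_500}
deaths = {2021: 17_000, 2022: 17_500}


def sketch_derive_migration(population, births, deaths):
    """Hypothetical stand-in for derive_migration(); not the library's code."""
    rows = []
    for year in sorted(population):
        # A year is usable only if the previous mid-year population
        # and both event counts are available.
        if year - 1 not in population or year not in births or year not in deaths:
            continue
        start, end = population[year - 1], population[year]
        natural_change = births[year] - deaths[year]
        population_change = end - start
        # Net Migration = ΔPopulation - (Births - Deaths)
        net_migration = population_change - natural_change
        rows.append({
            "year": year,
            "population_start": start,
            "population_end": end,
            "births": births[year],
            "deaths": deaths[year],
            "natural_change": natural_change,
            "population_change": population_change,
            "net_migration": net_migration,
            # Assumption: rate per 1,000 of the start-of-year population.
            "migration_rate": net_migration / start * 1000,
        })
    return rows


def sketch_compare(official, derived_rows, threshold=1000):
    """Hypothetical stand-in for compare_official_vs_derived()."""
    rows = []
    for row in derived_rows:
        year = row["year"]
        if year not in official:
            continue  # only years present in both series are compared
        diff = abs(official[year] - row["net_migration"])
        rows.append({
            "year": year,
            "official_net_migration": official[year],
            "derived_net_migration": row["net_migration"],
            "absolute_difference": diff,
            "exceeds_threshold": diff > threshold,
        })
    return rows


derived = sketch_derive_migration(population, births, deaths)
official = {2021: -400, 2022: 2_800}  # made-up "official" figures
comparison = sketch_compare(official, derived)

for row in comparison:
    print(row["year"], row["derived_net_migration"],
          row["absolute_difference"], row["exceeds_threshold"])
# 2021 -1000 600 False
# 2022 1500 1300 True
```

In the toy data, 2021 shows net emigration because the population grew by 4,000 while natural change alone would have added 5,000. The 2022 row is flagged because the gap of 1,300 exceeds the default threshold of 1,000.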