bolster.data_sources.nisra.baby_names
=====================================

.. py:module:: bolster.data_sources.nisra.baby_names

.. autoapi-nested-parse::

   NISRA Baby Names Northern Ireland Data Source.

   Provides access to baby name statistics for Northern Ireland from the Northern Ireland
   Statistics and Research Agency (NISRA), including:

   - Full historical list of all first forenames given to babies registered in NI (1997–present)
   - Annual rank and count for every name, by sex (Boys/Girls)

   The module uses the Full Name List file which contains all registered names with their
   rank and count for each year from 1997 to the most recent publication year.

   Data Source:
       **Statistics Page**: https://www.nisra.gov.uk/statistics/births/baby-names

       The statistics page lists all Baby Names publications in reverse chronological order
       (newest first). The module automatically scrapes this page to find the latest Baby
       Names publication, then downloads the Full Names List Excel file from that
       publication's detail page.

       The full names list files contain complete time series from 1997 to the most recent
       year, updated annually in April.

   Update Frequency: Annual (published April each year)

   Geographic Coverage: Northern Ireland (births registered in NI)

   .. rubric:: Example

   >>> from bolster.data_sources.nisra import baby_names
   >>> df = baby_names.get_baby_names()
   >>> sorted(df.columns.tolist())
   ['count', 'name', 'rank', 'sex', 'year']
   >>> sorted(df['sex'].unique().tolist())
   ['Boys', 'Girls']
   >>> df['year'].min() >= 1997
   True


Attributes
----------

.. autoapisummary::

   bolster.data_sources.nisra.baby_names.logger
   bolster.data_sources.nisra.baby_names.BABY_NAMES_STATS_URL
   bolster.data_sources.nisra.baby_names.NISRA_BASE_URL


Functions
---------

.. autoapisummary::

   bolster.data_sources.nisra.baby_names.get_baby_names_publication_url
   bolster.data_sources.nisra.baby_names.parse_baby_names_file
   bolster.data_sources.nisra.baby_names.get_baby_names
   bolster.data_sources.nisra.baby_names.validate_baby_names


Module Contents
---------------

.. py:data:: logger

.. py:data:: BABY_NAMES_STATS_URL
   :value: 'https://www.nisra.gov.uk/statistics/births/baby-names'


.. py:data:: NISRA_BASE_URL
   :value: 'https://www.nisra.gov.uk'


.. py:function:: get_baby_names_publication_url()

   Scrape NISRA to find the latest Baby Names Full Name List Excel URL.

   Navigates the publication structure:

   1. Scrapes the baby names statistics page for the latest publication link
   2. Follows link to the publication detail page
   3. Finds the Full Names List Excel file link

   :returns: URL of the latest Full Name List Excel file

   :raises NISRADataNotFoundError: If publication or file not found


.. py:function:: parse_baby_names_file(file_path)

   Parse NISRA Full Name List Excel file into long-format DataFrame.

   The Full Name List Excel file contains two sheets:

   - Table 1: Boys' names (1997 to present), wide format with 3 columns per year
     (Name, Number of Babies, Rank), 29+ year blocks across the row
   - Table 2: Girls' names, same structure as Table 1

   Names with suppressed counts (shown as '..') are excluded (names with fewer than
   3 occurrences are suppressed for disclosure control).

   :param file_path: Path to the Full Name List Excel file

   :returns: Long-format DataFrame with columns:

             - year: int — registration year
             - name: str — first forename (title case as registered)
             - sex: str — "Boys" or "Girls"
             - rank: int — rank within that sex and year (1 = most popular)
             - count: int — number of babies registered with that name

   :raises NISRAValidationError: If the file structure is unexpected or no data parsed


.. py:function:: get_baby_names(force_refresh = False)

   Get the full historical Baby Names series for Northern Ireland (1997–present).

   Automatically discovers and downloads the most recent Full Name List publication
   from the NISRA website, which contains the complete historical series from 1997.

   :param force_refresh: If True, bypass cache and download fresh data

   :returns: Long-format DataFrame with columns:

             - year: int — registration year (1997–present)
             - name: str — first forename as registered
             - sex: str — "Boys" or "Girls"
             - rank: int — rank within sex and year (1 = most popular)
             - count: int — number of babies with that name

   :raises NISRADataNotFoundError: If the latest publication cannot be found
   :raises NISRAValidationError: If the file structure is unexpected

   .. rubric:: Example

   >>> df = get_baby_names()
   >>> sorted(df.columns.tolist())
   ['count', 'name', 'rank', 'sex', 'year']
   >>> df['year'].min() >= 1997
   True
   >>> sorted(df['sex'].unique().tolist())
   ['Boys', 'Girls']
   >>> df[df['year'] == df['year'].max()].nsmallest(1, 'rank')['name'].iloc[0] is not None
   True


.. py:function:: validate_baby_names(df)

   Validate a baby names DataFrame for structural and data integrity.

   Checks:

   - Required columns are present
   - No null values in any column
   - Both sexes present ("Boys" and "Girls")
   - Year range starts at or before 1999 (data should go back to 1997)
   - Rank starts at 1 for at least one year/sex combination
   - All counts are positive (> 0)
   - No negative ranks or counts

   :param df: DataFrame to validate (from parse_baby_names_file or get_baby_names)

   :returns: True if validation passes

   :raises NISRAValidationError: If any validation check fails, with descriptive message

   .. rubric:: Example

   >>> import pandas as pd
   >>> valid_df = pd.DataFrame({
   ...     'year': [2020, 2020], 'name': ['Noah', 'Jack'],
   ...     'sex': ['Boys', 'Boys'], 'rank': [1, 2], 'count': [100, 90]
   ... })
   >>> validate_baby_names(valid_df)
   True
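
The wide-to-long reshaping that ``parse_baby_names_file`` is described as performing can be illustrated with a minimal sketch. This is not the module's implementation: the helper ``wide_to_long``, the flat in-memory row layout (three cells per year, in year order), and the sample values are all assumptions made up for illustration, and plain Python lists stand in for the Excel sheet and the DataFrame.

```python
SUPPRESSED = ".."  # marker the documentation gives for suppressed counts


def wide_to_long(rows, years, sex):
    """Reshape wide rows (3 cells per year: name, count, rank) into long records.

    Hypothetical helper for illustration only. The real sheet handled by
    parse_baby_names_file may contain header rows, spacer columns or other
    details that this sketch ignores.
    """
    records = []
    for row in rows:
        for i, year in enumerate(years):
            # Each year occupies three consecutive cells in the row
            name, count, rank = row[3 * i : 3 * i + 3]
            # Skip empty blocks and names whose count is suppressed
            if name is None or count is None or count == SUPPRESSED:
                continue
            records.append(
                {
                    "year": year,
                    "name": name,
                    "sex": sex,
                    "rank": int(rank),
                    "count": int(count),
                }
            )
    return records


# Made-up sample values, not real NISRA figures
years = [2022, 2023]
boys_rows = [
    ["James", 180, 1, "Jack", 175, 1],
    ["Jack", 170, 2, "James", 160, 2],
    ["Zed", "..", "..", None, None, None],  # suppressed in 2022, empty in 2023
]

records = wide_to_long(boys_rows, years, "Boys")
for record in records:
    print(record)
```

In this sketch each year block contributes at most one record per row, and a suppressed or empty block contributes none, so the long-format result can hold fewer entries for a year than the sheet has rows.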