bolster.data_sources.nisra.baby_names

NISRA Baby Names Northern Ireland Data Source.

Provides access to baby name statistics for Northern Ireland from the Northern Ireland Statistics and Research Agency (NISRA), including:

  • Full historical list of all first forenames given to babies registered in NI (1997–present)

  • Annual rank and count for every name, by sex (Boys/Girls)

The module uses the Full Name List file which contains all registered names with their rank and count for each year from 1997 to the most recent publication year.

Data Source:

Statistics Page: https://www.nisra.gov.uk/statistics/births/baby-names

The statistics page lists all Baby Names publications in reverse chronological order (newest first). The module automatically scrapes this page to find the latest Baby Names publication, then downloads the Full Names List Excel file from that publication’s detail page.

The full names list files contain complete time series from 1997 to the most recent year, updated annually in April.

Update Frequency: Annual (published April each year)

Geographic Coverage: Northern Ireland (births registered in NI)

Example

>>> from bolster.data_sources.nisra import baby_names
>>> df = baby_names.get_baby_names()
>>> sorted(df.columns.tolist())
['count', 'name', 'rank', 'sex', 'year']
>>> sorted(df['sex'].unique().tolist())
['Boys', 'Girls']
>>> df['year'].min() >= 1997
True

Attributes

logger

BABY_NAMES_STATS_URL

NISRA_BASE_URL

Functions

get_baby_names_publication_url()

Scrape NISRA to find the latest Baby Names Full Name List Excel URL.

parse_baby_names_file(file_path)

Parse NISRA Full Name List Excel file into long-format DataFrame.

get_baby_names([force_refresh])

Get the full historical Baby Names series for Northern Ireland (1997–present).

validate_baby_names(df)

Validate a baby names DataFrame for structural and data integrity.

Module Contents

bolster.data_sources.nisra.baby_names.logger
bolster.data_sources.nisra.baby_names.BABY_NAMES_STATS_URL = 'https://www.nisra.gov.uk/statistics/births/baby-names'
bolster.data_sources.nisra.baby_names.NISRA_BASE_URL = 'https://www.nisra.gov.uk'
bolster.data_sources.nisra.baby_names.get_baby_names_publication_url()

Scrape NISRA to find the latest Baby Names Full Name List Excel URL.

Navigates the publication structure:

1. Scrapes the baby names statistics page for the latest publication link

2. Follows the link to the publication detail page

3. Finds the Full Names List Excel file link
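The link-following steps above can be illustrated with a small standard-library sketch. This is not the module's implementation: LinkCollector, first_matching_link, the sample HTML, and the matching rules (link text containing "baby names", href ending in ".xlsx") are all assumptions made for illustration, and the real page markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

NISRA_BASE_URL = "https://www.nisra.gov.uk"


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def first_matching_link(html, predicate, base_url=NISRA_BASE_URL):
    """Return the absolute URL of the first link satisfying predicate, or None."""
    collector = LinkCollector()
    collector.feed(html)
    for href, text in collector.links:
        if predicate(href, text):
            return urljoin(base_url, href)
    return None


# Made-up HTML standing in for the statistics page and a publication detail page.
listing_html = (
    '<ul><li><a href="/publications/baby-names-2024">Baby Names 2024</a></li>'
    '<li><a href="/publications/baby-names-2023">Baby Names 2023</a></li></ul>'
)
detail_html = (
    '<a href="/files/summary.pdf">Summary</a>'
    '<a href="/files/full-names-list.xlsx">Full Names List</a>'
)

# Step 1: the newest publication is listed first, so take the first match.
publication_url = first_matching_link(
    listing_html, lambda href, text: "baby names" in text.lower()
)
# Step 3: on the detail page, pick the first Excel link.
excel_url = first_matching_link(
    detail_html, lambda href, text: href.lower().endswith(".xlsx")
)
```

Because the listing is newest-first, taking the first matching link is enough to select the latest publication; a None result at either step corresponds to the case where the documented function raises NISRADataNotFoundError.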

Returns:

URL of the latest Full Name List Excel file

Raises:

NISRADataNotFoundError – If publication or file not found

Return type:

str

bolster.data_sources.nisra.baby_names.parse_baby_names_file(file_path)

Parse NISRA Full Name List Excel file into long-format DataFrame.

The Full Name List Excel file contains two sheets:

  • Table 1: Boys’ names (1997 to present), wide format with 3 columns per year (Name, Number of Babies, Rank), 29+ year blocks across the row

  • Table 2: Girls’ names, same structure as Table 1

Names with suppressed counts (shown as ‘..’) are excluded; counts for names with fewer than 3 occurrences are suppressed for disclosure control.
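The wide-to-long reshaping described above can be sketched in plain Python. This is an illustration only, not the parser's code: wide_rows_to_long is a hypothetical helper, the rows are made-up values, and it assumes each year block holds (Name, Number of Babies, Rank) in that order, with shorter blocks padded by empty cells.

```python
def wide_rows_to_long(header_years, rows, sex):
    """Convert wide (Name, Number of Babies, Rank) blocks into long records.

    header_years: the year for each 3-column block, left to right.
    rows: data rows, each holding 3 * len(header_years) cells.
    """
    records = []
    for row in rows:
        for i, year in enumerate(header_years):
            name, count, rank = row[3 * i : 3 * i + 3]
            # Skip padding cells and suppressed counts ('..').
            if name is None or count in (None, ".."):
                continue
            records.append(
                {
                    "year": year,
                    "name": str(name).strip(),
                    "sex": sex,
                    "rank": int(rank),
                    "count": int(count),
                }
            )
    return records


# Made-up rows: two year blocks (1997, 1998) side by side.
sample_rows = [
    ["Jack", 300, 1, "James", 250, 1],
    ["James", 280, 2, "Jack", 240, 2],
    ["Zed", "..", "..", None, None, None],
]
records = wide_rows_to_long([1997, 1998], sample_rows, "Boys")
```

Each row of the sheet contributes at most one record per year block, so the suppressed "Zed" entry and the empty 1998 cells produce no output rows.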

Parameters:

file_path (str | pathlib.Path) – Path to the Full Name List Excel file

Returns:

Long-format DataFrame with columns:

  • year (int) – registration year

  • name (str) – first forename (title case as registered)

  • sex (str) – “Boys” or “Girls”

  • rank (int) – rank within that sex and year (1 = most popular)

  • count (int) – number of babies registered with that name

Return type:

pandas.DataFrame

Raises:

NISRAValidationError – If the file structure is unexpected or no data parsed

bolster.data_sources.nisra.baby_names.get_baby_names(force_refresh=False)

Get the full historical Baby Names series for Northern Ireland (1997–present).

Automatically discovers and downloads the most recent Full Name List publication from the NISRA website, which contains the complete historical series from 1997.

Parameters:

force_refresh (bool) – If True, bypass cache and download fresh data

Returns:

Long-format DataFrame with columns:

  • year (int) – registration year (1997–present)

  • name (str) – first forename as registered

  • sex (str) – “Boys” or “Girls”

  • rank (int) – rank within sex and year (1 = most popular)

  • count (int) – number of babies with that name

Return type:

pandas.DataFrame

Raises:
  • NISRADataNotFoundError – If the latest publication cannot be found

  • NISRAValidationError – If the file structure is unexpected

Example

>>> df = get_baby_names()
>>> sorted(df.columns.tolist())
['count', 'name', 'rank', 'sex', 'year']
>>> df['year'].min() >= 1997
True
>>> sorted(df['sex'].unique().tolist())
['Boys', 'Girls']
>>> df[df['year'] == df['year'].max()].nsmallest(1, 'rank')['name'].iloc[0] is not None
True
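
Once loaded, the long format lends itself to ordinary pandas filtering. The sketch below uses a small synthetic frame with the documented columns in place of a real download; the names and counts are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by get_baby_names().
df = pd.DataFrame({
    "year": [2022, 2022, 2023, 2023, 2023, 2023],
    "name": ["Jack", "Grace", "Noah", "Jack", "Olivia", "Grace"],
    "sex": ["Boys", "Girls", "Boys", "Boys", "Girls", "Girls"],
    "rank": [1, 1, 1, 2, 1, 2],
    "count": [200, 180, 190, 170, 175, 160],
})

# Top-ranked name for each sex in the most recent year.
latest = df[df["year"] == df["year"].max()]
top_by_sex = latest[latest["rank"] == 1].set_index("sex")["name"].to_dict()

# Count of one name across years, as a year-indexed Series.
jack_trend = (
    df[(df["name"] == "Jack") & (df["sex"] == "Boys")]
    .set_index("year")["count"]
    .sort_index()
)
```

Filtering on sex as well as name matters here because the same forename can appear in both the Boys and Girls tables.
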
bolster.data_sources.nisra.baby_names.validate_baby_names(df)

Validate a baby names DataFrame for structural and data integrity.

Checks:

  • Required columns are present

  • No null values in any column

  • Both sexes present (“Boys” and “Girls”)

  • Year range starts at or before 1999 (data should go back to 1997)

  • Rank starts at 1 for at least one year/sex combination

  • All counts are positive (> 0)

  • No negative ranks or counts
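A literal reading of the listed checks might look like the sketch below. It is not the library's code: check_baby_names is a hypothetical helper that returns a list of failure messages instead of raising NISRAValidationError, and the library may apply some checks more leniently or handle edge cases differently.

```python
import pandas as pd

REQUIRED_COLUMNS = {"year", "name", "sex", "rank", "count"}


def check_baby_names(df):
    """Return a list of failed-check messages (empty if all checks pass)."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        # The remaining checks need these columns, so stop early.
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    if df[sorted(REQUIRED_COLUMNS)].isnull().any().any():
        problems.append("null values present")
    if set(df["sex"].unique()) != {"Boys", "Girls"}:
        problems.append("expected both 'Boys' and 'Girls'")
    if df["year"].min() > 1999:
        problems.append("series starts after 1999")
    if df["rank"].min() != 1:
        problems.append("no rank 1 found")
    if (df["count"] <= 0).any():
        problems.append("non-positive counts")
    if (df["rank"] < 0).any():
        problems.append("negative ranks")
    return problems


# Made-up frames: one that satisfies every check, one with a zero count.
good = pd.DataFrame({
    "year": [1997, 1997],
    "name": ["Jack", "Chloe"],
    "sex": ["Boys", "Girls"],
    "rank": [1, 1],
    "count": [300, 250],
})
bad = good.assign(count=[0, 250])
```

Collecting messages rather than failing on the first problem is a design choice of this sketch; it makes it easier to see every issue in a malformed frame at once.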

Parameters:

df (pandas.DataFrame) – DataFrame to validate (from parse_baby_names_file or get_baby_names)

Returns:

True if validation passes

Raises:

NISRAValidationError – If any validation check fails, with descriptive message

Return type:

bool

Example

>>> import pandas as pd
>>> valid_df = pd.DataFrame({
...     'year': [2020, 2020], 'name': ['Noah', 'Jack'],
...     'sex': ['Boys', 'Boys'], 'rank': [1, 2], 'count': [100, 90]
... })
>>> validate_baby_names(valid_df)
True