bolster.data_sources.education_suspensions

Pupil Suspensions and Expulsions in Northern Ireland.

Provides access to annual suspension and expulsion statistics for pupils of compulsory school age in Northern Ireland, published by the Department of Education Northern Ireland (DE NI).

Data covers pupil suspensions broken down by:

  • Trend over time (Table 1, from 2011/12 to present)

  • School type (Primary, Non Grammar, Grammar, Special) (Table 2)

  • School management type (Controlled, Voluntary, etc.) (Table 3)

  • Number of suspension occasions (Once, Twice, Three or more) (Table 4)

  • Suspension duration (Table 5)

  • Key Stage (Foundation/KS1, KS2, KS3, KS4) (Table 6)

  • Pupil characteristics (sex, age, ethnicity, SEN, religion) (Table 7)

  • Education Authority Region (Table 8)

  • Reason for suspension (Table 9)

Data Source:

Stable Entry Point: https://www.education-ni.gov.uk/articles/pupil-suspensions-and-expulsions

The module scrapes the articles page to find the latest publication link, then scrapes the publication page for the Excel data file URL.

Update Frequency: Annual

Geographic Coverage: Northern Ireland

Reference Period: 2011/12 – present

Example

>>> from bolster.data_sources.education_suspensions import get_latest_suspensions
>>> df = get_latest_suspensions()
>>> 'academic_year' in df.columns
True

Attributes

logger

ARTICLES_URL

BASE_URL

CACHE_DIR

Exceptions

EducationSuspensionsError

Base exception for education suspensions data errors.

EducationSuspensionsNotFoundError

Raised when the data file or publication page cannot be found.

EducationSuspensionsParseError

Raised when parsing the data file fails in an unexpected way.
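
Because both specific exceptions derive from EducationSuspensionsError, a caller can catch the base class to handle any module-specific failure in one place. The sketch below uses local stand-in classes that mirror the documented hierarchy so it runs without the package installed; in real code they would be imported from bolster.data_sources.education_suspensions. The helper load_or_none and its loader argument are hypothetical names used only for illustration.

```python
# Local stand-ins mirroring the documented hierarchy (assumption: in real
# code, import these from bolster.data_sources.education_suspensions).
class EducationSuspensionsError(Exception):
    """Base exception for education suspensions data errors."""


class EducationSuspensionsNotFoundError(EducationSuspensionsError):
    """Raised when the data file or publication page cannot be found."""


class EducationSuspensionsParseError(EducationSuspensionsError):
    """Raised when parsing the data file fails in an unexpected way."""


def load_or_none(loader):
    """Call loader() and return its result, or None on a module-specific error.

    `loader` is a hypothetical zero-argument callable, for example
    get_latest_suspensions. Unrelated exceptions are deliberately not caught.
    """
    try:
        return loader()
    except EducationSuspensionsError:
        # Covers both the not-found and the parse subclasses.
        return None
```

Catching the base class keeps unrelated errors (for example a ValueError raised elsewhere) visible rather than silently swallowed.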

Functions

get_suspensions_publication_url()

Scrape education-ni.gov.uk to find the latest suspensions XLSX URL.

parse_suspensions_file(file_path)

Parse the DE NI suspensions XLSX file into a tidy DataFrame.

parse_all_tables(file_path)

Parse all tables from the DE NI suspensions XLSX into a dict of DataFrames.

get_latest_suspensions([force_refresh])

Download and return the latest NI pupil suspensions time-series data.

validate_suspensions_data(df)

Validate the structure and basic integrity of a suspensions DataFrame.

Module Contents

bolster.data_sources.education_suspensions.logger

bolster.data_sources.education_suspensions.ARTICLES_URL = 'https://www.education-ni.gov.uk/articles/pupil-suspensions-and-expulsions'

bolster.data_sources.education_suspensions.BASE_URL = 'https://www.education-ni.gov.uk'

bolster.data_sources.education_suspensions.CACHE_DIR

exception bolster.data_sources.education_suspensions.EducationSuspensionsError

Bases: Exception

Base exception for education suspensions data errors.

exception bolster.data_sources.education_suspensions.EducationSuspensionsNotFoundError

Bases: EducationSuspensionsError

Raised when the data file or publication page cannot be found.

exception bolster.data_sources.education_suspensions.EducationSuspensionsParseError

Bases: EducationSuspensionsError

Raised when parsing the data file fails in an unexpected way.

bolster.data_sources.education_suspensions.get_suspensions_publication_url()

Scrape education-ni.gov.uk to find the latest suspensions XLSX URL.

The function follows a two-step chain:

  1. Fetches the stable articles page and extracts the most recent publication link for pupil suspensions/expulsions.

  2. Fetches that publication page and extracts the XLSX download link.

Returns:

Absolute URL of the latest suspensions XLSX file.

Raises:

EducationSuspensionsNotFoundError – If the publication or XLSX link cannot be found.

Return type:

str
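
The two-step chain described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's actual implementation: the helper names (first_link, find_xlsx_url), the injected fetch callable, and the link-matching rules (a "/publications/" path containing "suspension", and an href ending in ".xlsx") are all hypothetical. LookupError stands in for EducationSuspensionsNotFoundError so the block stays self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.education-ni.gov.uk"


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def first_link(html, predicate, base=BASE_URL):
    """Return the first href accepted by predicate as an absolute URL, or None."""
    collector = _LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        if predicate(href):
            return urljoin(base, href)
    return None


def find_xlsx_url(articles_html, fetch):
    """Follow articles page -> publication page -> XLSX link.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    injected so the chain can be exercised without network access.
    """
    # Step 1: locate the publication page (matching rule is an assumption).
    publication = first_link(
        articles_html,
        lambda h: "/publications/" in h and "suspension" in h.lower(),
    )
    if publication is None:
        raise LookupError("no publication link found")
    # Step 2: locate the spreadsheet on that page.
    xlsx = first_link(fetch(publication), lambda h: h.lower().endswith(".xlsx"))
    if xlsx is None:
        raise LookupError("no XLSX link found")
    return xlsx
```

Passing the fetcher in as an argument keeps the link-selection logic testable against saved HTML, which matters for a scraper whose target pages can change layout between annual releases.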

bolster.data_sources.education_suspensions.parse_suspensions_file(file_path)

Parse the DE NI suspensions XLSX file into a tidy DataFrame.

Reads Table 1 (annual trend) and produces one row per academic year with standardised column names. Table 1 is the only table with a multi-year time series; the remaining tables are single-year snapshots and are intentionally not included in the tidy output (use parse_all_tables() for those).

Parameters:

file_path (str | pathlib.Path) – Path to the downloaded XLSX file.

Returns:

DataFrame with columns:

  • academic_year: Academic year string, e.g. "2024/25"

  • pupils_suspended: Number of pupils suspended (int)

  • pct_pupils_suspended: Share of all pupils suspended, expressed as a fraction (float, 0–1)

Return type:

pandas.DataFrame

Raises:

EducationSuspensionsParseError – If the file cannot be parsed.
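
Since academic_year is a string such as "2024/25", numeric work on the trend (sorting, plotting on a numeric axis) usually needs the starting calendar year. The helper below is a hypothetical convenience, not part of the module; it assumes labels always follow the four-digit/two-digit pattern shown above.

```python
import re

# Matches labels like "2024/25": four-digit start year, two-digit end suffix.
_ACADEMIC_YEAR = re.compile(r"^(\d{4})/(\d{2})$")


def academic_year_start(label: str) -> int:
    """Return the starting calendar year of an academic-year label.

    Raises ValueError if the label is malformed or the two halves do not
    describe consecutive years.
    """
    match = _ACADEMIC_YEAR.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised academic year: {label!r}")
    start, end_suffix = int(match.group(1)), int(match.group(2))
    # The suffix must be the last two digits of the following year.
    if (start + 1) % 100 != end_suffix:
        raise ValueError(f"non-consecutive academic year: {label!r}")
    return start
```

With a DataFrame in hand, something like df["academic_year"].map(academic_year_start) would yield a sortable integer column.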

bolster.data_sources.education_suspensions.parse_all_tables(file_path)

Parse all tables from the DE NI suspensions XLSX into a dict of DataFrames.

Each table is lightly cleaned (blank leading column removed, empty rows/cols dropped) but otherwise returned in its natural structure.

Parameters:

file_path (str | pathlib.Path) – Path to the downloaded XLSX file.

Returns:

Dictionary mapping sheet names to DataFrames.

Raises:

EducationSuspensionsParseError – If the file cannot be read.

Return type:

dict[str, pandas.DataFrame]
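
The light cleaning described above (dropping the blank leading column and any fully empty rows or columns) can be illustrated on plain Python lists. This is a sketch of the idea only: the module itself returns pandas DataFrames, where the equivalent would presumably be along the lines of dropna(how="all") on each axis. The function name clean_table and the use of None for blank cells are assumptions made for the example.

```python
def clean_table(rows):
    """Drop fully blank rows and columns from a list-of-lists table.

    Blank cells are represented as None. A blank leading column is removed
    by the same rule as any other all-blank column.
    """
    # Remove rows in which every cell is blank.
    rows = [row for row in rows if any(cell is not None for cell in row)]
    if not rows:
        return []
    # Pad ragged rows so every row has the same width.
    width = max(len(row) for row in rows)
    padded = [list(row) + [None] * (width - len(row)) for row in rows]
    # Keep only columns that contain at least one non-blank cell.
    keep = [i for i in range(width) if any(row[i] is not None for row in padded)]
    return [[row[i] for i in keep] for row in padded]
```

Rows are filtered before columns so that a column whose only content sat in a discarded row cannot occur; the order matters less here than in cleaning schemes that use thresholds rather than "all blank".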

bolster.data_sources.education_suspensions.get_latest_suspensions(force_refresh=False)

Download and return the latest NI pupil suspensions time-series data.

This is the main entry point. Returns Table 1 (annual trend) parsed into a tidy DataFrame via parse_suspensions_file().

Parameters:

force_refresh (bool) – If True, bypass the local cache and re-download.

Returns:

DataFrame with columns academic_year, pupils_suspended, pct_pupils_suspended.

Return type:

pandas.DataFrame
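
The documentation does not describe how the cache behind force_refresh works. One plausible shape, offered purely as a sketch, is "reuse the file in the cache directory unless it is missing or a refresh is forced". The function name cached_download, the injected download callable, and the choice to name the cached file after the last URL segment are all assumptions.

```python
from pathlib import Path
from typing import Callable


def cached_download(
    url: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
    force_refresh: bool = False,
) -> Path:
    """Return a local path for url, downloading only when necessary.

    `download` is a hypothetical callable returning the response body as
    bytes; injecting it keeps this sketch free of network dependencies.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the cached file is named after the final URL segment.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if force_refresh or not target.exists():
        target.write_bytes(download(url))
    return target
```

For an annually updated source, a cache like this avoids repeated scraping during normal use, while force_refresh=True gives a way to pick up a newly published release.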

Example

>>> df = get_latest_suspensions()
>>> 'academic_year' in df.columns
True

bolster.data_sources.education_suspensions.validate_suspensions_data(df)

Validate the structure and basic integrity of a suspensions DataFrame.

Checks performed:

  • Required columns present

  • At least one row of data

  • pupils_suspended values are non-negative

  • pct_pupils_suspended values are in [0, 1]

Parameters:

df (pandas.DataFrame) – DataFrame as returned by parse_suspensions_file() or get_latest_suspensions().

Returns:

True if the data passes all checks.

Raises:

ValueError – If any check fails, with a descriptive message.

Return type:

bool
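
The four checks listed above can be sketched without pandas by treating the data as a list of row dictionaries (roughly what df.to_dict("records") would produce). This mirrors the documented behaviour only; the function name validate_records and the error messages are hypothetical, and the real function operates on a DataFrame.

```python
REQUIRED_COLUMNS = ("academic_year", "pupils_suspended", "pct_pupils_suspended")


def validate_records(records):
    """Apply the documented integrity checks to a list of row dicts.

    Returns True when every check passes; raises ValueError otherwise.
    """
    # At least one row of data.
    if not records:
        raise ValueError("no rows of data")
    for index, row in enumerate(records):
        # Required columns present.
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"row {index}: missing columns {missing}")
        # Counts cannot be negative.
        if row["pupils_suspended"] < 0:
            raise ValueError(f"row {index}: negative pupils_suspended")
        # The share is a fraction, so it must lie in [0, 1].
        if not 0 <= row["pct_pupils_suspended"] <= 1:
            raise ValueError(f"row {index}: pct_pupils_suspended outside [0, 1]")
    return True
```

Raising on the first failure, rather than returning False, matches the documented contract that the function returns True or raises ValueError with a descriptive message.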