bolster.data_sources.education_suspensions
Pupil Suspensions and Expulsions in Northern Ireland.
Provides access to annual suspension and expulsion statistics for pupils of compulsory school age in Northern Ireland, published by the Department of Education Northern Ireland (DE NI).
Data covers pupil suspensions broken down by:
- Trend over time (Table 1, from 2011/12 to present)
- School type (Primary, Non Grammar, Grammar, Special) (Table 2)
- School management type (Controlled, Voluntary, etc.) (Table 3)
- Number of suspension occasions (Once, Twice, Three or more) (Table 4)
- Suspension duration (Table 5)
- Key Stage (Foundation/KS1, KS2, KS3, KS4) (Table 6)
- Pupil characteristics (sex, age, ethnicity, SEN, religion) (Table 7)
- Education Authority Region (Table 8)
- Reason for suspension (Table 9)
- Data Source:
Stable Entry Point: https://www.education-ni.gov.uk/articles/pupil-suspensions-and-expulsions
The module scrapes the articles page to find the latest publication link, then scrapes the publication page for the Excel data file URL.
Update Frequency: Annual
Geographic Coverage: Northern Ireland
Reference Period: 2011/12 – present
Example
>>> from bolster.data_sources.education_suspensions import get_latest_suspensions
>>> df = get_latest_suspensions()
>>> 'academic_year' in df.columns
True
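Attributes
- ARTICLES_URL
Exceptions
- EducationSuspensionsError: Base exception for education suspensions data errors.
- EducationSuspensionsNotFoundError: Raised when the data file or publication page cannot be found.
- EducationSuspensionsParseError: Raised when parsing the data file fails in an unexpected way.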
Functions
- get_suspensions_publication_url(): Scrape education-ni.gov.uk to find the latest suspensions XLSX URL.
- parse_suspensions_file(file_path): Parse the DE NI suspensions XLSX file into a tidy DataFrame.
- parse_all_tables(file_path): Parse all tables from the DE NI suspensions XLSX into a dict of DataFrames.
- get_latest_suspensions(force_refresh=False): Download and return the latest NI pupil suspensions time-series data.
- validate_suspensions_data(df): Validate the structure and basic integrity of a suspensions DataFrame.
Module Contents
- bolster.data_sources.education_suspensions.ARTICLES_URL = 'https://www.education-ni.gov.uk/articles/pupil-suspensions-and-expulsions'
- exception bolster.data_sources.education_suspensions.EducationSuspensionsError
Bases: Exception
Base exception for education suspensions data errors.
- exception bolster.data_sources.education_suspensions.EducationSuspensionsNotFoundError
Bases: EducationSuspensionsError
Raised when the data file or publication page cannot be found.
- exception bolster.data_sources.education_suspensions.EducationSuspensionsParseError
Bases: EducationSuspensionsError
Raised when parsing the data file fails in an unexpected way.
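Because both specific exceptions derive from EducationSuspensionsError, a caller can handle either failure by catching the base class. The sketch below uses local stand-in classes that mirror the documented hierarchy (they are not imported from bolster), and describe_failure is a hypothetical helper added only for illustration:

```python
# Stand-in classes mirroring the documented hierarchy. In real code these
# would be imported from bolster.data_sources.education_suspensions.
class EducationSuspensionsError(Exception):
    """Base exception for education suspensions data errors."""


class EducationSuspensionsNotFoundError(EducationSuspensionsError):
    """Raised when the data file or publication page cannot be found."""


class EducationSuspensionsParseError(EducationSuspensionsError):
    """Raised when parsing the data file fails in an unexpected way."""


def describe_failure(exc: Exception) -> str:
    """Hypothetical helper: map an exception to a short category label."""
    # Check the specific subclasses first, then fall back to the base class.
    if isinstance(exc, EducationSuspensionsNotFoundError):
        return "not-found"
    if isinstance(exc, EducationSuspensionsParseError):
        return "parse"
    if isinstance(exc, EducationSuspensionsError):
        return "other-suspensions-error"
    return "unrelated"
```

An `except EducationSuspensionsError` clause therefore covers both the not-found and the parse case, while unrelated exceptions still propagate.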
- bolster.data_sources.education_suspensions.get_suspensions_publication_url()
Scrape education-ni.gov.uk to find the latest suspensions XLSX URL.
The function follows a two-step chain:
1. Fetches the stable articles page and extracts the most recent publications link for pupil suspensions/expulsions.
2. Fetches that publication page and extracts the XLSX download link.
- Returns:
Absolute URL of the latest suspensions XLSX file.
- Raises:
EducationSuspensionsNotFoundError – If the publication or XLSX link cannot be found.
- Return type:
str
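The two-step link chain can be sketched with the standard library alone. This is an illustration of the idea, not the module's implementation: the HTML snippets, the example.org URLs, the matching rules, and the names LinkCollector and first_link are all assumptions made for the example, and no network request is performed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def first_link(html: str, base_url: str, predicate) -> str:
    """Return the first link accepted by predicate, as an absolute URL."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        if predicate(href):
            return urljoin(base_url, href)
    raise LookupError("no matching link found")


# Invented sample pages standing in for the two HTTP responses.
ARTICLES_HTML = (
    '<a href="/about">About</a>'
    '<a href="/publications/suspensions-2024">Latest</a>'
)
PUBLICATION_HTML = (
    '<a href="/files/notes.pdf">Notes</a>'
    '<a href="/files/suspensions.xlsx">Data</a>'
)
BASE = "https://example.org/articles/page"

# Step 1: articles page -> publication page link.
publication_url = first_link(
    ARTICLES_HTML, BASE, lambda h: "/publications/" in h
)
# Step 2: publication page -> XLSX download link.
xlsx_url = first_link(
    PUBLICATION_HTML, publication_url, lambda h: h.lower().endswith(".xlsx")
)
```

Resolving each href against the page it came from with `urljoin` is what makes the final result an absolute URL even when the page uses relative links.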
- bolster.data_sources.education_suspensions.parse_suspensions_file(file_path)
Parse the DE NI suspensions XLSX file into a tidy DataFrame.
Reads Table 1 (annual trend) and produces one row per academic year with standardised column names. Table 1 is the only table with a multi-year time series; the remaining tables are single-year snapshots and are intentionally not included in the tidy output (use parse_all_tables() for those).
- Parameters:
file_path (str | pathlib.Path) – Path to the downloaded XLSX file.
- Returns:
DataFrame with columns:
- academic_year: Academic year string, e.g. "2024/25"
- pupils_suspended: Number of pupils suspended (int)
- pct_pupils_suspended: Share of all pupils suspended, expressed as a fraction (float, 0–1)
- Return type:
pandas.DataFrame
- Raises:
EducationSuspensionsParseError – If the file cannot be parsed.
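Since academic_year is a string such as "2024/25", ordering the trend chronologically may need a numeric key. A minimal sketch, assuming every label follows the YYYY/YY form shown above; academic_year_start is a hypothetical helper, not part of the module:

```python
def academic_year_start(label: str) -> int:
    """Hypothetical helper: return the starting calendar year of a label."""
    start, _, end = label.partition("/")
    # Reject anything that does not look like "YYYY/YY".
    if len(start) != 4 or not start.isdigit() or len(end) != 2 or not end.isdigit():
        raise ValueError(f"unexpected academic year label: {label!r}")
    return int(start)


labels = ["2024/25", "2011/12", "2019/20"]
ordered = sorted(labels, key=academic_year_start)
```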
- bolster.data_sources.education_suspensions.parse_all_tables(file_path)
Parse all tables from the DE NI suspensions XLSX into a dict of DataFrames.
Each table is lightly cleaned (blank leading column removed, empty rows/cols dropped) but otherwise returned in its natural structure.
- Parameters:
file_path (str | pathlib.Path) – Path to the downloaded XLSX file.
- Returns:
Dictionary mapping sheet names to DataFrames.
- Raises:
EducationSuspensionsParseError – If the file cannot be read.
- Return type:
dict[str, pandas.DataFrame]
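The light cleaning described for parse_all_tables() can be illustrated on a plain grid of cell values. This is a sketch of the idea only, not the library's code: clean_grid is a hypothetical name, None stands in for an empty cell, and the sample values are invented.

```python
def clean_grid(grid):
    """Drop rows and columns in which every cell is empty (None)."""
    # Remove rows that contain no values at all.
    rows = [row for row in grid if any(cell is not None for cell in row)]
    if not rows:
        return []
    # Keep only the column positions that hold a value in at least one row.
    width = max(len(row) for row in rows)
    keep = [
        i for i in range(width)
        if any(i < len(row) and row[i] is not None for row in rows)
    ]
    return [[row[i] if i < len(row) else None for i in keep] for row in rows]


# Invented sample sheet with a blank leading column and blank rows.
sheet = [
    [None, None, None],
    [None, "School type", "Pupils suspended"],
    [None, "Primary", 10],
    [None, None, None],
]
cleaned = clean_grid(sheet)
```

The blank leading column disappears as a special case of dropping all-empty columns, which is why the sketch needs no separate step for it.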
- bolster.data_sources.education_suspensions.get_latest_suspensions(force_refresh=False)
Download and return the latest NI pupil suspensions time-series data.
This is the main entry point. Returns Table 1 (annual trend) parsed into a tidy DataFrame via parse_suspensions_file().
- Parameters:
force_refresh (bool) – If True, bypass the local cache and re-download.
- Returns:
DataFrame with columns academic_year, pupils_suspended, pct_pupils_suspended.
- Raises:
EducationSuspensionsNotFoundError – If the source cannot be located.
EducationSuspensionsParseError – If the file cannot be parsed.
- Return type:
pandas.DataFrame
Example
>>> df = get_latest_suspensions()
>>> 'academic_year' in df.columns
True
- bolster.data_sources.education_suspensions.validate_suspensions_data(df)
Validate the structure and basic integrity of a suspensions DataFrame.
Checks performed:
- Required columns present
- At least one row of data
- pupils_suspended values are non-negative
- pct_pupils_suspended values are in [0, 1]
- Parameters:
df (pandas.DataFrame) – DataFrame as returned by parse_suspensions_file() or get_latest_suspensions().
- Returns:
True if the data passes all checks.
- Raises:
ValueError – If any check fails, with a descriptive message.
- Return type:
bool
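The four documented checks can be restated as a short pandas sketch. This is an illustrative re-implementation under assumptions, not the module's source: check_suspensions and REQUIRED_COLUMNS are hypothetical names, the required columns are assumed to be the three documented for the tidy output, and the error messages are invented.

```python
import pandas as pd

# Assumed to match the three columns documented for the tidy output.
REQUIRED_COLUMNS = ("academic_year", "pupils_suspended", "pct_pupils_suspended")


def check_suspensions(df: pd.DataFrame) -> bool:
    """Illustrative restatement of the four documented checks."""
    # 1. Required columns present.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # 2. At least one row of data.
    if df.empty:
        raise ValueError("no rows of data")
    # 3. Counts must be non-negative.
    if (df["pupils_suspended"] < 0).any():
        raise ValueError("pupils_suspended contains negative values")
    # 4. Fractions must lie in the closed interval [0, 1].
    if not df["pct_pupils_suspended"].between(0, 1).all():
        raise ValueError("pct_pupils_suspended must lie in [0, 1]")
    return True
```

Raising ValueError on the first failed check, rather than returning False, matches the documented contract that the function either returns True or raises with a descriptive message.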