bolster.data_sources.education_suspensions
==========================================

.. py:module:: bolster.data_sources.education_suspensions

.. autoapi-nested-parse::

   Pupil Suspensions and Expulsions in Northern Ireland.

   Provides access to annual suspension and expulsion statistics for pupils of
   compulsory school age in Northern Ireland, published by the Department of
   Education Northern Ireland (DE NI).

   Data covers pupil suspensions broken down by:

   - Trend over time (Table 1, from 2011/12 to present)
   - School type (Primary, Non Grammar, Grammar, Special) (Table 2)
   - School management type (Controlled, Voluntary, etc.) (Table 3)
   - Number of suspension occasions (Once, Twice, Three or more) (Table 4)
   - Suspension duration (Table 5)
   - Key Stage (Foundation/KS1, KS2, KS3, KS4) (Table 6)
   - Pupil characteristics (sex, age, ethnicity, SEN, religion) (Table 7)
   - Education Authority Region (Table 8)
   - Reason for suspension (Table 9)

   Data Source:
       **Stable Entry Point**: https://www.education-ni.gov.uk/articles/pupil-suspensions-and-expulsions

       The module scrapes the articles page to find the latest publication link,
       then scrapes the publication page for the Excel data file URL.

   Update Frequency: Annual

   Geographic Coverage: Northern Ireland

   Reference Period: 2011/12 – present

   .. rubric:: Example

   >>> from bolster.data_sources.education_suspensions import get_latest_suspensions
   >>> df = get_latest_suspensions()
   >>> 'academic_year' in df.columns
   True


Attributes
----------

.. autoapisummary::

   bolster.data_sources.education_suspensions.logger
   bolster.data_sources.education_suspensions.ARTICLES_URL
   bolster.data_sources.education_suspensions.BASE_URL
   bolster.data_sources.education_suspensions.CACHE_DIR


Exceptions
----------

.. autoapisummary::

   bolster.data_sources.education_suspensions.EducationSuspensionsError
   bolster.data_sources.education_suspensions.EducationSuspensionsNotFoundError
   bolster.data_sources.education_suspensions.EducationSuspensionsParseError


Functions
---------

.. autoapisummary::

   bolster.data_sources.education_suspensions.get_suspensions_publication_url
   bolster.data_sources.education_suspensions.parse_suspensions_file
   bolster.data_sources.education_suspensions.parse_all_tables
   bolster.data_sources.education_suspensions.get_latest_suspensions
   bolster.data_sources.education_suspensions.validate_suspensions_data


Module Contents
---------------

.. py:data:: logger

.. py:data:: ARTICLES_URL
   :value: 'https://www.education-ni.gov.uk/articles/pupil-suspensions-and-expulsions'


.. py:data:: BASE_URL
   :value: 'https://www.education-ni.gov.uk'


.. py:data:: CACHE_DIR

.. py:exception:: EducationSuspensionsError

   Bases: :py:obj:`Exception`


   Base exception for education suspensions data errors.

   Initialize self. See help(type(self)) for accurate signature.


.. py:exception:: EducationSuspensionsNotFoundError

   Bases: :py:obj:`EducationSuspensionsError`


   Raised when the data file or publication page cannot be found.

   Initialize self. See help(type(self)) for accurate signature.


.. py:exception:: EducationSuspensionsParseError

   Bases: :py:obj:`EducationSuspensionsError`


   Raised when parsing the data file fails in an unexpected way.

   Initialize self. See help(type(self)) for accurate signature.


.. py:function:: get_suspensions_publication_url()

   Scrape education-ni.gov.uk to find the latest suspensions XLSX URL.

   The function follows a two-step chain:

   1. Fetches the stable articles page and extracts the most recent
      publication link for pupil suspensions/expulsions.
   2. Fetches that publication page and extracts the XLSX download link.

   :returns: Absolute URL of the latest suspensions XLSX file.

   :raises EducationSuspensionsNotFoundError: If the publication or XLSX link cannot be found.


.. py:function:: parse_suspensions_file(file_path)

   Parse the DE NI suspensions XLSX file into a tidy DataFrame.

   Reads Table 1 (annual trend) and produces one row per academic year with
   standardised column names. Table 1 is the only table with a multi-year
   time series; the remaining tables are single-year snapshots and are
   intentionally not included in the tidy output (use :func:`parse_all_tables`
   for those).

   :param file_path: Path to the downloaded XLSX file.

   :returns: DataFrame with columns:

             - ``academic_year``: Academic year string, e.g. ``"2024/25"``
             - ``pupils_suspended``: Number of pupils suspended (int)
             - ``pct_pupils_suspended``: Percentage of all pupils suspended (float, 0–1)
   :rtype: DataFrame

   :raises EducationSuspensionsParseError: If the file cannot be parsed.


.. py:function:: parse_all_tables(file_path)

   Parse all tables from the DE NI suspensions XLSX into a dict of DataFrames.

   Each table is lightly cleaned (blank leading column removed, empty
   rows/cols dropped) but otherwise returned in its natural structure.

   :param file_path: Path to the downloaded XLSX file.

   :returns: Dictionary mapping sheet names to DataFrames.

   :raises EducationSuspensionsParseError: If the file cannot be read.


.. py:function:: get_latest_suspensions(force_refresh = False)

   Download and return the latest NI pupil suspensions time-series data.

   This is the main entry point. Returns Table 1 (annual trend) parsed into
   a tidy DataFrame via :func:`parse_suspensions_file`.

   :param force_refresh: If ``True``, bypass the local cache and re-download.

   :returns: DataFrame with columns ``academic_year``, ``pupils_suspended``,
             ``pct_pupils_suspended``.

   :raises EducationSuspensionsNotFoundError: If the source cannot be located.
   :raises EducationSuspensionsParseError: If the file cannot be parsed.

   .. rubric:: Example

   >>> df = get_latest_suspensions()
   >>> 'academic_year' in df.columns
   True


.. py:function:: validate_suspensions_data(df)

   Validate the structure and basic integrity of a suspensions DataFrame.

   Checks performed:

   - Required columns present
   - At least one row of data
   - ``pupils_suspended`` values are non-negative
   - ``pct_pupils_suspended`` values are in [0, 1]

   :param df: DataFrame as returned by :func:`parse_suspensions_file` or
              :func:`get_latest_suspensions`.

   :returns: ``True`` if the data passes all checks.

   :raises ValueError: If any check fails, with a descriptive message.
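
The four checks listed for ``validate_suspensions_data`` can be illustrated with a minimal sketch. This is not the module's implementation: the helper name ``check_suspensions_columns``, the error messages, and the sample values are all hypothetical, and the sketch works on a plain mapping of column name to values (rather than a DataFrame) so that it runs with the standard library alone.

.. code-block:: python

   from typing import Mapping, Sequence

   # Column names as documented for the tidy Table 1 output.
   REQUIRED_COLUMNS = ("academic_year", "pupils_suspended", "pct_pupils_suspended")


   def check_suspensions_columns(columns: Mapping[str, Sequence]) -> bool:
       """Hypothetical stand-in for the documented checks; not bolster's code."""
       # 1. Required columns present.
       missing = [name for name in REQUIRED_COLUMNS if name not in columns]
       if missing:
           raise ValueError(f"Missing required columns: {missing}")

       # 2. At least one row of data.
       if len(columns["academic_year"]) == 0:
           raise ValueError("No rows of data")

       # 3. Suspension counts are non-negative.
       if any(count < 0 for count in columns["pupils_suspended"]):
           raise ValueError("pupils_suspended contains negative values")

       # 4. Percentages lie in the closed interval [0, 1].
       if any(not 0 <= pct <= 1 for pct in columns["pct_pupils_suspended"]):
           raise ValueError("pct_pupils_suspended contains values outside [0, 1]")

       return True


   # Illustrative values only, not real DE NI figures.
   sample = {
       "academic_year": ["2022/23", "2023/24"],
       "pupils_suspended": [100, 120],
       "pct_pupils_suspended": [0.01, 0.012],
   }
   result = check_suspensions_columns(sample)

For a pandas DataFrame, ``df.to_dict(orient="list")`` yields a mapping of this shape, so the same sketch could be applied to the output of ``get_latest_suspensions`` for experimentation.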