bolster.data_sources.psni
PSNI (Police Service of Northern Ireland) Data Sources.
This module provides access to PSNI open data, including:
- Crime Statistics: monthly police recorded crime data (April 2001 to December 2021)
- Road Traffic Collisions: injury collision, casualty, and vehicle data
- Police Ombudsman: complaint statistics from 2000/01 to present
- PACE Statistics: annual stop & search and arrests under the Police and Criminal Evidence (PACE) Order
Data is sourced from OpenDataNI and the Police Ombudsman’s Office under the Open Government Licence v3.0. Geographic breakdowns use the 11 Policing Districts, which align with Northern Ireland’s Local Government Districts (LGDs) and so allow integration with other NISRA datasets.
Example
>>> from bolster.data_sources.psni import crime_statistics, road_traffic_collisions
>>> df = crime_statistics.get_historical_crime_statistics()
>>> 'lgd_code' in df.columns
True
>>> lgd_code = crime_statistics.get_lgd_code("Belfast City")
>>> lgd_code
'N09000003'
>>> casualties = road_traffic_collisions.get_casualties()
>>> 'severity' in casualties.columns
True
See individual module docstrings for detailed documentation.
Submodules
Exceptions
- PSNIDataError: Base exception for PSNI data errors.
- PSNIDataNotFoundError: Raised when a PSNI data file cannot be downloaded or accessed.
- PSNIDataStaleError: Raised when a PSNI data source is known to be stale with no accessible update.
- PSNIValidationError: Raised when PSNI data fails validation checks.
Functions
- clear_cache: Clear cached files from the PSNI cache directory.
- filter_by_crime_type: Filter crime statistics to specific crime type(s).
- filter_by_date_range: Filter crime statistics to a date range.
- filter_by_district: Filter crime statistics to specific policing district(s).
- get_available_crime_types: Get list of all crime types in the dataset.
- get_available_districts: Get list of all policing districts in the dataset.
- get_crime_trends: Get monthly crime trends for a specific crime type and district.
- get_data_source_info: Get information about crime statistics data sources.
- get_historical_crime_statistics: Get historical police recorded crime statistics (April 2001 – December 2021).
- get_latest_crime_statistics: Raises PSNIDataStaleError; use get_historical_crime_statistics() instead.
- get_lgd_code: Get LGD code for a policing district.
- get_nuts3_code: Get NUTS3 regional code for a policing district.
- get_nuts_region_name: Get descriptive name for a NUTS3 region code.
- get_outcome_rates_by_district: Calculate crime outcome rates by policing district.
- get_total_crimes_by_district: Calculate total recorded crimes by policing district.
- parse_crime_statistics_file: Parse PSNI crime statistics CSV file.
- validate_crime_statistics: Validate crime statistics data integrity.
- get_annual_publication_url: Scrape the complaint-statistics page for the latest .xlsx download link.
- get_latest_complaints: Download and return the latest Police Ombudsman complaint data.
- get_quarterly_publication_url: Scrape the quarterly-reports page for the latest .xlsx download link.
- parse_annual: Parse the annual Police Ombudsman statistics Excel workbook.
- parse_quarterly: Parse a quarterly Police Ombudsman statistics Excel workbook.
- validate_complaints: Validate a Police Ombudsman complaints DataFrame.
- get_rtc_annual_summary: Get annual summary statistics across multiple years.
- get_rtc_available_years: Get list of years with available RTC data.
- get_casualties: Get casualty records for a specific year.
- get_casualties_by_district: Get casualty counts by policing district.
- get_casualties_by_road_user: Get casualty counts by road user type.
- get_casualties_with_collision_details: Get casualty records merged with collision details.
- get_collisions: Get collision records for a specific year.
- get_vehicles: Get vehicle records for a specific year.
- validate_rtc_data: Validate RTC data integrity.
Package Contents
- exception bolster.data_sources.psni.PSNIDataError
Bases: Exception
Base exception for PSNI data errors.
All PSNI-specific exceptions inherit from this class, allowing callers to catch all PSNI errors with a single except clause.
- exception bolster.data_sources.psni.PSNIDataNotFoundError
Bases: PSNIDataError
Raised when a PSNI data file cannot be downloaded or accessed.
This exception is raised when:
- Network requests fail (timeout, connection errors)
- HTTP errors occur (404, 500, etc.)
- The requested resource is unavailable
- exception bolster.data_sources.psni.PSNIDataStaleError
Bases: PSNIDataError
Raised when a PSNI data source is known to be stale with no accessible update.
This exception is raised when the underlying data source has not been updated and no machine-readable replacement is accessible (e.g. due to Cloudflare protection on the official PSNI website blocking automated downloads).
- exception bolster.data_sources.psni.PSNIValidationError
Bases: PSNIDataError
Raised when PSNI data fails validation checks.
This exception is raised when:
- CSV structure doesn’t match expected columns
- Data contains invalid or unexpected values
- Required fields are missing or malformed
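Because every PSNI-specific exception derives from PSNIDataError, a single except clause can guard any loader. The sketch below redefines the four classes locally as stand-ins so it runs without bolster installed; in real code you would import them from bolster.data_sources.psni. The load_or_none and failing_loader helpers are hypothetical.

```python
# Local stand-ins mirroring the documented hierarchy (import the real ones in practice).
class PSNIDataError(Exception):
    """Base class: catching this catches every PSNI-specific error."""


class PSNIDataNotFoundError(PSNIDataError):
    """Download or access failure."""


class PSNIDataStaleError(PSNIDataError):
    """Source known to be stale with no accessible update."""


class PSNIValidationError(PSNIDataError):
    """Data failed validation checks."""


def load_or_none(loader):
    """Call a (hypothetical) loader, returning None on any PSNI error."""
    try:
        return loader()
    except PSNIDataError as exc:  # one clause covers all three subclasses
        print(f"PSNI data unavailable: {type(exc).__name__}: {exc}")
        return None


def failing_loader():
    # Hypothetical loader that simulates a failed download.
    raise PSNIDataNotFoundError("HTTP 404")


result = load_or_none(failing_loader)
```

Non-PSNI exceptions (for example a KeyError from a bug) still propagate, which keeps genuine programming errors visible.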
- bolster.data_sources.psni.clear_cache(pattern=None)
Clear cached files from the PSNI cache directory.
- Parameters:
pattern (str | None) – Optional glob pattern to match specific files (e.g., *.csv). If None, clears all cached files in the directory.
- Returns:
Number of files deleted
- Return type:
int
Example
>>> from bolster.data_sources.psni._base import clear_cache
>>> deleted = clear_cache("*.csv")
>>> isinstance(deleted, int)
True
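The glob-then-delete behaviour described for clear_cache can be sketched with pathlib alone. This is a hypothetical re-implementation (clear_cache_sketch, with an explicit cache_dir argument) rather than bolster's code; it works on a temporary directory so it never touches a real cache.

```python
import tempfile
from pathlib import Path
from typing import Optional


def clear_cache_sketch(cache_dir: Path, pattern: Optional[str] = None) -> int:
    """Delete files in cache_dir matching a glob pattern (all files if None).

    Returns the number of files deleted, mirroring the documented return value.
    """
    glob_pattern = pattern if pattern is not None else "*"
    deleted = 0
    for path in cache_dir.glob(glob_pattern):
        if path.is_file():  # leave any subdirectories alone
            path.unlink()
            deleted += 1
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for name in ("crime.csv", "rtc.csv", "ombudsman.xlsx"):
        (cache / name).write_text("data")
    n_csv = clear_cache_sketch(cache, "*.csv")  # removes the two CSV files
    n_rest = clear_cache_sketch(cache)          # removes the remaining .xlsx
```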
- bolster.data_sources.psni.filter_by_crime_type(df, crime_type)
Filter crime statistics to specific crime type(s).
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
crime_type (str | list[str]) – Crime type(s) to filter (e.g., “Burglary” or [“Violence with injury”, “Robbery”])
- Returns:
Filtered DataFrame
- Return type:
pandas.DataFrame
Example
>>> df = get_historical_crime_statistics()
>>> violence = filter_by_crime_type(df, "Violence with injury (including homicide & death/serious injury by unlawful driving)")
>>> len(violence) > 0
True
- bolster.data_sources.psni.filter_by_date_range(df, start_date=None, end_date=None)
Filter crime statistics to a date range.
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
start_date (str | datetime.datetime | None) – Start date (inclusive), e.g., “2020-01-01” or datetime
end_date (str | datetime.datetime | None) – End date (inclusive), e.g., “2021-12-31” or datetime
- Returns:
Filtered DataFrame
- Return type:
pandas.DataFrame
Example
>>> df = get_historical_crime_statistics()
>>> # Get 2020 data
>>> df_2020 = filter_by_date_range(df, "2020-01-01", "2020-12-31")
>>> df_2020['calendar_year'].unique().tolist()
[2020]
>>>
>>> # Get data from 2018 onwards
>>> recent = filter_by_date_range(df, start_date="2018-01-01")
>>> len(recent) > 0
True
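Inclusive, optionally open-ended date filtering can be sketched as a boolean mask on the date column. The function name filter_by_date_range_sketch and the tiny synthetic frame are illustrative assumptions, not bolster's implementation.

```python
import pandas as pd


def filter_by_date_range_sketch(df, start_date=None, end_date=None):
    """Keep rows whose 'date' lies in [start_date, end_date]; either bound may be None."""
    mask = pd.Series(True, index=df.index)
    if start_date is not None:
        mask &= df["date"] >= pd.Timestamp(start_date)  # inclusive lower bound
    if end_date is not None:
        mask &= df["date"] <= pd.Timestamp(end_date)    # inclusive upper bound
    return df[mask]


# Synthetic monthly rows; each date is the first day of its month.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-12-01", "2020-01-01", "2020-12-01", "2021-01-01"]),
        "count": [10, 20, 30, 40],
    }
)
df["calendar_year"] = df["date"].dt.year

df_2020 = filter_by_date_range_sketch(df, "2020-01-01", "2020-12-31")
recent = filter_by_date_range_sketch(df, start_date="2020-06-01")
```

Because each row is dated to the first of its month, an end date of "2020-12-31" still includes December 2020.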
- bolster.data_sources.psni.filter_by_district(df, district)
Filter crime statistics to specific policing district(s).
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
district (str | list[str]) – District name(s) to filter (e.g., “Belfast City” or [“Belfast City”, “Derry City & Strabane”])
- Returns:
Filtered DataFrame
- Return type:
pandas.DataFrame
Example
>>> df = get_historical_crime_statistics()
>>> belfast = filter_by_district(df, "Belfast City")
>>> belfast['policing_district'].unique().tolist()
['Belfast City']
>>>
>>> # Multiple districts
>>> cities = filter_by_district(df, ["Belfast City", "Derry City & Strabane"])
>>> len(cities['policing_district'].unique()) == 2
True
- bolster.data_sources.psni.get_available_crime_types(df)
Get list of all crime types in the dataset.
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
- Returns:
Sorted list of crime type names
- Return type:
list[str]
Example
>>> df = get_historical_crime_statistics()
>>> crime_types = get_available_crime_types(df)
>>> isinstance(crime_types, list)
True
>>> 'Total police recorded crime' in crime_types
True
- bolster.data_sources.psni.get_available_districts(df)
Get list of all policing districts in the dataset.
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
- Returns:
Sorted list of district names
- Return type:
list[str]
Example
>>> df = get_historical_crime_statistics()
>>> districts = get_available_districts(df)
>>> isinstance(districts, list)
True
>>> 'Northern Ireland' in districts
True
- bolster.data_sources.psni.get_crime_trends(df, crime_type='Total police recorded crime', district='Northern Ireland', measure='Police Recorded Crime')
Get monthly crime trends for a specific crime type and district.
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
crime_type (str) – Crime type to analyze (default: total crimes)
district (str) – Policing district (default: Northern Ireland total)
measure (str) – Data measure to use (default: Police Recorded Crime)
- Returns:
DataFrame with columns: date, calendar_year, month, count
- Return type:
pandas.DataFrame
Example
>>> df = get_historical_crime_statistics()
>>> trends = get_crime_trends(df, district="Belfast City")
>>> sorted(trends.columns.tolist())
['calendar_year', 'count', 'date', 'month']
>>> len(trends) > 0
True
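A trend for one crime type, district, and measure amounts to filtering the long-format table on three columns and ordering by date. The sketch below assumes the column names documented for get_historical_crime_statistics(); get_crime_trends_sketch and the sample rows are hypothetical.

```python
import pandas as pd


def get_crime_trends_sketch(
    df,
    crime_type="Total police recorded crime",
    district="Northern Ireland",
    measure="Police Recorded Crime",
):
    """Select one crime type / district / measure and return a date-ordered series."""
    subset = df[
        (df["crime_type"] == crime_type)
        & (df["policing_district"] == district)
        & (df["data_measure"] == measure)
    ]
    columns = ["date", "calendar_year", "month", "count"]
    return subset[columns].sort_values("date").reset_index(drop=True)


# Illustrative rows (values are made up), deliberately out of date order.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-05-01", "2021-04-01", "2021-04-01"]),
        "calendar_year": [2021, 2021, 2021],
        "month": ["May", "Apr", "Apr"],
        "policing_district": ["Belfast City", "Belfast City", "Northern Ireland"],
        "crime_type": ["Total police recorded crime"] * 3,
        "data_measure": ["Police Recorded Crime"] * 3,
        "count": [900.0, 850.0, 8000.0],
    }
)
trends = get_crime_trends_sketch(df, district="Belfast City")
```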
- bolster.data_sources.psni.get_data_source_info()
Get information about crime statistics data sources.
Returns a dictionary with URLs and contact information for accessing PSNI crime statistics. Use this when you need data beyond December 2021.
- Returns:
Dictionary with keys:
opendatani_url: OpenDataNI dataset URL (data through Dec 2021)
data_guide_url: PDF data guide URL
psni_official_url: PSNI official statistics page (current data)
contact_email: PSNI Statistics Branch email
data_limitation: Description of OpenDataNI data limitations
last_update: Last known update date for OpenDataNI
- Return type:
dict
Example
>>> info = get_data_source_info()
>>> sorted(info.keys())
['contact_email', 'data_guide_url', 'data_limitation', 'last_update', 'opendatani_url', 'psni_official_url']
- bolster.data_sources.psni.get_historical_crime_statistics(force_refresh=False, add_geographic_codes=True)
Get historical police recorded crime statistics (April 2001 – December 2021).
Downloads the crime statistics CSV from OpenDataNI. This dataset covers April 2001 through December 2021 and has not been updated since January 2022. For 2022+ data, consult PSNI directly.
- Parameters:
- Returns:
DataFrame with columns: date, calendar_year, month, policing_district, crime_type, data_measure, count, lgd_code, nuts3_code, nuts3_name
- Return type:
pandas.DataFrame
- Raises:
PSNIDataNotFoundError – If download fails
PSNIValidationError – If file structure is unexpected
Example
>>> df = get_historical_crime_statistics()
>>> sorted(df.columns.tolist())
['calendar_year', 'count', 'crime_type', 'data_measure', 'date', 'lgd_code', 'month', 'nuts3_code', 'nuts3_name', 'policing_district']
>>> df['date'].max().year
2021
- bolster.data_sources.psni.get_latest_crime_statistics(force_refresh=False, add_geographic_codes=True)
Raises PSNIDataStaleError; use get_historical_crime_statistics() instead.
The OpenDataNI source was last updated in January 2022. PSNI’s official site publishes current data, but it is Cloudflare-protected and blocks automated downloads. Use get_historical_crime_statistics() to access the available data (April 2001 to December 2021).
- Raises:
PSNIDataStaleError – Always; this data source has no accessible update.
- bolster.data_sources.psni.get_lgd_code(district_name)
Get LGD code for a policing district.
- Parameters:
district_name (str) – Policing district name (e.g., “Belfast City”)
- Returns:
LGD code (e.g., “N09000003”) or None if not found
- Return type:
str | None
Example
>>> get_lgd_code("Belfast City")
'N09000003'
- bolster.data_sources.psni.get_nuts3_code(district_name)
Get NUTS3 regional code for a policing district.
Uses the NUTS 2021 classification, in which each LGD maps 1:1 to a NUTS3 region.
- Parameters:
district_name (str) – Policing district name (e.g., “Belfast City”)
- Returns:
NUTS3 code (e.g., “UKN06”) or None if not found
- Return type:
str | None
Example
>>> get_nuts3_code("Belfast City")
'UKN06'
>>> get_nuts3_code("Derry City & Strabane")
'UKN0A'
- bolster.data_sources.psni.get_nuts_region_name(nuts3_code)
Get descriptive name for a NUTS3 region code.
- Parameters:
nuts3_code (str) – NUTS3 code (e.g., “UKN06”)
- Returns:
Region name (e.g., “Belfast”) or None if not found
- Return type:
str | None
Example
>>> get_nuts_region_name("UKN06")
'Belfast'
>>> get_nuts_region_name("UKN0A")
'Derry City and Strabane'
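The three geographic helpers behave like dictionary lookups that return None for unknown keys, and they chain from district name to NUTS3 code to region name. The sketch below holds only the codes quoted on this page; the real module covers all 11 districts. The _sketch function names and the dictionary names are hypothetical.

```python
from typing import Optional

# Excerpt limited to codes quoted in this documentation.
DISTRICT_TO_LGD = {"Belfast City": "N09000003"}
DISTRICT_TO_NUTS3 = {"Belfast City": "UKN06", "Derry City & Strabane": "UKN0A"}
NUTS3_NAMES = {"UKN06": "Belfast", "UKN0A": "Derry City and Strabane"}


def get_lgd_code_sketch(district_name: str) -> Optional[str]:
    return DISTRICT_TO_LGD.get(district_name)  # None if the district is unknown


def get_nuts3_code_sketch(district_name: str) -> Optional[str]:
    return DISTRICT_TO_NUTS3.get(district_name)


def get_nuts_region_name_sketch(nuts3_code: Optional[str]) -> Optional[str]:
    return NUTS3_NAMES.get(nuts3_code)


# District name -> NUTS3 code -> region name
region = get_nuts_region_name_sketch(get_nuts3_code_sketch("Derry City & Strabane"))
```

Note the naming difference the chain exposes: the policing district uses "&" while the NUTS3 region name uses "and".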
- bolster.data_sources.psni.get_outcome_rates_by_district(df, year=None, crime_type='Total police recorded crime')
Calculate crime outcome rates by policing district.
The outcome rate is the percentage of crimes with an outcome (charge, caution, community resolution, etc.).
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
year (int | None) – Optional year to filter (uses all years if None)
crime_type (str) – Crime type to analyze (default: total crimes)
- Returns:
DataFrame with columns: policing_district, lgd_code, average_outcome_rate
- Return type:
pandas.DataFrame
Example
>>> df = get_historical_crime_statistics()
>>> outcomes = get_outcome_rates_by_district(df, year=2021)
>>> 'average_outcome_rate' in outcomes.columns
True
- bolster.data_sources.psni.get_total_crimes_by_district(df, year=None)
Calculate total recorded crimes by policing district.
- Parameters:
df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics
year (int | None) – Optional year to filter (uses all years if None)
- Returns:
DataFrame with columns: policing_district, lgd_code, nuts3_code, total_crimes
- Return type:
pandas.DataFrame
Example
>>> df = get_historical_crime_statistics()
>>> totals_2021 = get_total_crimes_by_district(df, year=2021)
>>> sorted(totals_2021.columns.tolist())
['lgd_code', 'nuts3_code', 'policing_district', 'total_crimes']
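A per-district total can be sketched as a filtered groupby-sum over the long-format table. Two details are assumptions rather than documented behaviour: the sketch sums only the "Police Recorded Crime" measure, and it drops the "Northern Ireland" aggregate rows so they are not counted as a district. total_crimes_by_district_sketch and its data are hypothetical.

```python
import pandas as pd


def total_crimes_by_district_sketch(df, year=None):
    """Sum monthly 'Total police recorded crime' counts per district."""
    rows = df[
        (df["crime_type"] == "Total police recorded crime")
        & (df["data_measure"] == "Police Recorded Crime")
        & (df["policing_district"] != "Northern Ireland")  # exclude the NI aggregate
    ]
    if year is not None:
        rows = rows[rows["calendar_year"] == year]
    return (
        rows.groupby(["policing_district", "lgd_code", "nuts3_code"], as_index=False)["count"]
        .sum()
        .rename(columns={"count": "total_crimes"})
    )


# Illustrative monthly rows (values are made up).
df = pd.DataFrame(
    {
        "calendar_year": [2021, 2021, 2020, 2021],
        "policing_district": ["Belfast City", "Belfast City", "Belfast City", "Northern Ireland"],
        "lgd_code": ["N09000003", "N09000003", "N09000003", None],
        "nuts3_code": ["UKN06", "UKN06", "UKN06", None],
        "crime_type": ["Total police recorded crime"] * 4,
        "data_measure": ["Police Recorded Crime"] * 4,
        "count": [100.0, 150.0, 90.0, 5000.0],
    }
)
totals_2021 = total_crimes_by_district_sketch(df, year=2021)
```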
- bolster.data_sources.psni.parse_crime_statistics_file(file_path, add_geographic_codes=True)
Parse PSNI crime statistics CSV file.
The file is in long format with columns for year, month, district, crime type, data measure, and count. This function reads the CSV, cleans column names, adds date parsing, and optionally adds LGD and NUTS3 geographic codes for cross-dataset integration.
- Parameters:
file_path (str | pathlib.Path) – Path to the crime statistics CSV file
add_geographic_codes (bool) – If True, add LGD and NUTS3 code columns
- Returns:
DataFrame with columns:
calendar_year: int (year of crime)
month: str (month name: Apr, May, …, Dec)
policing_district: str (district name or “Northern Ireland”)
crime_type: str (Home Office crime classification)
data_measure: str (type of measure - crime count, outcome number, outcome rate)
count: float (value - can be count or percentage)
date: datetime (first day of month)
lgd_code: str (ONS LGD code, if add_geographic_codes=True)
nuts3_code: str (NUTS3 region code, if add_geographic_codes=True)
nuts3_name: str (NUTS3 region name, if add_geographic_codes=True)
- Return type:
pandas.DataFrame
- Raises:
PSNIValidationError – If file structure is unexpected
Example
>>> path = download_file(CRIME_STATISTICS_URL, cache_ttl_hours=24*7)
>>> df = parse_crime_statistics_file(path)
>>> 'crime_type' in df.columns
True
>>> len(df) > 0
True
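The column-cleaning and date-parsing steps can be sketched with pandas on an in-memory CSV. The raw header names in RAW_CSV are hypothetical (the real OpenDataNI headers may differ), and the sketch omits the optional LGD/NUTS3 enrichment. The date is built from the year plus the abbreviated month name, giving the first day of each month.

```python
import io

import pandas as pd

# Hypothetical raw extract; the real OpenDataNI header names may differ.
RAW_CSV = """Calendar_Year,Month,Policing_District,Crime_Type,Data_Measure,Count
2021,Apr,Belfast City,Total police recorded crime,Police Recorded Crime,850
2021,May,Belfast City,Total police recorded crime,Police Recorded Crime,900
"""


def parse_crime_csv_sketch(source):
    df = pd.read_csv(source)
    # Normalise headers to snake_case, e.g. "Policing_District" -> "policing_district".
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Counts may be percentages for some measures, so store them as float.
    df["count"] = pd.to_numeric(df["count"], errors="coerce").astype(float)
    # First day of the month from year + abbreviated month name ("2021-Apr").
    df["date"] = pd.to_datetime(
        df["calendar_year"].astype(str) + "-" + df["month"], format="%Y-%b"
    )
    return df


df = parse_crime_csv_sketch(io.StringIO(RAW_CSV))
```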
- bolster.data_sources.psni.validate_crime_statistics(df)
Validate crime statistics data integrity.
Performs sanity checks on the crime statistics data:
- Non-negative crime counts
- Reasonable date ranges
- Expected policing districts present
- No unexpected missing data
- Parameters:
df (pandas.DataFrame) – DataFrame from parse_crime_statistics_file or get_historical_crime_statistics
- Returns:
True if validation passes
- Raises:
PSNIValidationError – If validation fails
- Return type:
bool
Example
>>> df = get_historical_crime_statistics()
>>> validate_crime_statistics(df)
True
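The listed checks follow a raise-on-first-failure pattern. The sketch below implements a subset of them (non-empty, non-negative counts, no dates before April 2001, expected districts present) against a local stand-in for PSNIValidationError. The expected-district list and validate_crime_sketch are illustrative assumptions, not the library's thresholds.

```python
import pandas as pd


class PSNIValidationError(Exception):
    """Local stand-in for bolster.data_sources.psni.PSNIValidationError."""


def validate_crime_sketch(df, expected_districts=("Belfast City", "Northern Ireland")):
    """Apply a subset of the documented checks; raise on the first failure."""
    if df.empty:
        raise PSNIValidationError("DataFrame is empty")
    if (df["count"].dropna() < 0).any():
        raise PSNIValidationError("Negative counts found")
    if df["date"].min() < pd.Timestamp("2001-04-01"):
        raise PSNIValidationError("Dates earlier than the April 2001 series start")
    missing = set(expected_districts) - set(df["policing_district"])
    if missing:
        raise PSNIValidationError(f"Missing districts: {sorted(missing)}")
    return True


good = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-04-01", "2021-04-01"]),
        "policing_district": ["Belfast City", "Northern Ireland"],
        "count": [850.0, 8000.0],
    }
)
bad = good.assign(count=[-1.0, 8000.0])  # one negative count

try:
    validate_crime_sketch(bad)
    bad_raised = False
except PSNIValidationError:
    bad_raised = True
```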
- bolster.data_sources.psni.get_annual_publication_url()
Scrape the complaint-statistics page for the latest .xlsx download link.
policeombudsman.org returns 403 to default User-Agents, so this function uses a browser-like UA via bolster.utils.web.session.
- Returns:
Absolute URL of the latest annual Excel spreadsheet.
- Raises:
PSNIDataNotFoundError – If the page cannot be retrieved or no .xlsx link is found.
- Return type:
str
Example
>>> url = get_annual_publication_url()
>>> url.startswith("https://")
True
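The link-extraction half of the scraping step can be sketched with the standard library's HTML parser, resolving relative hrefs against the page URL with urljoin. This offline sketch skips the HTTP request (the real function fetches the page with a browser-like User-Agent). It also assumes the first .xlsx link on the page is the latest. XlsxLinkParser, first_xlsx_url, and the example.org URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XlsxLinkParser(HTMLParser):
    """Collect href values ending in .xlsx from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.lower().endswith(".xlsx"):
                self.links.append(href)


def first_xlsx_url(html: str, base_url: str) -> str:
    parser = XlsxLinkParser()
    parser.feed(html)
    if not parser.links:
        raise LookupError("no .xlsx link found")
    return urljoin(base_url, parser.links[0])  # turn a relative href into an absolute URL


page = '<a href="/files/report.pdf">PDF</a> <a href="/files/complaints-2024-25.xlsx">Excel</a>'
url = first_xlsx_url(page, "https://example.org/statistics/complaints")
```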
- bolster.data_sources.psni.get_latest_complaints(breakdown='totals', force_refresh=False)
Download and return the latest Police Ombudsman complaint data.
For totals, by_district, by_allegation_type, and by_outcome, the annual publication is used (richest historical coverage). For quarterly, the latest quarterly bulletin is used.
- Parameters:
breakdown (str) – One of:
"totals": total complaints 2000/01 to present (default)
"by_district": complaints by policing district, 2011/12+
"by_allegation_type": allegations by type, 2011/12+
"by_outcome": closures by outcome, 2011/12+
"quarterly": quarterly complaints, latest 5 financial years
force_refresh (bool) – If True, bypass cache and re-download the source file.
- Returns:
Tidy DataFrame for the requested breakdown.
- Raises:
ValueError – If breakdown is not one of the recognised values.
PSNIDataNotFoundError – If the source cannot be downloaded.
- Return type:
pandas.DataFrame
Example
>>> df = get_latest_complaints()
>>> set(["year", "complaints"]).issubset(df.columns)
True
>>> df_d = get_latest_complaints("by_district")
>>> "district" in df_d.columns
True
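The breakdown argument selects which publication is read and rejects anything else with ValueError. The routing can be sketched as below; choose_source and the two constants are hypothetical names, and the sketch only returns the source name rather than downloading anything.

```python
VALID_BREAKDOWNS = ("totals", "by_district", "by_allegation_type", "by_outcome", "quarterly")
ANNUAL_BREAKDOWNS = {"totals", "by_district", "by_allegation_type", "by_outcome"}


def choose_source(breakdown: str = "totals") -> str:
    """Return which publication a breakdown is read from, or raise ValueError."""
    if breakdown not in VALID_BREAKDOWNS:
        raise ValueError(f"breakdown must be one of {VALID_BREAKDOWNS}, got {breakdown!r}")
    # Annual workbook: longest history; quarterly bulletin: latest five financial years.
    return "annual" if breakdown in ANNUAL_BREAKDOWNS else "quarterly"


try:
    choose_source("weekly")
    rejected = False
except ValueError:
    rejected = True
```

Validating the argument before any download means a typo fails immediately instead of after a network round trip.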
- bolster.data_sources.psni.get_quarterly_publication_url()
Scrape the quarterly-reports page for the latest .xlsx download link.
policeombudsman.org returns 403 to default User-Agents, so this function uses a browser-like UA via bolster.utils.web.session.
- Returns:
Absolute URL of the latest quarterly Excel spreadsheet.
- Raises:
PSNIDataNotFoundError – If the page cannot be retrieved or no .xlsx link is found.
- Return type:
str
Example
>>> url = get_quarterly_publication_url()
>>> url.startswith("https://")
True
- bolster.data_sources.psni.parse_annual(file_path)
Parse the annual Police Ombudsman statistics Excel workbook.
Extracts four key tables from the workbook:
- totals: total complaints 2000/01 onwards (T1)
- by_district: complaints by policing district, 2011/12 onwards (T8)
- by_allegation_type: allegations by type & subtype, 2011/12+ (T10)
- by_outcome: complaint closures by outcome, 2011/12 onwards (T12)
- Parameters:
file_path (str) – Local path (or file-like) to the downloaded .xlsx file.
- Returns:
Dict mapping breakdown name to tidy DataFrame. All DataFrames include year (int, financial-year start) and year_label (e.g. "2024/25") columns.
- Raises:
PSNIDataNotFoundError – If required sheets cannot be found.
- Return type:
dict[str, pandas.DataFrame]
Example
>>> tables = parse_annual(path)  # path: local copy of the annual .xlsx workbook
>>> {"totals", "by_district", "by_allegation_type", "by_outcome"} <= set(tables)
True
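Each parsed table carries both a year_label such as "2024/25" and an integer year holding the financial-year start. The sketch below shows one way to derive the integer and reject malformed labels. The regex, the consistency check on the two-digit suffix, and the financial_year_start name are assumptions, not bolster's code.

```python
import re

import pandas as pd

_FY_LABEL = re.compile(r"^\s*(\d{4})\s*/\s*(\d{2})\s*$")


def financial_year_start(label: str) -> int:
    """Map a financial-year label such as '2024/25' to its start year (2024)."""
    match = _FY_LABEL.match(str(label))
    if match is None:
        raise ValueError(f"not a financial-year label: {label!r}")
    start, suffix = int(match.group(1)), int(match.group(2))
    # The two-digit suffix should be the following year, e.g. 1999/00 or 2000/01.
    if (start + 1) % 100 != suffix:
        raise ValueError(f"inconsistent financial-year label: {label!r}")
    return start


totals = pd.DataFrame({"year_label": ["2000/01", "2023/24", "2024/25"]})
totals["year"] = totals["year_label"].map(financial_year_start)
```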
- bolster.data_sources.psni.parse_quarterly(file_path)
Parse a quarterly Police Ombudsman statistics Excel workbook.
Extracts three tables:
- complaints: complaints received by quarter × year
- allegations: allegations received by quarter × year
- by_district: complaints by policing district × year
The quarterly workbook covers the latest five financial years, with four quarters per year plus totals.
- Parameters:
file_path (str) – Local path (or file-like) to the downloaded .xlsx file.
- Returns:
Dict mapping key name to long-form DataFrame. Each DataFrame includes year_label (e.g. "2024/25") and year (int start year) columns.
- Raises:
PSNIDataNotFoundError – If required sheets cannot be found.
- Return type:
dict[str, pandas.DataFrame]
Example
>>> tables = parse_quarterly(path)  # path: local copy of a quarterly .xlsx workbook
>>> {"complaints", "allegations", "by_district"} <= set(tables)
True
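Turning a quarter × year grid into long form is a pandas melt. The wide layout below (one row per quarter plus a "Total" row, one column per financial year) is an assumed shape for illustration, and the numbers are made up. The totals row is dropped because it can be recomputed from the quarters.

```python
import pandas as pd

# Assumed wide layout: quarters down the side, financial years across the top.
wide = pd.DataFrame(
    {
        "quarter": ["Q1", "Q2", "Q3", "Q4", "Total"],
        "2023/24": [10, 20, 30, 40, 100],
        "2024/25": [15, 25, 35, 45, 120],
    }
)

tidy = wide[wide["quarter"] != "Total"].melt(  # drop the recomputable totals row
    id_vars="quarter", var_name="year_label", value_name="complaints"
)
tidy["year"] = tidy["year_label"].str.slice(0, 4).astype(int)  # "2024/25" -> 2024
```

A quick consistency check is that summing the quarters per year_label reproduces the dropped "Total" row.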
- bolster.data_sources.psni.validate_complaints(df, breakdown)
Validate a Police Ombudsman complaints DataFrame.
Checks that:
- The DataFrame is non-empty.
- Required columns for the given breakdown are present.
- The year column contains plausible financial-year start years.
- Complaint / allegation counts are non-negative.
- Parameters:
df (pandas.DataFrame) – DataFrame to validate (as returned by get_latest_complaints()).
breakdown (str) – One of "totals", "by_district", "by_allegation_type", "by_outcome", "quarterly".
- Returns:
True if validation passes.
- Raises:
PSNIValidationError – If any check fails.
- Return type:
bool
Example
>>> import pandas as pd
>>> df = pd.DataFrame({"year": [2020, 2021], "complaints": [3000, 3100]})
>>> validate_complaints(df, "totals")
True
- bolster.data_sources.psni.get_rtc_annual_summary(years=None, force_refresh=False)
Get annual summary statistics across multiple years.
Provides aggregated collision and casualty counts by year, useful for trend analysis.
- Parameters:
- Returns:
DataFrame with columns:
year: int
collisions: int (total collisions)
casualties: int (total casualties)
fatal: int (fatal casualties)
serious: int (serious injuries)
slight: int (slight injuries)
fatalities_per_100_collisions: float
- Return type:
pandas.DataFrame
Example
>>> summary = get_rtc_annual_summary()
>>> 'fatal' in summary.columns
True
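The annual summary columns can be derived from collision-level and casualty-level tables: count collisions per year, cross-tabulate casualty severity per year, then compute fatalities_per_100_collisions as 100 × fatal / collisions. The inputs below are small synthetic frames shaped like get_collisions()/get_casualties() output. Treating casualties as the sum of the three severities assumes every casualty record has a severity.

```python
import pandas as pd

# Synthetic inputs shaped like collision and casualty records (values are made up).
collisions = pd.DataFrame({"year": [2023, 2023, 2024], "ref": [1, 2, 1]})
casualties = pd.DataFrame(
    {
        "year": [2023, 2023, 2023, 2024],
        "ref": [1, 1, 2, 1],
        "severity": ["Fatal", "Slight", "Serious", "Slight"],
    }
)

n_collisions = collisions.groupby("year").size().rename("collisions")
by_severity = (
    pd.crosstab(casualties["year"], casualties["severity"])
    .reindex(columns=["Fatal", "Serious", "Slight"], fill_value=0)  # ensure all three exist
    .rename(columns=str.lower)
)
summary = pd.concat([n_collisions, by_severity], axis=1).fillna(0).astype(int)
summary["casualties"] = summary[["fatal", "serious", "slight"]].sum(axis=1)
summary["fatalities_per_100_collisions"] = 100 * summary["fatal"] / summary["collisions"]
summary = summary.reset_index()
```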
- bolster.data_sources.psni.get_rtc_available_years()
Get list of years with available RTC data.
Example
>>> years = get_rtc_available_years()
>>> len(years) > 0
True
- bolster.data_sources.psni.get_casualties(year=None, force_refresh=False, decode_values=True)
Get casualty records for a specific year.
Each row represents a single casualty involved in a road traffic collision. Casualties are linked to collisions via the ‘ref’ column.
- Parameters:
- Returns:
DataFrame with columns including:
year: int
ref: int (collision reference number for linking)
vehicle_id: int
casualty_id: int
casualty_class: str (road user type if decoded)
sex_code: int
age_group: int
severity: str (‘Fatal’, ‘Serious’, ‘Slight’ if decoded)
severity_code: int (1=fatal, 2=serious, 3=slight)
- Return type:
pandas.DataFrame
Example
>>> df = get_casualties(2024)
>>> 'severity' in df.columns
True
- bolster.data_sources.psni.get_casualties_by_district(year=None, force_refresh=False)
Get casualty counts by policing district.
- Parameters:
- Returns:
DataFrame with columns:
district: str (policing district name)
lgd_code: str (ONS LGD code)
collisions: int
casualties: int
fatal: int
serious: int
slight: int
- Return type:
pandas.DataFrame
Example
>>> by_district = get_casualties_by_district(2024)
>>> 'district' in by_district.columns
True
- bolster.data_sources.psni.get_casualties_by_road_user(year=None, force_refresh=False)
Get casualty counts by road user type.
- Parameters:
- Returns:
DataFrame with columns:
casualty_class: str (road user type)
casualties: int
fatal: int
serious: int
slight: int
fatality_rate: float (fatal / total %)
- Return type:
pandas.DataFrame
Example
>>> by_user = get_casualties_by_road_user(2024)
>>> 'casualty_class' in by_user.columns
True
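The fatality_rate column is fatal casualties as a percentage of all casualties for each road user type. Below is a crosstab-based sketch on made-up records; the "Driver" and "Pedestrian" class labels are illustrative, not necessarily the decoded values the library uses.

```python
import pandas as pd

# Made-up decoded casualty records.
casualties = pd.DataFrame(
    {
        "casualty_class": ["Driver", "Driver", "Pedestrian", "Pedestrian", "Pedestrian"],
        "severity": ["Slight", "Fatal", "Fatal", "Serious", "Slight"],
    }
)

counts = pd.crosstab(casualties["casualty_class"], casualties["severity"])
counts = counts.reindex(columns=["Fatal", "Serious", "Slight"], fill_value=0)
counts.columns = ["fatal", "serious", "slight"]
counts["casualties"] = counts[["fatal", "serious", "slight"]].sum(axis=1)
counts["fatality_rate"] = 100 * counts["fatal"] / counts["casualties"]  # fatal / total %
by_user = counts.reset_index()
```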
- bolster.data_sources.psni.get_casualties_with_collision_details(year=None, force_refresh=False)
Get casualty records merged with collision details.
Combines casualty data with collision information including date, location, and road conditions.
- Parameters:
- Returns:
DataFrame with casualty records enriched with collision details
- Return type:
pandas.DataFrame
Example
>>> df = get_casualties_with_collision_details(2024)
>>> 'severity' in df.columns
True
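Enriching casualties with collision details is a many-to-one join on the collision reference. This sketch joins on both year and ref, on the assumption that reference numbers are only unique within a year. Passing validate="many_to_one" makes pandas raise if a collision key is duplicated. The sample frames are synthetic.

```python
import pandas as pd

collisions = pd.DataFrame(
    {"year": [2024, 2024], "ref": [101, 102], "weather": ["Fine", "Raining"], "hour": [8, 17]}
)
casualties = pd.DataFrame(
    {
        "year": [2024, 2024, 2024],
        "ref": [101, 101, 102],
        "casualty_id": [1, 2, 1],
        "severity": ["Slight", "Serious", "Slight"],
    }
)

# Many casualties per collision; each (year, ref) must be unique on the collision side.
detailed = casualties.merge(collisions, on=["year", "ref"], how="left", validate="many_to_one")
```

A left join keeps every casualty even if its collision record is missing, in which case the collision columns are NaN.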
- bolster.data_sources.psni.get_collisions(year=None, force_refresh=False, decode_values=True)
Get collision records for a specific year.
Each row represents a single road traffic collision with details about date, time, location, road conditions, and severity.
- Parameters:
- Returns:
DataFrame with columns including:
year: int
ref: int (collision reference number)
district: str (policing district name if decoded)
district_code: str (original code)
month: int
day: int
weekday: str (day name if decoded)
hour: int
vehicles: int (number of vehicles)
casualties: int (number of casualties)
light_conditions: str (if decoded)
weather: str (if decoded)
road_surface: str (if decoded)
lgd_code: str (ONS LGD code)
nuts3_code: str (NUTS3 region code)
- Return type:
pandas.DataFrame
Example
>>> df = get_collisions(2024)
>>> 'severity' in df.columns or 'district' in df.columns
True
- bolster.data_sources.psni.get_vehicles(year=None, force_refresh=False, decode_values=True)
Get vehicle records for a specific year.
Each row represents a single vehicle involved in a road traffic collision. Vehicles are linked to collisions via the ‘ref’ column.
- Parameters:
- Returns:
DataFrame with columns including:
year: int
ref: int (collision reference number for linking)
vehicle_id: int
vehicle_type: str (if decoded)
vehicle_type_code: int
driver_sex_code: int
driver_age_group: int
- Return type:
pandas.DataFrame
Example
>>> df = get_vehicles(2024)
>>> 'vehicle_id' in df.columns
True
- bolster.data_sources.psni.validate_rtc_data(df, data_type)
Validate RTC data integrity.
- Parameters:
df (pandas.DataFrame) – DataFrame to validate
data_type (Literal['collision', 'casualty', 'vehicle']) – Type of data (‘collision’, ‘casualty’, or ‘vehicle’)
- Returns:
True if validation passes
- Raises:
PSNIValidationError – If validation fails
- Return type:
bool