bolster.data_sources.psni

PSNI (Police Service of Northern Ireland) Data Sources.

This module provides access to PSNI open data including:

  • Crime Statistics: Police recorded crime data with monthly updates

  • Road Traffic Collisions: Injury collision, casualty, and vehicle data

  • Police Ombudsman: Complaint statistics from 2000/01 to present

  • PACE Statistics: Annual stop & search and arrests under the Police and Criminal Evidence (PACE) Order

Data is sourced from OpenDataNI and the Police Ombudsman’s Office under the Open Government Licence v3.0. Geographic breakdowns use the 11 Policing Districts, which align with Northern Ireland’s Local Government Districts (LGDs), enabling integration with other NISRA datasets.

Example

>>> from bolster.data_sources.psni import crime_statistics, road_traffic_collisions
>>> df = crime_statistics.get_historical_crime_statistics()
>>> 'lgd_code' in df.columns
True
>>> lgd_code = crime_statistics.get_lgd_code("Belfast City")
>>> lgd_code
'N09000003'
>>> casualties = road_traffic_collisions.get_casualties()
>>> 'severity' in casualties.columns
True

See individual module docstrings for detailed documentation.

Submodules

Exceptions

PSNIDataError

Base exception for PSNI data errors.

PSNIDataNotFoundError

Raised when a PSNI data file cannot be downloaded or accessed.

PSNIDataStaleError

Raised when a PSNI data source is known to be stale with no accessible update.

PSNIValidationError

Raised when PSNI data fails validation checks.

Functions

clear_cache([pattern])

Clear cached files from the PSNI cache directory.

filter_by_crime_type(df, crime_type)

Filter crime statistics to specific crime type(s).

filter_by_date_range(df[, start_date, end_date])

Filter crime statistics to a date range.

filter_by_district(df, district)

Filter crime statistics to specific policing district(s).

get_available_crime_types(df)

Get list of all crime types in the dataset.

get_available_districts(df)

Get list of all policing districts in the dataset.

get_crime_trends(df[, crime_type, district, measure])

Get monthly crime trends for a specific crime type and district.

get_data_source_info()

Get information about crime statistics data sources.

get_historical_crime_statistics([force_refresh, ...])

Get historical police recorded crime statistics (April 2001 – December 2021).

get_latest_crime_statistics([force_refresh, ...])

Raises PSNIDataStaleError — use get_historical_crime_statistics() instead.

get_lgd_code(district_name)

Get LGD code for a policing district.

get_nuts3_code(district_name)

Get NUTS3 regional code for a policing district.

get_nuts_region_name(nuts3_code)

Get descriptive name for a NUTS3 region code.

get_outcome_rates_by_district(df[, year, crime_type])

Calculate crime outcome rates by policing district.

get_total_crimes_by_district(df[, year])

Calculate total recorded crimes by policing district.

parse_crime_statistics_file(file_path[, ...])

Parse PSNI crime statistics CSV file.

validate_crime_statistics(df)

Validate crime statistics data integrity.

get_annual_publication_url()

Scrape the complaint-statistics page for the latest .xlsx download link.

get_latest_complaints([breakdown, force_refresh])

Download and return the latest Police Ombudsman complaint data.

get_quarterly_publication_url()

Scrape the quarterly-reports page for the latest .xlsx download link.

parse_annual(file_path)

Parse the annual Police Ombudsman statistics Excel workbook.

parse_quarterly(file_path)

Parse a quarterly Police Ombudsman statistics Excel workbook.

validate_complaints(df, breakdown)

Validate a Police Ombudsman complaints DataFrame.

get_rtc_annual_summary([years, force_refresh])

Get annual summary statistics across multiple years.

get_rtc_available_years()

Get list of years with available RTC data.

get_casualties([year, force_refresh, decode_values])

Get casualty records for a specific year.

get_casualties_by_district([year, force_refresh])

Get casualty counts by policing district.

get_casualties_by_road_user([year, force_refresh])

Get casualty counts by road user type.

get_casualties_with_collision_details([year, ...])

Get casualty records merged with collision details.

get_collisions([year, force_refresh, decode_values])

Get collision records for a specific year.

get_vehicles([year, force_refresh, decode_values])

Get vehicle records for a specific year.

validate_rtc_data(df, data_type)

Validate RTC data integrity.

Package Contents

exception bolster.data_sources.psni.PSNIDataError[source]

Bases: Exception

Base exception for PSNI data errors.

All PSNI-specific exceptions inherit from this class, allowing callers to catch all PSNI errors with a single except clause.
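The single-clause pattern can be sketched as follows. The four classes are local stand-ins that mirror the documented hierarchy so the snippet runs without the package or network access, and `load` and `describe` are hypothetical helpers. Real code would import the exceptions from bolster.data_sources.psni rather than redefine them.

```python
# Local stand-ins mirroring the documented hierarchy (assumption: real code
# imports these from bolster.data_sources.psni instead of redefining them).
class PSNIDataError(Exception): ...
class PSNIDataNotFoundError(PSNIDataError): ...
class PSNIDataStaleError(PSNIDataError): ...
class PSNIValidationError(PSNIDataError): ...


def load(kind):
    """Hypothetical loader that fails in one of three PSNI-specific ways."""
    failures = {
        "missing": PSNIDataNotFoundError("download failed"),
        "stale": PSNIDataStaleError("source not updated"),
        "invalid": PSNIValidationError("unexpected columns"),
    }
    raise failures[kind]


def describe(kind):
    """Return the name of whichever PSNI exception was raised."""
    try:
        load(kind)
    except PSNIDataError as exc:  # one clause covers every subclass
        return type(exc).__name__


caught = [describe(k) for k in ("missing", "stale", "invalid")]
```

Catching the base class is convenient for top-level error handling; code that needs to react differently to stale or invalid data can still catch the specific subclasses first.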


exception bolster.data_sources.psni.PSNIDataNotFoundError[source]

Bases: PSNIDataError

Raised when a PSNI data file cannot be downloaded or accessed.

This exception is raised when:

  • Network requests fail (timeout, connection errors)

  • HTTP errors occur (404, 500, etc.)

  • The requested resource is unavailable


exception bolster.data_sources.psni.PSNIDataStaleError[source]

Bases: PSNIDataError

Raised when a PSNI data source is known to be stale with no accessible update.

This exception is raised when the underlying data source has not been updated and no machine-readable replacement is accessible (e.g. due to Cloudflare protection on the official PSNI website blocking automated downloads).


exception bolster.data_sources.psni.PSNIValidationError[source]

Bases: PSNIDataError

Raised when PSNI data fails validation checks.

This exception is raised when:

  • CSV structure doesn’t match expected columns

  • Data contains invalid or unexpected values

  • Required fields are missing or malformed


bolster.data_sources.psni.clear_cache(pattern=None)[source]

Clear cached files from the PSNI cache directory.

Parameters:

pattern (str | None) – Optional glob pattern to match specific files (e.g., *.csv). If None, clears all cached files in the directory.

Returns:

Number of files deleted

Return type:

int

Example

>>> from bolster.data_sources.psni._base import clear_cache
>>> deleted = clear_cache("*.csv")
>>> isinstance(deleted, int)
True
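
The glob-based behaviour described for clear_cache can be illustrated with a standard-library sketch. `clear_matching` is a hypothetical helper, not the library's implementation, and it assumes a flat cache directory; a temporary directory stands in for the PSNI cache.

```python
import tempfile
from pathlib import Path


def clear_matching(cache_dir, pattern=None):
    """Delete files in cache_dir matching pattern; return how many were removed."""
    deleted = 0
    # Materialise the matches first so files are not removed mid-iteration.
    for path in list(Path(cache_dir).glob(pattern or "*")):
        if path.is_file():
            path.unlink()
            deleted += 1
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    for name in ("crime.csv", "rtc.csv", "complaints.xlsx"):
        (Path(tmp) / name).write_text("placeholder")
    removed_csv = clear_matching(tmp, "*.csv")   # only the CSV files
    remaining = sorted(p.name for p in Path(tmp).iterdir())
    removed_rest = clear_matching(tmp)           # no pattern: everything left
```
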
bolster.data_sources.psni.filter_by_crime_type(df, crime_type)[source]

Filter crime statistics to specific crime type(s).

Parameters:
  • df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

  • crime_type (str | list[str]) – Crime type(s) to filter (e.g., “Burglary” or [“Violence with injury”, “Robbery”])

Returns:

Filtered DataFrame

Return type:

pandas.DataFrame

Example

>>> df = get_historical_crime_statistics()
>>> violence = filter_by_crime_type(df, "Violence with injury (including homicide & death/serious injury by unlawful driving)")
>>> len(violence) > 0
True
bolster.data_sources.psni.filter_by_date_range(df, start_date=None, end_date=None)[source]

Filter crime statistics to a date range.

Parameters:
  • df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

  • start_date (str | datetime.datetime | None) – Start date (inclusive), e.g., “2020-01-01” or datetime

  • end_date (str | datetime.datetime | None) – End date (inclusive), e.g., “2021-12-31” or datetime

Returns:

Filtered DataFrame

Return type:

pandas.DataFrame

Example

>>> df = get_historical_crime_statistics()
>>> # Get 2020 data
>>> df_2020 = filter_by_date_range(df, "2020-01-01", "2020-12-31")
>>> df_2020['calendar_year'].unique().tolist()
[2020]
>>>
>>> # Get data from 2018 onwards
>>> recent = filter_by_date_range(df, start_date="2018-01-01")
>>> len(recent) > 0
True
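
The inclusive, optionally open-ended bounds can be sketched without pandas. `in_range` is a hypothetical helper operating on plain datetime values; the library itself filters a DataFrame, so this only illustrates the boundary semantics.

```python
from datetime import datetime


def in_range(value, start_date=None, end_date=None):
    """Return True if value lies in the inclusive range; None means unbounded."""
    def as_dt(d):
        # Accept ISO strings such as "2020-01-01" as well as datetime objects.
        return datetime.fromisoformat(d) if isinstance(d, str) else d

    start, end = as_dt(start_date), as_dt(end_date)
    if start is not None and value < start:
        return False
    if end is not None and value > end:
        return False
    return True


# Month-start dates, matching the 'date' column convention (first day of month).
months = [datetime(2019, 12, 1), datetime(2020, 1, 1),
          datetime(2020, 12, 1), datetime(2021, 1, 1)]
year_2020 = [m for m in months if in_range(m, "2020-01-01", "2020-12-31")]
from_2020 = [m for m in months if in_range(m, start_date="2020-01-01")]
```
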
bolster.data_sources.psni.filter_by_district(df, district)[source]

Filter crime statistics to specific policing district(s).

Parameters:
  • df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

  • district (str | list[str]) – District name(s) to filter (e.g., “Belfast City” or [“Belfast City”, “Derry City & Strabane”])

Returns:

Filtered DataFrame

Return type:

pandas.DataFrame

Example

>>> df = get_historical_crime_statistics()
>>> belfast = filter_by_district(df, "Belfast City")
>>> belfast['policing_district'].unique().tolist()
['Belfast City']
>>>
>>> # Multiple districts
>>> cities = filter_by_district(df, ["Belfast City", "Derry City & Strabane"])
>>> len(cities['policing_district'].unique()) == 2
True
bolster.data_sources.psni.get_available_crime_types(df)[source]

Get list of all crime types in the dataset.

Parameters:

df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

Returns:

Sorted list of crime type names

Return type:

list[str]

Example

>>> df = get_historical_crime_statistics()
>>> crime_types = get_available_crime_types(df)
>>> isinstance(crime_types, list)
True
>>> 'Total police recorded crime' in crime_types
True
bolster.data_sources.psni.get_available_districts(df)[source]

Get list of all policing districts in the dataset.

Parameters:

df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

Returns:

Sorted list of district names

Return type:

list[str]

Example

>>> df = get_historical_crime_statistics()
>>> districts = get_available_districts(df)
>>> isinstance(districts, list)
True
>>> 'Northern Ireland' in districts
True

bolster.data_sources.psni.get_crime_trends(df[, crime_type, district, measure])

Get monthly crime trends for a specific crime type and district.

Parameters:
  • df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

  • crime_type (str) – Crime type to analyze (default: total crimes)

  • district (str) – Policing district (default: Northern Ireland total)

  • measure (str) – Data measure to use (default: Police Recorded Crime)

Returns:

DataFrame with columns date, calendar_year, month, count

Return type:

pandas.DataFrame

Example

>>> df = get_historical_crime_statistics()
>>> trends = get_crime_trends(df, district="Belfast City")
>>> sorted(trends.columns.tolist())
['calendar_year', 'count', 'date', 'month']
>>> len(trends) > 0
True
bolster.data_sources.psni.get_data_source_info()[source]

Get information about crime statistics data sources.

Returns a dictionary with URLs and contact information for accessing PSNI crime statistics. Use this when you need data beyond December 2021.

Returns:

Dictionary with keys:

  • opendatani_url: OpenDataNI dataset URL (data through Dec 2021)

  • data_guide_url: PDF data guide URL

  • psni_official_url: PSNI official statistics page (current data)

  • contact_email: PSNI Statistics Branch email

  • data_limitation: Description of OpenDataNI data limitations

  • last_update: Last known update date for OpenDataNI

Return type:

dict

Example

>>> info = get_data_source_info()
>>> sorted(info.keys())
['contact_email', 'data_guide_url', 'data_limitation', 'last_update', 'opendatani_url', 'psni_official_url']
bolster.data_sources.psni.get_historical_crime_statistics(force_refresh=False, add_geographic_codes=True)[source]

Get historical police recorded crime statistics (April 2001 – December 2021).

Downloads the crime statistics CSV from OpenDataNI. This dataset covers April 2001 through December 2021 and has not been updated since January 2022. For 2022+ data, consult PSNI directly.

Parameters:
  • force_refresh (bool) – If True, bypass cache and download fresh data

  • add_geographic_codes (bool) – If True, add LGD and NUTS3 code columns

Returns:

DataFrame with columns date, calendar_year, month, policing_district, crime_type, data_measure, count, lgd_code, nuts3_code, nuts3_name

Return type:

pandas.DataFrame

Raises:

Example

>>> df = get_historical_crime_statistics()
>>> sorted(df.columns.tolist())
['calendar_year', 'count', 'crime_type', 'data_measure', 'date', 'lgd_code', 'month', 'nuts3_code', 'nuts3_name', 'policing_district']
>>> df['date'].max().year
2021
bolster.data_sources.psni.get_latest_crime_statistics(force_refresh=False, add_geographic_codes=True)[source]

Raises PSNIDataStaleError — use get_historical_crime_statistics() instead.

The OpenDataNI source was last updated January 2022. PSNI’s official site publishes current data but is Cloudflare-protected and inaccessible to automated downloads. Use get_historical_crime_statistics() to access the data available (Apr 2001–Dec 2021).

Raises:

PSNIDataStaleError – Always — this data source has no accessible update.
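Code written against get_latest_crime_statistics() can fall back to the historical dataset when the stale error is raised. The sketch below uses local stubs in place of the real functions and exception so it runs offline; the stub return value is a placeholder, not real data, and `load_crime_statistics` is a hypothetical wrapper.

```python
# Stand-ins for the documented names (assumption: real code imports them
# from bolster.data_sources.psni).
class PSNIDataError(Exception): ...
class PSNIDataStaleError(PSNIDataError): ...


def get_latest_crime_statistics():
    # Mirrors the documented behaviour: this call always raises.
    raise PSNIDataStaleError("OpenDataNI source last updated January 2022")


def get_historical_crime_statistics():
    return [{"date": "2021-12-01"}]  # placeholder row, not real data


def load_crime_statistics():
    """Hypothetical wrapper: try the latest data, fall back to historical."""
    try:
        return get_latest_crime_statistics(), "latest"
    except PSNIDataStaleError:
        return get_historical_crime_statistics(), "historical"


rows, source = load_crime_statistics()
```

Returning the source label alongside the data lets downstream code flag that the result stops at December 2021.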

bolster.data_sources.psni.get_lgd_code(district_name)[source]

Get LGD code for a policing district.

Parameters:

district_name (str) – Policing district name (e.g., “Belfast City”)

Returns:

LGD code (e.g., “N09000003”) or None if not found

Return type:

str | None

Example

>>> get_lgd_code("Belfast City")
'N09000003'
bolster.data_sources.psni.get_nuts3_code(district_name)[source]

Get NUTS3 regional code for a policing district.

Uses NUTS 2021 classification where each LGD maps 1:1 to a NUTS3 region.

Parameters:

district_name (str) – Policing district name (e.g., “Belfast City”)

Returns:

NUTS3 code (e.g., “UKN06”) or None if not found

Return type:

str | None

Example

>>> get_nuts3_code("Belfast City")
'UKN06'
>>> get_nuts3_code("Derry City & Strabane")
'UKN0A'
bolster.data_sources.psni.get_nuts_region_name(nuts3_code)[source]

Get descriptive name for a NUTS3 region code.

Parameters:

nuts3_code (str) – NUTS3 code (e.g., “UKN06”)

Returns:

Region name (e.g., “Belfast”) or None if not found

Return type:

str | None

Example

>>> get_nuts_region_name("UKN06")
'Belfast'
>>> get_nuts_region_name("UKN0A")
'Derry City and Strabane'
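
The two lookups can be chained to go from a district name to a region name. This sketch uses partial tables containing only the two entries shown in the examples on this page; `region_name_for` is a hypothetical helper, and the real module presumably covers all 11 districts.

```python
# Partial lookup tables: only the values shown in the examples on this page.
NUTS3_BY_DISTRICT = {
    "Belfast City": "UKN06",
    "Derry City & Strabane": "UKN0A",
}
NAME_BY_NUTS3 = {
    "UKN06": "Belfast",
    "UKN0A": "Derry City and Strabane",
}


def region_name_for(district_name):
    """Resolve district -> NUTS3 code -> region name, or None if unknown."""
    code = NUTS3_BY_DISTRICT.get(district_name)
    return NAME_BY_NUTS3.get(code) if code is not None else None


belfast = region_name_for("Belfast City")
unknown = region_name_for("Not a district")
```
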
bolster.data_sources.psni.get_outcome_rates_by_district(df, year=None, crime_type='Total police recorded crime')[source]

Calculate crime outcome rates by policing district.

The outcome rate is the percentage of crimes with an outcome (charge, caution, community resolution, etc.).

Parameters:
  • df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

  • year (int | None) – Optional year to filter (uses all years if None)

  • crime_type (str) – Crime type to analyze (default: total crimes)

Returns:

DataFrame with columns policing_district, lgd_code, average_outcome_rate

Return type:

pandas.DataFrame

Example

>>> df = get_historical_crime_statistics()
>>> outcomes = get_outcome_rates_by_district(df, year=2021)
>>> 'average_outcome_rate' in outcomes.columns
True
bolster.data_sources.psni.get_total_crimes_by_district(df, year=None)[source]

Calculate total recorded crimes by policing district.

Parameters:
  • df (pandas.DataFrame) – DataFrame from get_historical_crime_statistics

  • year (int | None) – Optional year to filter (uses all years if None)

Returns:

DataFrame with columns policing_district, lgd_code, nuts3_code, total_crimes

Return type:

pandas.DataFrame

Example

>>> df = get_historical_crime_statistics()
>>> totals_2021 = get_total_crimes_by_district(df, year=2021)
>>> sorted(totals_2021.columns.tolist())
['lgd_code', 'nuts3_code', 'policing_district', 'total_crimes']
bolster.data_sources.psni.parse_crime_statistics_file(file_path, add_geographic_codes=True)[source]

Parse PSNI crime statistics CSV file.

The file is in long format with columns for year, month, district, crime type, data measure, and count. This function reads the CSV, cleans column names, adds date parsing, and optionally adds LGD and NUTS3 geographic codes for cross-dataset integration.

Parameters:
  • file_path (str | pathlib.Path) – Path to the crime statistics CSV file

  • add_geographic_codes (bool) – If True, add LGD and NUTS3 code columns

Returns:

DataFrame with columns:

  • calendar_year: int (year of crime)

  • month: str (month name: Apr, May, …, Dec)

  • policing_district: str (district name or “Northern Ireland”)

  • crime_type: str (Home Office crime classification)

  • data_measure: str (type of measure - crime count, outcome number, outcome rate)

  • count: float (value - can be count or percentage)

  • date: datetime (first day of month)

  • lgd_code: str (ONS LGD code, if add_geographic_codes=True)

  • nuts3_code: str (NUTS3 region code, if add_geographic_codes=True)

  • nuts3_name: str (NUTS3 region name, if add_geographic_codes=True)

Return type:

pandas.DataFrame

Raises:

PSNIValidationError – If file structure is unexpected

Example

>>> path = download_file(CRIME_STATISTICS_URL, cache_ttl_hours=24*7)
>>> df = parse_crime_statistics_file(path)
>>> 'crime_type' in df.columns
True
>>> len(df) > 0
True
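
The date column is described as the first day of the month derived from calendar_year and month. A minimal sketch of that derivation follows, assuming the month column holds three-letter English abbreviations; `month_start` is a hypothetical helper, and an explicit list is used instead of locale-dependent parsing.

```python
from datetime import datetime

# Explicit abbreviations avoid depending on the process locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_start(calendar_year, month):
    """Return the first day of the given month as a datetime."""
    return datetime(int(calendar_year), MONTHS.index(month) + 1, 1)


# One long-format row, reduced to the two columns the date depends on.
row = {"calendar_year": 2001, "month": "Apr"}
row["date"] = month_start(row["calendar_year"], row["month"])
```
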
bolster.data_sources.psni.validate_crime_statistics(df)[source]

Validate crime statistics data integrity.

Performs sanity checks on the crime statistics data:

  • Non-negative crime counts

  • Reasonable date ranges

  • Expected policing districts present

  • No unexpected missing data

Parameters:

df (pandas.DataFrame) – DataFrame from parse_crime_statistics_file or get_historical_crime_statistics

Returns:

True if validation passes

Raises:

PSNIValidationError – If validation fails

Return type:

bool

Example

>>> df = get_historical_crime_statistics()
>>> validate_crime_statistics(df)
True
bolster.data_sources.psni.get_annual_publication_url()[source]

Scrape the complaint-statistics page for the latest .xlsx download link.

policeombudsman.org returns 403 to default User-Agents; this function uses a browser-like UA via bolster.utils.web.session.

Returns:

Absolute URL of the latest annual Excel spreadsheet.

Raises:

PSNIDataNotFoundError – If the page cannot be retrieved or no .xlsx link is found.

Return type:

str

Example

>>> url = get_annual_publication_url()
>>> url.startswith("https://")
True
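
The scraping approach (a browser-like User-Agent plus a search for an .xlsx link resolved to an absolute URL) can be sketched with the standard library. This is not the library's implementation, which goes through bolster.utils.web.session; urllib and html.parser are swapped in here. The page URL, the sample HTML, and the rule of taking the first matching link as "latest" are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request


class XlsxLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at .xlsx files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xlsx"):
            self.links.append(href)


def first_xlsx_url(page_url, html):
    """Return the first .xlsx link on the page as an absolute URL."""
    parser = XlsxLinkParser()
    parser.feed(html)
    if not parser.links:
        raise LookupError("no .xlsx link found")
    return urljoin(page_url, parser.links[0])


PAGE_URL = "https://example.org/statistics/"  # hypothetical page
# A browser-like User-Agent header; the request is built but never sent here.
request = Request(PAGE_URL, headers={"User-Agent": "Mozilla/5.0"})
sample_html = ('<a href="/about">About</a>'
               '<a href="/files/annual-2024.xlsx">Download</a>')
url = first_xlsx_url(PAGE_URL, sample_html)
```
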
bolster.data_sources.psni.get_latest_complaints(breakdown='totals', force_refresh=False)[source]

Download and return the latest Police Ombudsman complaint data.

For totals, by_district, by_allegation_type, and by_outcome the annual publication is used (richest historical coverage). For quarterly the latest quarterly bulletin is used.

Parameters:
  • breakdown (str) –

    One of:

    • "totals" — total complaints 2000/01 to present (default)

    • "by_district" — complaints by policing district, 2011/12+

    • "by_allegation_type" — allegations by type, 2011/12+

    • "by_outcome" — closures by outcome, 2011/12+

    • "quarterly" — quarterly complaints, latest 5 financial years

  • force_refresh (bool) – If True, bypass cache and re-download the source file.

Returns:

Tidy DataFrame for the requested breakdown.

Raises:
Return type:

pandas.DataFrame

Example

>>> df = get_latest_complaints()
>>> set(["year", "complaints"]).issubset(df.columns)
True
>>> df_d = get_latest_complaints("by_district")
>>> "district" in df_d.columns
True
bolster.data_sources.psni.get_quarterly_publication_url()[source]

Scrape the quarterly-reports page for the latest .xlsx download link.

policeombudsman.org returns 403 to default User-Agents; this function uses a browser-like UA via bolster.utils.web.session.

Returns:

Absolute URL of the latest quarterly Excel spreadsheet.

Raises:

PSNIDataNotFoundError – If the page cannot be retrieved or no .xlsx link is found.

Return type:

str

Example

>>> url = get_quarterly_publication_url()
>>> url.startswith("https://")
True
bolster.data_sources.psni.parse_annual(file_path)[source]

Parse the annual Police Ombudsman statistics Excel workbook.

Extracts four key tables from the workbook:

  • totals: total complaints 2000/01 onwards (T1)

  • by_district: complaints by policing district, 2011/12 onwards (T8)

  • by_allegation_type: allegations by type & subtype, 2011/12+ (T10)

  • by_outcome: complaint closures by outcome, 2011/12 onwards (T12)

Parameters:

file_path (str) – Local path (or file-like) to the downloaded .xlsx file.

Returns:

Dict mapping breakdown name to tidy DataFrame. All DataFrames include year (int, financial-year start) and year_label (e.g. "2024/25") columns.

Raises:

PSNIDataNotFoundError – If required sheets cannot be found.

Return type:

dict[str, pandas.DataFrame]

Example

>>> from bolster.data_sources.psni import parse_annual
>>> # "annual.xlsx" is a hypothetical local path to a downloaded workbook
>>> result = parse_annual("annual.xlsx")  # doctest: +SKIP
>>> 'totals' in result  # doctest: +SKIP
True
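
The relationship between the year column (financial-year start) and year_label can be sketched as a pair of conversions. `start_year` and `label_for` are hypothetical helpers, assuming labels always take the "YYYY/YY" form shown on this page.

```python
def start_year(year_label):
    """Extract the financial-year start, e.g. "2024/25" -> 2024."""
    return int(year_label.split("/")[0])


def label_for(year):
    """Build the label for a financial year starting in `year`."""
    return f"{year}/{(year + 1) % 100:02d}"


first = start_year("2000/01")
latest = label_for(2024)
```
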
bolster.data_sources.psni.parse_quarterly(file_path)[source]

Parse a quarterly Police Ombudsman statistics Excel workbook.

Extracts three tables:

  • complaints: complaints received by quarter × year

  • allegations: allegations received by quarter × year

  • by_district: complaints by policing district × year

The quarterly workbook covers the latest five financial years, with four quarters per year plus totals.

Parameters:

file_path (str) – Local path (or file-like) to the downloaded .xlsx file.

Returns:

Dict mapping key name to long-form DataFrame. Each DataFrame includes year_label (e.g. "2024/25") and year (int start year).

Raises:

PSNIDataNotFoundError – If required sheets cannot be found.

Return type:

dict[str, pandas.DataFrame]

Example

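>>> from bolster.data_sources.psni import parse_quarterly
>>> # "quarterly.xlsx" is a hypothetical local path to a downloaded workbook
>>> result = parse_quarterly("quarterly.xlsx")  # doctest: +SKIP
>>> 'complaints' in result  # doctest: +SKIP
True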
bolster.data_sources.psni.validate_complaints(df, breakdown)[source]

Validate a Police Ombudsman complaints DataFrame.

Checks that:

  • The DataFrame is non-empty.

  • Required columns for the given breakdown are present.

  • The year column contains plausible financial-year start years.

  • Complaint / allegation counts are non-negative.

Parameters:
Returns:

True if validation passes.

Raises:

PSNIValidationError – If any check fails.

Return type:

bool

Example

>>> import pandas as pd
>>> df = pd.DataFrame({"year": [2020, 2021], "complaints": [3000, 3100]})
>>> validate_complaints(df, "totals")
True
bolster.data_sources.psni.get_rtc_annual_summary(years=None, force_refresh=False)

Get annual summary statistics across multiple years.

Provides aggregated collision and casualty counts by year, useful for trend analysis.

Parameters:
  • years (list[int] | None) – List of years to include (default: all available)

  • force_refresh (bool) – If True, bypass cache and re-download

Returns:

DataFrame with columns:

  • year: int

  • collisions: int (total collisions)

  • casualties: int (total casualties)

  • fatal: int (fatal casualties)

  • serious: int (serious injuries)

  • slight: int (slight injuries)

  • fatalities_per_100_collisions: float

Return type:

pandas.DataFrame

Example

>>> summary = get_rtc_annual_summary()
>>> 'fatal' in summary.columns
True
bolster.data_sources.psni.get_rtc_available_years()

Get list of years with available RTC data.

Returns:

List of years (integers) in descending order

Return type:

list[int]

Example

>>> years = get_rtc_available_years()
>>> len(years) > 0
True
bolster.data_sources.psni.get_casualties(year=None, force_refresh=False, decode_values=True)[source]

Get casualty records for a specific year.

Each row represents a single casualty involved in a road traffic collision. Casualties are linked to collisions via the ‘ref’ column.

Parameters:
  • year (int | None) – Year to fetch (default: latest available)

  • force_refresh (bool) – If True, bypass cache and re-download

  • decode_values (bool) – If True, decode coded values to human-readable strings

Returns:

DataFrame with columns including:

  • year: int

  • ref: int (collision reference number for linking)

  • vehicle_id: int

  • casualty_id: int

  • casualty_class: str (road user type if decoded)

  • sex_code: int

  • age_group: int

  • severity: str (‘Fatal’, ‘Serious’, ‘Slight’ if decoded)

  • severity_code: int (1=fatal, 2=serious, 3=slight)

Return type:

pandas.DataFrame

Example

>>> df = get_casualties(2024)
>>> 'severity' in df.columns
True
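
The effect of decode_values on severity can be illustrated with the code table given above (1=fatal, 2=serious, 3=slight). `decode_severity` is a hypothetical helper working on plain dictionaries, and the two rows are illustrative placeholders rather than real records.

```python
# Code table taken from the severity_code description on this page.
SEVERITY_LABELS = {1: "Fatal", 2: "Serious", 3: "Slight"}


def decode_severity(records):
    """Return copies of the records with a human-readable 'severity' added."""
    return [
        {**r, "severity": SEVERITY_LABELS.get(r["severity_code"])}
        for r in records
    ]


rows = [  # illustrative placeholder rows
    {"casualty_id": 1, "severity_code": 3},
    {"casualty_id": 2, "severity_code": 1},
]
decoded = decode_severity(rows)
```
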
bolster.data_sources.psni.get_casualties_by_district(year=None, force_refresh=False)[source]

Get casualty counts by policing district.

Parameters:
  • year (int | None) – Year to fetch (default: latest available)

  • force_refresh (bool) – If True, bypass cache and re-download

Returns:

DataFrame with columns:

  • district: str (policing district name)

  • lgd_code: str (ONS LGD code)

  • collisions: int

  • casualties: int

  • fatal: int

  • serious: int

  • slight: int

Return type:

pandas.DataFrame

Example

>>> by_district = get_casualties_by_district(2024)
>>> 'district' in by_district.columns
True
bolster.data_sources.psni.get_casualties_by_road_user(year=None, force_refresh=False)[source]

Get casualty counts by road user type.

Parameters:
  • year (int | None) – Year to fetch (default: latest available)

  • force_refresh (bool) – If True, bypass cache and re-download

Returns:

DataFrame with columns:

  • casualty_class: str (road user type)

  • casualties: int

  • fatal: int

  • serious: int

  • slight: int

  • fatality_rate: float (fatal / total %)

Return type:

pandas.DataFrame

Example

>>> by_user = get_casualties_by_road_user(2024)
>>> 'casualty_class' in by_user.columns
True
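
The fatality_rate column is described as fatal casualties over total casualties, expressed as a percentage. A minimal arithmetic sketch follows, with `fatality_rate` as a hypothetical helper; returning 0.0 for an empty group is an assumption, and the counts are illustrative rather than real figures.

```python
def fatality_rate(fatal, serious, slight):
    """Fatal casualties as a percentage of all casualties in the group."""
    total = fatal + serious + slight
    # Assumption: an empty group yields 0.0 rather than raising.
    return 100 * fatal / total if total else 0.0


# Illustrative counts only: 1 fatal out of 100 casualties.
rate = fatality_rate(fatal=1, serious=9, slight=90)
empty = fatality_rate(0, 0, 0)
```
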
bolster.data_sources.psni.get_casualties_with_collision_details(year=None, force_refresh=False)[source]

Get casualty records merged with collision details.

Combines casualty data with collision information including date, location, and road conditions.

Parameters:
  • year (int | None) – Year to fetch (default: latest available)

  • force_refresh (bool) – If True, bypass cache and re-download

Returns:

DataFrame with casualty records enriched with collision details

Return type:

pandas.DataFrame

Example

>>> df = get_casualties_with_collision_details(2024)
>>> 'severity' in df.columns
True
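
The merge on the shared ref column can be sketched with plain dictionaries. This is an illustration of the linking idea for a single year of data, not the library's implementation; treating it as an inner join is an assumption, and all rows are illustrative placeholders.

```python
collisions = [  # illustrative placeholder rows, one per collision
    {"ref": 101, "month": 3, "weather": "Fine"},
    {"ref": 102, "month": 7, "weather": "Rain"},
]
casualties = [  # illustrative placeholder rows, one per casualty
    {"ref": 101, "casualty_id": 1, "severity": "Slight"},
    {"ref": 101, "casualty_id": 2, "severity": "Serious"},
    {"ref": 102, "casualty_id": 1, "severity": "Slight"},
]

# Index collisions by reference number, then enrich each casualty row.
by_ref = {c["ref"]: c for c in collisions}
merged = [
    {**by_ref[c["ref"]], **c}   # casualty fields win on any name clash
    for c in casualties
    if c["ref"] in by_ref       # inner join: skip unmatched casualties
]
```

Because one collision can involve several casualties, the merged result has one row per casualty with the collision fields repeated.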
bolster.data_sources.psni.get_collisions(year=None, force_refresh=False, decode_values=True)[source]

Get collision records for a specific year.

Each row represents a single road traffic collision with details about date, time, location, road conditions, and severity.

Parameters:
  • year (int | None) – Year to fetch (default: latest available)

  • force_refresh (bool) – If True, bypass cache and re-download

  • decode_values (bool) – If True, decode coded values to human-readable strings

Returns:

DataFrame with columns including:

  • year: int

  • ref: int (collision reference number)

  • district: str (policing district name if decoded)

  • district_code: str (original code)

  • month: int

  • day: int

  • weekday: str (day name if decoded)

  • hour: int

  • vehicles: int (number of vehicles)

  • casualties: int (number of casualties)

  • light_conditions: str (if decoded)

  • weather: str (if decoded)

  • road_surface: str (if decoded)

  • lgd_code: str (ONS LGD code)

  • nuts3_code: str (NUTS3 region code)

Return type:

pandas.DataFrame

Example

>>> df = get_collisions(2024)
>>> 'severity' in df.columns or 'district' in df.columns
True
bolster.data_sources.psni.get_vehicles(year=None, force_refresh=False, decode_values=True)[source]

Get vehicle records for a specific year.

Each row represents a single vehicle involved in a road traffic collision. Vehicles are linked to collisions via the ‘ref’ column.

Parameters:
  • year (int | None) – Year to fetch (default: latest available)

  • force_refresh (bool) – If True, bypass cache and re-download

  • decode_values (bool) – If True, decode coded values to human-readable strings

Returns:

DataFrame with columns including:

  • year: int

  • ref: int (collision reference number for linking)

  • vehicle_id: int

  • vehicle_type: str (if decoded)

  • vehicle_type_code: int

  • driver_sex_code: int

  • driver_age_group: int

Return type:

pandas.DataFrame

Example

>>> df = get_vehicles(2024)
>>> 'vehicle_id' in df.columns
True
bolster.data_sources.psni.validate_rtc_data(df, data_type)

Validate RTC data integrity.

Parameters:
  • df (pandas.DataFrame) – DataFrame to validate

  • data_type (Literal['collision', 'casualty', 'vehicle']) – Type of data (‘collision’, ‘casualty’, or ‘vehicle’)

Returns:

True if validation passes

Raises:

PSNIValidationError – If validation fails

Return type:

bool