bolster.utils.rss
RSS Feed parsing utilities for bolster.
This module provides utilities for parsing and working with RSS/Atom feeds, with a focus on government statistics and research publications.
Attributes
Classes
FeedEntry | Represents a single entry from an RSS/Atom feed.
Feed | Represents a parsed RSS/Atom feed.
Functions
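parse_date(date_str) | Parse a date string into a datetime object.
parse_feed_entry(entry) | Parse a feedparser entry into a FeedEntry object.
parse_rss_feed(feed_url, timeout=30) | Parse an RSS or Atom feed from a URL.
filter_entries(entries, title_contains=None, category=None, after_date=None, before_date=None) | Filter feed entries based on various criteria.
get_nisra_statistics_feed(order='recent', timeout=30, limit=None) | Get the NISRA statistics feed from GOV.UK.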
Module Contents
- class bolster.utils.rss.FeedEntry
Represents a single entry from an RSS/Atom feed.
- published: datetime.datetime | None = None
- updated: datetime.datetime | None = None
- class bolster.utils.rss.Feed
Represents a parsed RSS/Atom feed.
- updated: datetime.datetime | None = None
- bolster.utils.rss.parse_date(date_str)
Parse a date string into a datetime object.
- Parameters:
date_str (str | None) – Date string in various formats
- Returns:
Parsed datetime object or None if parsing fails
- Return type:
datetime.datetime | None
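The "various formats" are not listed here, so the following is only a minimal sketch of how such a lenient parser could behave, not bolster's actual implementation. It assumes the two date styles common in feeds (ISO 8601 for Atom, RFC 2822 for RSS 2.0); the name `parse_date_sketch` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_date_sketch(date_str: Optional[str]) -> Optional[datetime]:
    """Hypothetical stand-in for parse_date: try ISO 8601, then RFC 2822."""
    if not date_str:
        return None
    text = date_str.strip()
    # Older Python versions reject a trailing "Z", so rewrite it as an offset.
    iso_text = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        return datetime.fromisoformat(iso_text)
    except ValueError:
        pass
    # RSS 2.0 dates look like "Mon, 01 Apr 2024 09:30:00 GMT".
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Mirror the documented contract: return None when parsing fails.
        return None


print(parse_date_sketch("2024-04-01T09:30:00Z"))
print(parse_date_sketch("Mon, 01 Apr 2024 09:30:00 GMT"))
print(parse_date_sketch("not a date"))
```

Returning None instead of raising lets callers treat undated entries uniformly, which matches the `datetime.datetime | None` return type documented above.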
- bolster.utils.rss.parse_feed_entry(entry)
Parse a feedparser entry into a FeedEntry object.
- Parameters:
entry (feedparser.FeedParserDict) – feedparser entry dictionary
- Returns:
FeedEntry object
- Return type:
FeedEntry
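To illustrate the kind of mapping involved, here is a hedged sketch that converts a feedparser-style mapping into a small dataclass. `EntrySketch` and `entry_from_mapping` are hypothetical names carrying only a subset of the FeedEntry fields; a plain dict stands in for `feedparser.FeedParserDict`, and the real function may handle more fields and edge cases.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, List, Mapping, Optional


@dataclass
class EntrySketch:
    """Hypothetical stand-in for FeedEntry with a subset of its fields."""
    title: str
    link: str
    summary: Optional[str] = None
    published: Optional[datetime] = None
    categories: List[str] = field(default_factory=list)


def entry_from_mapping(entry: Mapping[str, Any]) -> EntrySketch:
    """Convert a feedparser-style mapping into an EntrySketch."""
    published = None
    # feedparser exposes parsed dates as time.struct_time under "*_parsed" keys;
    # the first six items are year, month, day, hour, minute, second.
    parsed = entry.get("published_parsed")
    if parsed is not None:
        published = datetime(*parsed[:6])
    return EntrySketch(
        title=entry.get("title", ""),
        link=entry.get("link", ""),
        summary=entry.get("summary"),
        published=published,
        # feedparser stores categories as a list of dicts with a "term" key.
        categories=[t["term"] for t in entry.get("tags", []) if t.get("term")],
    )


# A tuple stands in for time.struct_time in this illustration.
raw = {
    "title": "Births Statistics April 2024",
    "link": "http://example.com/1",
    "published_parsed": (2024, 4, 1, 9, 30, 0, 0, 92, 0),
    "tags": [{"term": "statistics"}],
}
sketch_entry = entry_from_mapping(raw)
print(sketch_entry.title, sketch_entry.published, sketch_entry.categories)
```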
- bolster.utils.rss.parse_rss_feed(feed_url, timeout=30)
Parse an RSS or Atom feed from a URL.
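- Parameters:
feed_url – URL of the RSS or Atom feed to parse
timeout – Timeout for fetching the feed (default: 30)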
- Returns:
Feed object containing parsed feed data
- Raises:
Exception – If the feed cannot be fetched
ValueError – If the feed cannot be parsed
- Return type:
Feed
Example
>>> feed = parse_rss_feed(
...     "https://www.gov.uk/search/research-and-statistics.atom?"
...     "content_store_document_type=all_research_and_statistics&"
...     "organisations%5B%5D=northern-ireland-statistics-and-research-agency"
... )
>>> feed.title
'Research and statistics from Northern Ireland Statistics and Research Agency (NISRA)'
>>> sorted(feed.__dataclass_fields__)
['description', 'entries', 'language', 'link', 'title', 'updated']
>>> len(feed.entries) > 0
True
>>> entry = feed.entries[0]
>>> sorted(entry.__dataclass_fields__)
['author', 'categories', 'content', 'id', 'link', 'published', 'summary', 'title', 'updated']
>>> isinstance(entry.title, str) and isinstance(entry.link, str)
True
>>> entry.link.startswith("http")
True
>>> from datetime import datetime
>>> isinstance(entry.published, datetime)
True
- bolster.utils.rss.filter_entries(entries, title_contains=None, category=None, after_date=None, before_date=None)
Filter feed entries based on various criteria.
- Parameters:
entries (list[FeedEntry]) – List of FeedEntry objects to filter
title_contains (str | None) – Filter entries whose title contains this string (case-insensitive)
category (str | None) – Filter entries that have this category
after_date (datetime.datetime | str | None) – Filter entries published after this date
before_date (datetime.datetime | str | None) – Filter entries published before this date
- Returns:
Filtered list of FeedEntry objects
- Return type:
list[FeedEntry]
Example
>>> from bolster.utils.rss import FeedEntry, filter_entries
>>> from datetime import datetime
>>> entries = [
...     FeedEntry("Births Statistics April 2024", "http://example.com/1", published=datetime(2024, 4, 1)),
...     FeedEntry("Deaths Statistics April 2024", "http://example.com/2", published=datetime(2024, 4, 2)),
...     FeedEntry("Old Statistics 2023", "http://example.com/3", published=datetime(2023, 6, 1)),
... ]
>>> recent = filter_entries(entries, title_contains="births", after_date="2024-01-01")
>>> [e.title for e in recent]
['Births Statistics April 2024']
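For readers who want to see how the criteria might combine, the following self-contained sketch reimplements the documented behaviour under stated assumptions: all criteria are ANDed, string dates are ISO 8601, date bounds are exclusive, and entries without a published date never satisfy a date bound. `FilterEntrySketch` and `filter_entries_sketch` are hypothetical names, and the real function may differ on these edge cases.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Union

DateLike = Union[datetime, str, None]


@dataclass
class FilterEntrySketch:
    """Hypothetical stand-in for FeedEntry, reduced to the filtered fields."""
    title: str
    link: str
    published: Optional[datetime] = None
    categories: List[str] = field(default_factory=list)


def _as_datetime(value: DateLike) -> Optional[datetime]:
    # Assumption: string bounds are ISO 8601, e.g. "2024-01-01".
    return datetime.fromisoformat(value) if isinstance(value, str) else value


def filter_entries_sketch(entries, title_contains=None, category=None,
                          after_date=None, before_date=None):
    """Keep entries that satisfy every criterion that was supplied."""
    after, before = _as_datetime(after_date), _as_datetime(before_date)
    kept = []
    for e in entries:
        # Case-insensitive substring match on the title.
        if title_contains and title_contains.lower() not in e.title.lower():
            continue
        if category and category not in e.categories:
            continue
        if after or before:
            # Assumption: undated entries cannot satisfy a date bound.
            if e.published is None:
                continue
            # Assumption: both bounds are exclusive.
            if after and e.published <= after:
                continue
            if before and e.published >= before:
                continue
        kept.append(e)
    return kept


sample = [
    FilterEntrySketch("Births Statistics April 2024", "http://example.com/1",
                      published=datetime(2024, 4, 1)),
    FilterEntrySketch("Deaths Statistics April 2024", "http://example.com/2",
                      published=datetime(2024, 4, 2)),
    FilterEntrySketch("Old Statistics 2023", "http://example.com/3",
                      published=datetime(2023, 6, 1)),
]
recent_sketch = filter_entries_sketch(sample, title_contains="births",
                                      after_date="2024-01-01")
print([e.title for e in recent_sketch])  # ['Births Statistics April 2024']
```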
- bolster.utils.rss.get_nisra_statistics_feed(order='recent', timeout=30, limit=None)
Get the NISRA statistics feed from GOV.UK.
The GOV.UK Atom feed returns 20 entries per page. When limit exceeds 20, multiple pages are fetched automatically.
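- Parameters:
order – Ordering of the feed results (default: 'recent')
timeout – Timeout for fetching the feed (default: 30)
limit – Maximum number of entries to fetch; values above 20 cause multiple pages to be fetched (default: None)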
- Returns:
Feed object with NISRA statistics
- Return type:
Feed
Example
>>> feed = get_nisra_statistics_feed()
>>> feed100 = get_nisra_statistics_feed(limit=100)
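The paging arithmetic implied by the 20-entries-per-page note can be sketched as follows. This is an illustration only: `pages_needed` and `page_urls` are hypothetical helpers, the `page` query parameter and the example base URL are assumptions, and treating `limit=None` as "first page only" is also an assumption.

```python
import math
from typing import List, Optional
from urllib.parse import urlencode

PAGE_SIZE = 20  # entries per page, as stated in the description above


def pages_needed(limit: Optional[int]) -> int:
    """Number of pages to request for `limit` entries."""
    # Assumption: no limit means a single page is fetched.
    if limit is None or limit <= PAGE_SIZE:
        return 1
    return math.ceil(limit / PAGE_SIZE)


def page_urls(base_url: str, limit: Optional[int]) -> List[str]:
    """Build one URL per page; base_url must already contain a query string."""
    # Assumption: pages are selected with a `page` query parameter.
    return [
        f"{base_url}&{urlencode({'page': n})}"
        for n in range(1, pages_needed(limit) + 1)
    ]


print(pages_needed(100))  # 5
print(page_urls("https://example.com/feed.atom?org=nisra", 45))
```

With `limit=100` this yields five page requests; a caller would still need to truncate the combined entries to `limit` when it is not a multiple of 20.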