bovine.clients.web
This module contains methods to parse a webpage and turn it into an ActivityPub object.
- exception bovine.clients.web.RobotFileDeniesAccess[source]
Indicates that robots.txt does not allow the user agent to access the URL being queried.
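The kind of check that leads to such an exception can be sketched with the standard library alone. This is a minimal illustration, not bovine's implementation: `RobotsDenied`, `ensure_allowed`, the user agent string, and the robots.txt content are all hypothetical stand-ins, and the sketch parses robots.txt from a string instead of fetching it.

```python
from urllib.robotparser import RobotFileParser


class RobotsDenied(Exception):
    """Hypothetical stand-in for an exception like RobotFileDeniesAccess."""


def ensure_allowed(robots_txt: str, user_agent: str, url: str) -> None:
    """Raise RobotsDenied if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the lines of an already downloaded robots.txt file
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        raise RobotsDenied(f"robots.txt denies {user_agent!r} access to {url}")


# Made-up robots.txt content for the example
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Allowed path: returns without raising
ensure_allowed(ROBOTS_TXT, "example-agent", "https://example.org/public/page")

# Disallowed path: the exception carries a readable message
try:
    ensure_allowed(ROBOTS_TXT, "example-agent", "https://example.org/private/page")
except RobotsDenied as error:
    print(error)
```

Callers of a real fetch would wrap it in a similar `try`/`except` so that a denied page can be skipped instead of aborting the whole run.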
- class bovine.clients.web.WebPage(url: str, text: str | None = None, linked_ld: list = <factory>)[source]
Class that loads a webpage and transforms its content into objects that are more usable in the Fediverse.
- async fetch(session: ClientSession | None = None, fetch_linked_ld=False)[source]
Fetches the webpage and transforms its content using BeautifulSoup.
- property jsonld: dict | list
Usage for JSON-LD contained in a page:

    page = WebPage(
        "https://www.allrecipes.com/recipe/263822/pasta-alla-norma-eggplant-pasta/"
    )
    await page.fetch()
    print(page.jsonld[0][0])
For JSON-LD contained in the link header:

    page = WebPage('https://www.wikidata.org/wiki/Q76')
    await page.fetch(fetch_linked_ld=True)
    print(page.jsonld[0][0])
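To make concrete what "JSON-LD contained in a page" means, the following sketch collects the content of `<script type="application/ld+json">` tags from an HTML string. It is an assumption-laden illustration, not how bovine does it: it uses the standard library `html.parser` rather than BeautifulSoup, and `JsonLdExtractor`, `extract_json_ld`, and the sample HTML are made up for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Hypothetical helper: collects every JSON-LD document embedded in a page."""

    def __init__(self):
        super().__init__()
        self.documents = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # JSON-LD is embedded in script tags with this specific media type
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def extract_json_ld(html: str) -> list:
    """Return one parsed JSON value per JSON-LD script tag found in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.documents


# Made-up page with a single JSON-LD script whose top-level value is a list
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
[{"@context": "https://schema.org", "@type": "Recipe", "name": "Pasta alla Norma"}]
</script>
</head><body></body></html>
"""

documents = extract_json_ld(SAMPLE_HTML)
print(documents[0][0]["name"])
```

In this sketch the first index selects the script tag and the second selects an entry of the list inside it. Whether the `jsonld` property is structured the same way is not stated in this reference.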
- property open_graph_page: dict
Creates an ActivityPub Page object from the Open Graph data.
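A conversion of this kind might look like the sketch below. The mapping from Open Graph properties to ActivityStreams fields (`og:title` to `name`, `og:description` to `summary`, `og:url` to `url`, `og:image` to `icon`) is an assumption chosen for illustration and may differ from what `open_graph_page` produces. `OpenGraphExtractor`, `open_graph_to_page`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphExtractor(HTMLParser):
    """Hypothetical helper: collects og:* properties from meta tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep the first value if a property is repeated
            self.properties.setdefault(name, content)


def open_graph_to_page(html: str) -> dict:
    """Build an ActivityStreams Page dict from a page's Open Graph meta tags."""
    extractor = OpenGraphExtractor()
    extractor.feed(html)
    extractor.close()
    og = extractor.properties

    page = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Page",
    }
    # Assumed mapping, not taken from bovine
    mapping = {"og:title": "name", "og:description": "summary", "og:url": "url"}
    for og_key, as_key in mapping.items():
        if og_key in og:
            page[as_key] = og[og_key]
    if "og:image" in og:
        page["icon"] = {"type": "Image", "url": og["og:image"]}
    return page


# Made-up page for the example
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Pasta alla Norma">
<meta property="og:description" content="Eggplant pasta from Sicily">
<meta property="og:url" content="https://example.org/recipes/norma">
<meta property="og:image" content="https://example.org/images/norma.jpg">
</head><body></body></html>
"""

page_object = open_graph_to_page(SAMPLE_HTML)
print(page_object)
```

Fields are only set when the corresponding Open Graph property is present, so a page without any Open Graph data yields a bare Page object with just `@context` and `type`.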