bovine.clients.web

This module contains methods to parse a webpage and turn it into an ActivityPub object.

exception bovine.clients.web.RobotFileDeniesAccess

Indicates that robots.txt does not allow the user agent to access the URL being queried.
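
The kind of robots.txt check this exception reports can be sketched with the standard library alone. This is a minimal illustration, not bovine's implementation: `RobotsDenied`, `is_allowed`, `ensure_allowed`, the sample robots.txt, and the `site.example` URLs are all hypothetical names chosen for the example.

```python
from urllib.robotparser import RobotFileParser


class RobotsDenied(Exception):
    """Hypothetical stand-in for RobotFileDeniesAccess (not the bovine class)."""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the lines of an already downloaded robots.txt file
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def ensure_allowed(robots_txt: str, user_agent: str, url: str) -> None:
    """Raise RobotsDenied when robots.txt forbids access to url."""
    if not is_allowed(robots_txt, user_agent, url):
        raise RobotsDenied(f"robots.txt denies {user_agent!r} access to {url}")


# Sample robots.txt content, invented for this sketch
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Passes silently: the path is not disallowed
ensure_allowed(ROBOTS_TXT, "example-agent", "https://site.example/recipes/1")

try:
    ensure_allowed(ROBOTS_TXT, "example-agent", "https://site.example/private/notes")
except RobotsDenied as error:
    print(error)
```

Code that calls WebPage.fetch would presumably handle RobotFileDeniesAccess in a similar try/except block, although this page does not state which methods raise it.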

class bovine.clients.web.WebPage(url: str, text: str | None = None, linked_ld: list = <factory>)

Captures loading a webpage and transforming its content into objects that are more usable in the Fediverse.

async fetch(session: ClientSession | None = None, fetch_linked_ld=False)

Fetches the webpage and transforms its content using BeautifulSoup. Set fetch_linked_ld=True to also fetch JSON-LD referenced in the link header.

property jsonld: dict | list

Usage for JSON-LD contained in a page:

page = WebPage(
    "https://www.allrecipes.com/recipe/263822/pasta-alla-norma-eggplant-pasta/"
)
await page.fetch()
print(page.jsonld[0][0])
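
JSON-LD contained in a page usually sits in script elements of type application/ld+json. The following standard-library sketch shows that extraction step in isolation. It is an assumption about the general technique, not the WebPage implementation (which uses BeautifulSoup), and it does not reproduce the exact shape returned by jsonld. The names `JsonLdExtractor`, `extract_jsonld`, and the sample HTML are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-LD script element opens
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Parse the buffered text once the script element closes
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return one parsed JSON value per JSON-LD script element in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.documents


# Sample page, invented for this sketch
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pasta alla Norma"}
</script>
</head><body></body></html>
"""

documents = extract_jsonld(SAMPLE_HTML)
print(documents[0]["name"])  # prints "Pasta alla Norma"
```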

For JSON-LD contained in the link header:

page = WebPage('https://www.wikidata.org/wiki/Q76')
await page.fetch(fetch_linked_ld=True)
print(page.jsonld[0][0])

property open_graph_page: dict

Creates an ActivityPub Page object from the page's Open Graph data.
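
To illustrate the idea, the sketch below reads og: meta tags with the standard library and builds a dictionary of type Page. The mapping from Open Graph keys to ActivityStreams properties (title to name, description to summary, url to url, image to image) is an assumption made for this example, and open_graph_page may choose different fields. `OpenGraphExtractor`, `open_graph_to_page`, and the sample HTML are hypothetical names.

```python
from html.parser import HTMLParser


class OpenGraphExtractor(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep the first value when a property is repeated
            self.properties.setdefault(name[3:], content)


def open_graph_to_page(html: str) -> dict:
    """Build a Page-like dictionary from Open Graph data (assumed mapping)."""
    extractor = OpenGraphExtractor()
    extractor.feed(html)
    extractor.close()
    og = extractor.properties

    page: dict = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Page",
    }
    # Assumed mapping from Open Graph keys to ActivityStreams properties
    for og_key, as_key in (("title", "name"), ("description", "summary"), ("url", "url")):
        if og_key in og:
            page[as_key] = og[og_key]
    if "image" in og:
        page["image"] = {"type": "Image", "url": og["image"]}
    return page


# Sample page, invented for this sketch
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Pasta alla Norma">
<meta property="og:description" content="Eggplant pasta">
<meta property="og:url" content="https://site.example/recipes/1">
<meta property="og:image" content="https://site.example/recipes/1.jpg">
</head><body></body></html>
"""

page = open_graph_to_page(SAMPLE_HTML)
print(page["type"], page["name"])
```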