t_res.utils.process_wikipedia module

t_res.utils.process_wikipedia.make_wikilinks_consistent(url: str) → str

Make the wiki links consistent by performing the following operations:

  1. Convert the URL to lowercase.

  2. Unquote the URL to decode any percent-encoded characters.

  3. Replace underscores with spaces if they exist in the unquoted URL.

  4. Remove any fragment identifier (text after the ‘#’ symbol) if present.

  5. Quote the modified URL to encode any special characters.

Parameters:

url (str) – The URL to make consistent.

Returns:

The modified and quoted URL.

Return type:

str

Example

>>> make_wikilinks_consistent("Python_(programming_language)#Overview")
'python%20%28programming%20language%29'
>>> make_wikilinks_consistent("Data_science")
'data%20science'
>>> make_wikilinks_consistent("San_Francisco")
'san%20francisco'
t_res.utils.process_wikipedia.make_wikipedia2wikidata_consisent(entity: str) → str

Make the Wikipedia entity consistent with Wikidata by performing the following operations:

  1. Make the wiki links consistent using the ‘make_wikilinks_consistent’ function.

  2. Unquote the modified and quoted URL to decode any percent-encoded characters.

  3. Replace spaces with underscores in the unquoted URL.

Parameters:

entity (str) – The Wikipedia entity to make consistent.

Returns:

The modified Wikipedia entity consistent with the wikipedia2wikidata mapper.

Return type:

str

Example

>>> make_wikipedia2wikidata_consistent("New York City")
'new_york_city'
>>> make_wikipedia2wikidata_consistent("Data science")
'data_science'
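
The three steps above amount to normalising the link and then undoing the quoting. A minimal sketch under that reading follows; `normalise_wikilink` and `to_mapper_key` are hypothetical names, and the first is a stand-in for the link-normalisation function, not the library's code.

```python
from urllib.parse import quote, unquote


def normalise_wikilink(url: str) -> str:
    """Hypothetical stand-in for the link-normalisation step."""
    text = unquote(url.lower()).replace("_", " ").split("#", 1)[0]
    return quote(text)


def to_mapper_key(entity: str) -> str:
    """Hypothetical sketch of the three documented steps."""
    quoted = normalise_wikilink(entity)  # 1. consistent, quoted link
    unquoted = unquote(quoted)           # 2. decode percent-encoding
    return unquoted.replace(" ", "_")    # 3. spaces -> underscores


print(to_mapper_key("New York City"))
# new_york_city
print(to_mapper_key("Python_(programming_language)#Overview"))
# python_(programming_language)
```

The result is lowercase, fragment-free, and underscore-separated, regardless of whether the input used spaces, underscores, or percent-encoding.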
t_res.utils.process_wikipedia.title_to_id(page_title: str, path_to_db: str, lower: Optional[bool] = False) → Optional[str]

Given a Wikipedia page title, returns the corresponding Wikidata ID. The page title is the last part of a Wikipedia URL, unescaped and with spaces replaced by underscores, e.g. for https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem, the title would be Fermat's_Last_Theorem.
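
As an illustration of how such a page title can be derived from a URL, here is a minimal sketch using `urllib.parse`. The helper name `page_title_from_url` is hypothetical, and the code assumes the article path starts with `/wiki/`, as in the example URL.

```python
from urllib.parse import unquote, urlparse


def page_title_from_url(url: str) -> str:
    """Hypothetical helper: text after '/wiki/', percent-decoded."""
    path = urlparse(url).path  # e.g. '/wiki/Fermat%27s_Last_Theorem'
    # Assumption: article paths start with '/wiki/'; splitting on it
    # (rather than on the last '/') keeps titles that contain a slash.
    raw_title = path.split("/wiki/", 1)[-1]
    # Decoding leaves underscores in place, as the title format requires.
    return unquote(raw_title)


print(page_title_from_url("https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem"))
# Fermat's_Last_Theorem
```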

Parameters:
  • page_title (str) – The page title of the Wikipedia entry, e.g. Manatee.

  • path_to_db (str) – The path to the wikidata2wikipedia database.

Returns:

The Wikidata ID mapped to page_title if a mapping is found, otherwise None.

Return type:

str, optional

Credit:

This function is adapted from https://github.com/jcklie/wikimapper.
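
The documentation above does not state the database format. The sketch below assumes a SQLite file containing a table named `mapping` with `wikipedia_title` and `wikidata_id` columns; these names, the helper `lookup_wikidata_id`, and the placeholder ID are all assumptions for illustration, and the `lower` argument is not modelled.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def lookup_wikidata_id(page_title: str, path_to_db: str) -> Optional[str]:
    """Hypothetical lookup; the table and column names are assumptions."""
    conn = sqlite3.connect(path_to_db)
    try:
        row = conn.execute(
            "SELECT wikidata_id FROM mapping WHERE wikipedia_title = ?",
            (page_title,),
        ).fetchone()
    finally:
        conn.close()
    # No row, or a row without an ID, both mean "no mapping found".
    return row[0] if row and row[0] is not None else None


# Build a throwaway database so the sketch runs on its own.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE mapping (wikipedia_title TEXT, wikidata_id TEXT)")
# 'Q000' is a placeholder, not the real Wikidata ID for this page.
conn.execute("INSERT INTO mapping VALUES (?, ?)", ("Manatee", "Q000"))
conn.commit()
conn.close()

print(lookup_wikidata_id("Manatee", db_path))       # Q000
print(lookup_wikidata_id("No_such_page", db_path))  # None
```

Passing the title as a bound parameter (`?`) rather than formatting it into the SQL string keeps titles containing quotes, such as Fermat's_Last_Theorem, from breaking the query.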