t_res.utils.process_wikipedia module
- t_res.utils.process_wikipedia.make_wikilinks_consistent(url: str) → str
Make the wiki links consistent by performing the following operations:
Convert the URL to lowercase.
Unquote the URL to decode any percent-encoded characters.
Replace underscores with spaces if they exist in the unquoted URL.
Remove any fragment identifier (text after the ‘#’ symbol) if present.
Quote the modified URL to encode any special characters.
- Parameters:
url (str) – The URL to make consistent.
- Returns:
The modified and quoted URL.
- Return type:
str
Example
>>> make_wikilinks_consistent("Python_(programming_language)#Overview")
'python%20%28programming%20language%29'
>>> make_wikilinks_consistent("Data_science")
'data%20science'
>>> make_wikilinks_consistent("San_Francisco")
'san%20francisco'
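The five operations listed for this function line up closely with the standard library's urllib.parse helpers. The sketch below is one possible reading of those steps, not the module's actual source; the name normalise_wikilink is hypothetical and chosen only for illustration.

```python
from urllib.parse import quote, unquote


def normalise_wikilink(url: str) -> str:
    """Hypothetical re-implementation of the documented steps."""
    url = url.lower()                    # 1. convert to lowercase
    decoded = unquote(url)               # 2. decode percent-encoded characters
    decoded = decoded.replace("_", " ")  # 3. underscores become spaces
    decoded = decoded.split("#", 1)[0]   # 4. drop any fragment identifier
    return quote(decoded)                # 5. re-encode special characters


print(normalise_wikilink("Python_(programming_language)#Overview"))
# python%20%28programming%20language%29
```

Because the fragment is removed after decoding, a percent-encoded "#" (%23) would also be treated as the start of a fragment in this sketch.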
- t_res.utils.process_wikipedia.make_wikipedia2wikidata_consisent(entity: str) → str
Make the Wikipedia entity consistent with Wikidata by performing the following operations:
Make the wiki links consistent using the ‘make_wikilinks_consistent’ function.
Unquote the modified and quoted URL to decode any percent-encoded characters.
Replace spaces with underscores in the unquoted URL.
- Parameters:
entity (str) – The Wikipedia entity to make consistent.
- Returns:
The modified Wikipedia entity consistent with the wikipedia2wikidata mapper.
- Return type:
str
Example
>>> make_wikipedia2wikidata_consistent("New York City")
'new_york_city'
>>> make_wikipedia2wikidata_consistent("Data science")
'data_science'
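The three steps described for this function can be sketched with the same urllib.parse helpers. This is an illustrative approximation rather than the module's real code: to_mapper_key is a hypothetical name, and the link normalisation is inlined so that the snippet runs on its own.

```python
from urllib.parse import quote, unquote


def to_mapper_key(entity: str) -> str:
    """Hypothetical sketch of the documented three-step conversion."""
    # Step 1: normalise the link (lowercase, decode, underscores to spaces,
    # strip fragment, re-encode), inlined here as an assumption.
    normalised = quote(
        unquote(entity.lower()).replace("_", " ").split("#", 1)[0]
    )
    # Step 2: decode the percent-encoded result again.
    decoded = unquote(normalised)
    # Step 3: spaces become underscores.
    return decoded.replace(" ", "_")


print(to_mapper_key("New York City"))
# new_york_city
```

The round trip through quote and unquote means the result is a lowercased title with underscores and no fragment, in unescaped form.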
- t_res.utils.process_wikipedia.title_to_id(page_title: str, path_to_db: str, lower: Optional[bool] = False) → Optional[str]
Given a Wikipedia page title, returns the corresponding Wikidata ID. The page title is the last part of a Wikipedia URL, unescaped and with spaces replaced by underscores. For example, for https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem the title would be Fermat's_Last_Theorem.
- Parameters:
path_to_db – The path to the wikidata2wikipedia db
page_title – The page title of the Wikipedia entry, e.g. Manatee.
- Returns:
If a mapping could be found for page_title, returns the mapped Wikidata ID; otherwise None.
- Return type:
str, optional
- Credit:
This function is adapted from https://github.com/jcklie/wikimapper.