t_res.geoparser.ranking.Ranker
- class t_res.geoparser.ranking.Ranker(resources_path: str, mentions_to_wikidata: Optional[dict] = {}, wikidata_to_mentions: Optional[dict] = {})
The Ranker class implements a system for candidate selection through string variation ranking. Its subclasses provide methods to select candidates based on different matching approaches, such as perfect match, partial match, Levenshtein distance, and DeezyMatch. The base class handles loading and processing of resources related to candidate selection.
- Parameters:
resources_path (str) – Relative path to the resources directory (containing Wikidata resources).
mentions_to_wikidata (dict, optional) – An empty dictionary which will store the mapping between mentions and Wikidata IDs, which will be loaded through the load() method.
wikidata_to_mentions (dict, optional) – An empty dictionary which will store the mapping between Wikidata IDs and mentions, which will be loaded through the load() method.
This base class should not be instantiated directly. Instead, use a subclass constructor.
Example
>>> # Create a Ranker object:
>>> ranker = PerfectMatchRanker(resources_path="/path/to/resources/")
>>> # Load resources
>>> ranker.load()
>>> # Perform candidate selection
>>> queries = ['London', 'Paraguay']
>>> results = [ranker.run(query) for query in queries]
>>> # Print the results
>>> print("Candidate Selection Results:")
>>> for candidates in results:
...     print(candidates)
- load()
Load the ranker resources.
Note
This method loads the mentions-to-wikidata and wikidata-to-mentions dictionaries from the resources directory, specified when initialising the Ranker(). They are required for performing candidate selection and ranking.
The loaded mentions-to-wikidata dictionary maps a toponym (e.g. "London") to the Wikidata entities that are referred to by this toponym on Wikipedia (e.g. Q84, Q2477346). The data also includes, for each entity, its normalized "relevance", i.e. number of in-links across Wikipedia.
The loaded dictionaries are filtered to remove noise and the class attributes are updated accordingly.
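As a rough illustration of the mapping described in the note above, the sketch below assumes a nested dictionary keyed first by toponym and then by Wikidata ID, with the normalized relevance as the value. That layout, the relevance numbers, and the helper top_entity are assumptions made for illustration. They are not taken from the actual resource files or from the T-Res API.

```python
# Hypothetical shape of the mentions-to-wikidata mapping. The Wikidata IDs
# come from the note above; the relevance scores are invented.
mentions_to_wikidata = {
    "London": {"Q84": 0.98, "Q2477346": 0.02},
}


def top_entity(mention, mapping):
    """Return the Wikidata ID with the highest relevance, or None if unknown."""
    candidates = mapping.get(mention, {})
    if not candidates:
        return None
    # max() over the keys, ordered by their relevance value.
    return max(candidates, key=candidates.get)


print(top_entity("London", mentions_to_wikidata))
print(top_entity("Atlantis", mentions_to_wikidata))
```

With the invented scores above, the first lookup picks the entity with the larger relevance and the second returns None because the toponym is absent from the mapping.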
- matches(query: str) → List[StringMatch]
Identify string matching candidates for the given toponym query.
Each Ranker subclass must implement a ranking method by overriding this function.
- Parameters:
query (str) – A toponym to be matched.
- Raises:
NotImplementedError – If this method is not overridden in a subclass.
- Returns:
- A list of StringMatch instances, containing potential matches for the given toponym.
- Return type:
List[StringMatch]
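The override contract described for matches() can be sketched with stand-in classes. ToyStringMatch, ToyRanker, and ToyPerfectMatchRanker below are hypothetical names, and the fields on ToyStringMatch are assumptions. They only mimic the documented behaviour (the base method raises NotImplementedError, a subclass supplies the matching logic) and do not reproduce the real t_res classes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ToyStringMatch:
    # Stand-in for StringMatch; the real class's fields are not shown in this page.
    variation: str
    similarity: float


class ToyRanker:
    """Stand-in base class: holds known mentions, leaves matching to subclasses."""

    def __init__(self, known_mentions: Iterable[str]):
        self.known_mentions = set(known_mentions)

    def matches(self, query: str) -> List[ToyStringMatch]:
        raise NotImplementedError("Subclasses must override matches().")


class ToyPerfectMatchRanker(ToyRanker):
    """Exact string match only: a query matches if it is a known mention."""

    def matches(self, query: str) -> List[ToyStringMatch]:
        if query in self.known_mentions:
            return [ToyStringMatch(variation=query, similarity=1.0)]
        return []


ranker = ToyPerfectMatchRanker(["London", "Paraguay"])
print(ranker.matches("London"))
print(ranker.matches("Lundon"))
```

Other strategies named in the class description, such as partial matching or Levenshtein distance, would slot in as further subclasses that override the same method.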
- run(mention: Mention) → CandidateMatches
Execute the ranking process for a given toponym query.
- Parameters:
mention (Mention) – The toponym mention to be matched.
- Returns:
- An instance of the CandidateMatches dataclass, containing potential string matches for the given toponym, each with a list of potential Wikidata ID links.
- Return type:
CandidateMatches
Note: the string matches are added to the cache for efficient retrieval.
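The caching mentioned in the note can be pictured as simple memoisation keyed by the query string. The sketch below is an assumption about the general pattern, not the library's implementation. CachedMatcher, its misses counter, and the prefix-matching function passed in are all hypothetical.

```python
from typing import Callable, Dict, List


class CachedMatcher:
    """Hypothetical helper that memoises a matching function by query string."""

    def __init__(self, match_fn: Callable[[str], List[str]]):
        self.match_fn = match_fn
        self.cache: Dict[str, List[str]] = {}
        self.misses = 0  # counts how often match_fn actually had to run

    def run(self, query: str) -> List[str]:
        if query not in self.cache:
            # First time this query is seen: compute and store the matches.
            self.misses += 1
            self.cache[query] = self.match_fn(query)
        return self.cache[query]


known = ["London", "Londonderry"]
matcher = CachedMatcher(lambda q: [m for m in known if m.startswith(q)])
first = matcher.run("London")
second = matcher.run("London")  # served from the cache
print(first, matcher.misses)
```

Repeated toponyms are common in a corpus, so caching by string avoids recomputing an expensive similarity search for each occurrence.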