t_res.geoparser.linking.Linker

class t_res.geoparser.linking.Linker(method: Literal['mostpopular', 'reldisamb', 'bydistance'], resources_path: str, experiments_path: Optional[str] = '../experiments', linking_resources: Optional[dict] = {}, overwrite_training: Optional[bool] = False, rel_params: Optional[dict] = None)

The Linker class provides methods for entity linking, which is the task of associating mentions in text with their corresponding entities in a knowledge base.

Parameters:
  • method (Literal["mostpopular", "reldisamb", "bydistance"]) – The linking method to use.

  • resources_path (str) – The path to the linking resources.

  • experiments_path (str, optional) – The path to the experiments directory. Defaults to "../experiments".

  • linking_resources (dict, optional) – Dictionary containing the necessary linking resources. Defaults to dict() (an empty dictionary).

  • overwrite_training (bool, optional) – Whether to retrain the disambiguation model even if a trained model already exists. Defaults to False.

  • rel_params (dict, optional) – Dictionary containing the parameters for performing entity disambiguation using the reldisamb approach (adapted from the Radboud Entity Linker, REL). For the default settings, see Notes below.

Example:

from t_res.geoparser.linking import Linker

linker = Linker(
  method="mostpopular",
  resources_path="/path/to/resources/",
  experiments_path="/path/to/experiments/",
  linking_resources={},
  overwrite_training=True,
  rel_params={"with_publication": True, "do_test": True}
)

Note

To instantiate the Linker with the reldisamb method, wrap it in a context manager that establishes a connection to the entity embeddings database and creates a cursor:

import sqlite3

from t_res.geoparser import linking

# Illustrative values for the with_publication and without_microtoponyms flags.
wpubl = True
wmtops = True

with sqlite3.connect("../resources/rel_db/embeddings_database.db") as conn:
  cursor = conn.cursor()
  mylinker = linking.Linker(
    method="reldisamb",
    resources_path="../resources/",
    experiments_path="../experiments/",
    linking_resources=dict(),
    rel_params={
      "model_path": "../resources/models/disambiguation/",
      "data_path": "../experiments/outputs/data/lwm/",
      "training_split": "",
      "db_embeddings": cursor,  # cursor to the entity embeddings database
      "with_publication": wpubl,
      "without_microtoponyms": wmtops,
      "do_test": False,
      "default_publname": "",
      "default_publwqid": "",
    },
    overwrite_training=False,
  )

See below the default settings for rel_params. Note that db_embeddings defaults to None, but it should be assigned a cursor to the entity embeddings database, as described above:

rel_params: Optional[dict] = {
  "model_path": "../resources/models/disambiguation/",
  "data_path": "../experiments/outputs/data/lwm/",
  "training_split": "originalsplit",
  "db_embeddings": None,
  "with_publication": True,
  "without_microtoponyms": True,
  "do_test": False,
  "default_publname": "United Kingdom",
  "default_publwqid": "Q145",
}
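
The listing above gives the defaults, with db_embeddings left as None. As a minimal sketch of how a caller might start from those defaults and attach a cursor, the snippet below uses a hypothetical helper, build_rel_params, which is not part of the library. An in-memory SQLite database stands in for the real embeddings database, and whether Linker itself merges partial dictionaries with the defaults is not stated here, so the merge is done explicitly.

```python
import sqlite3

# Defaults copied from the listing above.
DEFAULT_REL_PARAMS = {
    "model_path": "../resources/models/disambiguation/",
    "data_path": "../experiments/outputs/data/lwm/",
    "training_split": "originalsplit",
    "db_embeddings": None,
    "with_publication": True,
    "without_microtoponyms": True,
    "do_test": False,
    "default_publname": "United Kingdom",
    "default_publwqid": "Q145",
}


def build_rel_params(cursor, **overrides):
    """Hypothetical helper: copy the defaults, apply overrides, attach a cursor."""
    unknown = set(overrides) - set(DEFAULT_REL_PARAMS)
    if unknown:
        raise KeyError(f"Unknown rel_params keys: {sorted(unknown)}")
    params = dict(DEFAULT_REL_PARAMS)  # shallow copy, defaults stay untouched
    params.update(overrides)
    params["db_embeddings"] = cursor
    return params


# An in-memory database is an assumption made only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
rel_params = build_rel_params(conn.cursor(), do_test=True)
print(sorted(rel_params))
conn.close()
```

The resulting dictionary could then be passed as rel_params when instantiating the Linker with the reldisamb method.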

by_distance(dict_mention: dict, origin_wqid: Optional[str] = '') → Tuple[str, float, dict]

Select candidate based on distance to the place of publication.

Parameters:
  • dict_mention (dict) – dictionary with all the relevant information needed to disambiguate a certain mention.

  • origin_wqid (str, optional) – The origin Wikidata ID for distance calculation. Defaults to "".

Returns:

A tuple containing the Wikidata ID of the closest candidate to the place of publication (e.g. "Q84") or "NIL", the confidence score of the predicted link as a float (rounded to 3 decimals), and a dictionary of all candidates and their confidence scores.

Return type:

Tuple[str, float, dict]

Note

This applies the “by distance” disambiguation method, which links entities based on geographical distance. It is an unsupervised approach: given a set of candidates and the place of publication of the original text, it predicts the candidate located closest to the place of publication.
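
The idea of picking the candidate closest to the place of publication can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function names, the coordinate inputs, the haversine formula, and the way distance is turned into a confidence score (one minus the distance divided by half the Earth's circumference) are all choices made for this sketch. The ID "Q_PLACEHOLDER" and all coordinates are illustrative.

```python
from math import asin, cos, pi, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Largest possible great-circle distance (half the circumference).
MAX_DISTANCE_KM = pi * EARTH_RADIUS_KM


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def closest_candidate(candidate_coords, origin_coords):
    """Return (best ID or "NIL", its score, all scores).

    candidate_coords maps a candidate ID to (lat, lon); origin_coords is the
    (lat, lon) of the place of publication. Both inputs are assumptions.
    """
    if not candidate_coords or origin_coords is None:
        return "NIL", 0.0, {}
    scores = {
        wqid: round(1.0 - haversine_km(coords, origin_coords) / MAX_DISTANCE_KM, 3)
        for wqid, coords in candidate_coords.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best], scores


# Approximate coordinates, for illustration only.
candidates = {"Q84": (51.5074, -0.1278), "Q_PLACEHOLDER": (42.98, -81.25)}
origin = (53.48, -2.24)  # a place of publication in northern England
best, score, scores = closest_candidate(candidates, origin)
print(best, score)
```

The return value mirrors the Tuple[str, float, dict] shape documented above, with "NIL" returned when there is nothing to compare.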

load_resources() → dict

Loads the linking resources.

Returns:

Dictionary containing loaded necessary linking resources.

Return type:

dict

Note

Different methods will require different resources.

most_popular(dict_mention: dict) → Tuple[str, float, dict]

Select most popular candidate, given Wikipedia’s in-link structure.

Parameters:

dict_mention (dict) – dictionary with all the relevant information needed to disambiguate a certain mention.

Returns:

A tuple containing the most popular candidate’s Wikidata ID (e.g. "Q84") or "NIL", the confidence score of the predicted link as a float, and a dictionary of all candidates and their confidence scores.

Return type:

Tuple[str, float, dict]

Note

This applies the “most popular” disambiguation method for linking entities. Given a set of candidates for a mention, the function predicts the most relevant Wikidata candidate, as determined from the in-link structure of Wikipedia.
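
A popularity-based choice of this kind can be sketched as below. The function name, the input format (a mapping from candidate ID to a count of incoming Wikipedia links), and the normalisation of counts into scores that sum to one are assumptions made for illustration; how the library derives its confidence score is not described here. "Q_PLACEHOLDER" is not a real Wikidata ID.

```python
def most_popular_candidate(inlink_counts):
    """Return (best ID or "NIL", its score, all scores).

    inlink_counts maps a candidate ID to its number of incoming links
    (a hypothetical input format).
    """
    total = sum(inlink_counts.values())
    if total == 0:
        # No candidates, or no popularity evidence for any of them.
        return "NIL", 0.0, {}
    scores = {wqid: round(count / total, 3) for wqid, count in inlink_counts.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores


result = most_popular_candidate({"Q84": 900, "Q_PLACEHOLDER": 100})
print(result)
```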

run(dict_mention: dict) → Tuple[str, float, dict]

Executes the linking process based on the specified unsupervised method.

Parameters:

dict_mention – Dictionary containing the mention information.

Returns:

The result of the linking process. For details, see below:

  • If the method provided when initialising the Linker() object was "mostpopular", see most_popular().

  • If the method provided when initialising the Linker() object was "bydistance", see by_distance().

Return type:

Tuple[str, float, dict]
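
The dispatch described above, where the method chosen at initialisation decides which unsupervised routine handles the mention, can be pictured with a toy class. ToyLinker is a stand-in written for this sketch, its two methods return placeholder results, and raising ValueError for any other method is an assumption, since the behaviour of run() for "reldisamb" is not described here.

```python
from typing import Tuple

Result = Tuple[str, float, dict]


class ToyLinker:
    """Stand-in for Linker that shows only the method dispatch."""

    def __init__(self, method: str):
        self.method = method

    def most_popular(self, dict_mention: dict) -> Result:
        return "Q84", 1.0, {"Q84": 1.0}  # placeholder result

    def by_distance(self, dict_mention: dict) -> Result:
        return "NIL", 0.0, {}  # placeholder result

    def run(self, dict_mention: dict) -> Result:
        # Map each unsupervised method name to the routine that handles it.
        handlers = {
            "mostpopular": self.most_popular,
            "bydistance": self.by_distance,
        }
        if self.method not in handlers:
            raise ValueError(f"No unsupervised handler for method {self.method!r}")
        return handlers[self.method](dict_mention)


toy_result = ToyLinker("mostpopular").run({"mention": "London"})
print(toy_result)
```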

train_load_model(myranker: Ranker, split: Optional[str] = 'originalsplit') → EntityDisambiguation

Trains or loads the entity disambiguation model.

Parameters:
  • myranker (geoparser.ranking.Ranker) – The ranker object used for training.

  • split (str, optional) – The split type for training. Defaults to "originalsplit".

Returns:

A trained Entity Disambiguation model.

Return type:

entity_disambiguation.EntityDisambiguation

Note

Training is skipped if the model already exists and overwrite_training was set to False when the Linker object was instantiated, or if the disambiguation method is unsupervised. Training runs in test mode if the do_test key in rel_params was set to True when the Linker object was instantiated.
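
The conditions in this note can be summarised as a small decision function. The function name, its arguments, and the returned labels are hypothetical and exist only to restate the rules above; the sketch does not train or load anything.

```python
# Methods documented above as unsupervised.
UNSUPERVISED_METHODS = {"mostpopular", "bydistance"}


def training_plan(method, model_exists, overwrite_training, do_test):
    """Return a label describing what the rules in the note imply."""
    if method in UNSUPERVISED_METHODS:
        return "skip"  # unsupervised methods have nothing to train
    if model_exists and not overwrite_training:
        return "skip"  # reuse the existing model instead of retraining
    return "train_test_mode" if do_test else "train"


print(training_plan("reldisamb", model_exists=True, overwrite_training=False, do_test=False))
print(training_plan("reldisamb", model_exists=True, overwrite_training=True, do_test=True))
```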

Note

Credit:

This method is adapted from the REL: Radboud Entity Linker GitHub repository: Copyright (c) 2020 Johannes Michael van Hulst. See the permission notice.

Reference:

@inproceedings{vanHulst:2020:REL,
author =    {van Hulst, Johannes M. and Hasibi, Faegheh and Dercksen, Koen and Balog, Krisztian and de Vries, Arjen P.},
title =     {REL: An Entity Linker Standing on the Shoulders of Giants},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
series =    {SIGIR '20},
year =      {2020},
publisher = {ACM}
}

linking.RANDOM_SEED = 42