t_res.utils.REL.entity_disambiguation module
- class t_res.utils.REL.entity_disambiguation.EntityDisambiguation(db_embs, user_config, reset_embeddings=False)
EntityDisambiguation is a class that performs entity disambiguation, which is the task of resolving entity mentions in text to their corresponding entities in a knowledge base. It trains a model if it does not exist and uses the trained model to predict the most likely entity for each mention.
This class uses a deep learning architecture, specifically the MulRelRanker model, for entity disambiguation.
Note
Credit:
This class and its methods are adapted from the REL: Radboud Entity Linker GitHub repository. Copyright (c) 2020 Johannes Michael van Hulst. See the permission notice.
Reference:
@inproceedings{vanHulst:2020:REL,
  author = {van Hulst, Johannes M. and Hasibi, Faegheh and Dercksen, Koen and Balog, Krisztian and de Vries, Arjen P.},
  title = {REL: An Entity Linker Standing on the Shoulders of Giants},
  booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series = {SIGIR '20},
  year = {2020},
  publisher = {ACM}
}
- get_data_items(dataset, dname, predict=False)
Formats the dataset and then triggers the preranking function (see prerank).
- Returns:
The output of the preranking function.
- normalize_scores(scores)
Normalizes a list of scores between 0 and 1 by rescaling them and computing their ratio over their sum.
- Returns:
A list of normalized scores where each score is the ratio of the rescaled score over their sum.
- Return type:
List[float]
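The two steps described for normalize_scores (rescaling, then dividing by the sum) can be illustrated with a minimal sketch. This is not the library's code: the min-max form of the rescaling and the handling of empty or constant inputs are assumptions made for illustration.

```python
from typing import List


def normalize_scores_sketch(scores: List[float]) -> List[float]:
    """Min-max rescale the scores, then divide each one by their sum."""
    if not scores:
        return []
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumption: identical scores share the probability mass equally.
        return [1.0 / len(scores)] * len(scores)
    # Step 1: rescale so the smallest score is 0 and the largest is 1.
    rescaled = [(s - lowest) / (highest - lowest) for s in scores]
    # Step 2: express each rescaled score as a ratio over the total.
    total = sum(rescaled)
    return [r / total for r in rescaled]


example = normalize_scores_sketch([2.0, 4.0, 6.0])
print(example)
```

Under these assumptions the lowest-scoring candidate always ends up with a normalized score of 0, and the outputs sum to 1.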
- predict(data)
Performs entity disambiguation on the given data. It does not require ground truth entities to be present.
- Returns:
Predictions and time taken for the entity disambiguation (ED) step.
- prerank(dataset, dname, predict=False)
Responsible for preranking the set of possible candidates using both context and p(e|m) scores.
- Returns:
Dataset with, by default, at most 3 + 4 candidates per mention.
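The idea of keeping a small candidate set per mention from two rankings can be sketched as follows. The function name, the tuple layout, and the parameters keep_pem and keep_ctx are hypothetical, and the split of the default "3 + 4" into 3 context-based and 4 p(e|m)-based candidates is an assumption, not something this page states.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (entity id, p(e|m) score, context score).
Candidate = Tuple[str, float, float]


def prerank_sketch(
    candidates: List[Candidate], keep_pem: int = 4, keep_ctx: int = 3
) -> List[str]:
    """Keep the top candidates by p(e|m) plus the top candidates by context."""
    by_pem = sorted(candidates, key=lambda c: c[1], reverse=True)[:keep_pem]
    by_ctx = sorted(candidates, key=lambda c: c[2], reverse=True)[:keep_ctx]
    kept: List[str] = []
    for entity, _, _ in by_pem + by_ctx:
        # A candidate selected by both rankings is only kept once.
        if entity not in kept:
            kept.append(entity)
    return kept


mention_candidates = [
    ("A", 0.50, 0.10), ("B", 0.20, 0.90), ("C", 0.10, 0.20), ("D", 0.08, 0.80),
    ("E", 0.05, 0.05), ("F", 0.04, 0.70), ("G", 0.02, 0.30), ("H", 0.01, 0.60),
]
kept_entities = prerank_sketch(mention_candidates)
print(kept_entities)
```

Because the two rankings can overlap, the number of candidates kept per mention is an upper bound (here 3 + 4 = 7) rather than a fixed size.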
- train(org_train_dataset, org_dev_dataset)
Trains the entity disambiguation model.
- Returns:
None.
- train_LR(train_json, dev_json, model_path_lr)
Applies logistic regression (LR) to obtain confidence scores for the disambiguated entities. Recall should be high: if it is low, then correct entities would have been ignored.
- Returns:
None
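To make the remark about recall concrete, the sketch below measures how many correct entities survive a confidence cutoff. It does not reproduce train_LR: the function name, the inputs, and the thresholds are hypothetical and serve only to show why a low recall means correct entities are being discarded.

```python
from typing import List


def recall_at_threshold(
    confidences: List[float], is_correct: List[bool], threshold: float
) -> float:
    """Fraction of correct entities whose confidence reaches the threshold."""
    total_correct = sum(is_correct)
    if total_correct == 0:
        return 0.0
    kept_correct = sum(
        1 for conf, ok in zip(confidences, is_correct) if ok and conf >= threshold
    )
    return kept_correct / total_correct


# Illustrative values only, not output from any model.
confidences = [0.9, 0.6, 0.4, 0.2]
is_correct = [True, True, False, True]

strict = recall_at_threshold(confidences, is_correct, threshold=0.5)
lenient = recall_at_threshold(confidences, is_correct, threshold=0.1)
print(strict, lenient)
```

With the stricter cutoff, the correct entity scored 0.2 is dropped and recall falls to two out of three; the lenient cutoff keeps all three correct entities.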
- entity_disambiguation.RANDOM_SEED = 42