t_res.utils.REL.entity_disambiguation module

class t_res.utils.REL.entity_disambiguation.EntityDisambiguation(db_embs, user_config, reset_embeddings=False)

EntityDisambiguation is a class that performs entity disambiguation, the task of resolving entity mentions in text to their corresponding entries in a knowledge base. If no trained model exists yet, the class trains one; it then uses the trained model to predict the most likely entity for each mention.

This class uses a deep learning architecture, specifically the MulRelRanker model, for entity disambiguation.

Note

Credit:

This class and its methods are adapted from the REL: Radboud Entity Linker Github repository: Copyright (c) 2020 Johannes Michael van Hulst. See the permission notice.

Reference:

@inproceedings{vanHulst:2020:REL,
author =    {van Hulst, Johannes M. and Hasibi, Faegheh and Dercksen, Koen and Balog, Krisztian and de Vries, Arjen P.},
title =     {REL: An Entity Linker Standing on the Shoulders of Giants},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
series =    {SIGIR '20},
year =      {2020},
publisher = {ACM}
}
get_data_items(dataset, dname, predict=False)

Formats the dataset and then calls the preranking function (prerank).

Returns:

The output of the preranking function (see prerank).

normalize_scores(scores)

Normalizes a list of scores between 0 and 1 by rescaling them and computing their ratio over their sum.

Returns:

A list of normalized scores where each score is the ratio of the rescaled score over their sum.

Return type:

List[float]
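As an illustration of the description above, here is a minimal sketch of one way to rescale scores and then divide each by their sum. It is not the library's implementation: the function name normalize_scores_sketch, the choice of min-max rescaling, and the handling of empty or constant inputs are all assumptions made for this example.

```python
from typing import List


def normalize_scores_sketch(scores: List[float]) -> List[float]:
    """Hypothetical stand-in for normalize_scores (not the t_res code)."""
    if not scores:
        return []
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumption: with identical scores, fall back to a uniform split.
        return [1.0 / len(scores)] * len(scores)
    # Step 1: min-max rescale every score into [0, 1].
    rescaled = [(s - lowest) / (highest - lowest) for s in scores]
    # Step 2: divide each rescaled score by the sum of rescaled scores.
    total = sum(rescaled)
    return [r / total for r in rescaled]


example = normalize_scores_sketch([1.0, 2.0, 3.0])
```

Under these assumptions the lowest score always maps to 0 and the outputs sum to 1, so they can be read as relative weights over the candidates.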

predict(data)

Performs entity disambiguation on the given data. It does not require ground truth entities to be present.

Returns: Predictions and time taken for the ED step.

prerank(dataset, dname, predict=False)

Preranks the set of possible candidates for each mention using both context and p(e|m) scores.

Returns: Dataset with, by default, at most 3 + 4 candidates per mention.
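The "3 + 4" candidate limit can be pictured with the sketch below, which keeps the best candidates by context score and the best by p(e|m) and merges the two lists. This is an illustration only: the function name prerank_sketch, the parameter names keep_ctx and keep_prior, the dictionary inputs, and the assumption that 3 refers to context and 4 to p(e|m) are hypothetical and not taken from the library.

```python
from typing import Dict, List


def prerank_sketch(
    ctx_scores: Dict[str, float],
    prior_scores: Dict[str, float],
    keep_ctx: int = 3,    # assumed: candidates kept by context score
    keep_prior: int = 4,  # assumed: candidates kept by p(e|m)
) -> List[str]:
    """Hypothetical illustration of preranking (not the t_res code)."""
    # Best candidates according to the mention-entity prior p(e|m).
    top_prior = sorted(
        prior_scores, key=lambda c: prior_scores[c], reverse=True
    )[:keep_prior]
    # Best candidates according to the context similarity score.
    top_ctx = sorted(
        ctx_scores, key=lambda c: ctx_scores[c], reverse=True
    )[:keep_ctx]
    # Merge both lists, dropping duplicates while preserving order.
    kept = list(top_prior)
    for candidate in top_ctx:
        if candidate not in kept:
            kept.append(candidate)
    return kept


prior = {"A": 0.50, "B": 0.20, "C": 0.12, "D": 0.08, "E": 0.06, "F": 0.04}
context = {"A": 0.1, "B": 0.9, "C": 0.2, "D": 0.3, "E": 0.8, "F": 0.7}
kept_candidates = prerank_sketch(context, prior)
```

Because a candidate can rank highly on both criteria, the merged list holds at most keep_ctx + keep_prior entries and may hold fewer.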

train(org_train_dataset, org_dev_dataset)

Trains the entity disambiguation model.

Returns:

None.

train_LR(train_json, dev_json, model_path_lr)

Applies logistic regression (LR) to obtain confidence scores for the disambiguated entities. Recall should be high, because if it is low then we would have ignored a correct entity.

Returns:

None
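To show how logistic regression can turn a raw disambiguation score into a confidence value, here is a small self-contained sketch. It assumes a single numeric feature per prediction (the model score) and a binary label saying whether the prediction was correct; the helper names, the toy data, and the plain gradient-descent fit are all hypothetical and do not reflect how train_LR is implemented.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    xs: List[float], ys: List[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[float, float]:
    """Fit weight and bias by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data (made up): model scores and whether each prediction was correct.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
correct = [0, 0, 0, 1, 1, 1]
weight, bias = fit_logistic(scores, correct)


def confidence(score: float) -> float:
    """Confidence that a prediction with this score is correct."""
    return sigmoid(weight * score + bias)
```

A threshold on this confidence then trades precision against recall: lowering it keeps more predictions and so raises recall, which matches the preference for high recall stated above.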

entity_disambiguation.RANDOM_SEED = 42