t_res.utils.rel_e2e module

t_res.utils.rel_e2e.rel_end_to_end(sent: str) → dict

Perform REL end-to-end entity linking using the API.

Parameters:

sent (str) – A sentence in plain text.

Returns:

The output from the REL end-to-end API for the input sentence.

Return type:

dict
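The public REL demo endpoint accepts a JSON payload containing the raw text and a (possibly empty) list of spans; an empty span list asks REL to perform its own mention detection. The sketch below illustrates that payload shape. The helper name `build_rel_payload` and the exact endpoint URL used by `rel_end_to_end` are assumptions for illustration.

```python
import json

# Hypothetical helper illustrating the payload shape the REL
# end-to-end API expects: the sentence text plus an empty "spans"
# list (empty spans let REL detect mentions itself).
def build_rel_payload(sent: str) -> str:
    return json.dumps({"text": sent, "spans": []})

# Sending the request (network call, shown for illustration only):
# import requests
# response = requests.post("https://rel.cs.ru.nl/api",
#                          data=build_rel_payload("London is a city."))
# predictions = response.json()
```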

t_res.utils.rel_e2e.get_rel_from_api(dSentences: dict, rel_end2end_path: str) → None

Use the REL API to perform end-to-end entity linking.

Parameters:
  • dSentences (dict) – A dictionary of sentences, where each key is an article-sentence identifier and each value is the full text of the sentence.

  • rel_end2end_path (str) – The path of the file where the REL results will be stored.

Returns:

None.
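Conceptually, this amounts to calling the API once per sentence and persisting the accumulated predictions to `rel_end2end_path`. The sketch below is illustrative only: the linking call is injected as a parameter (`link_fn`) standing in for the REL API request, and JSON is used for persistence, whereas the real function may batch, cache, or serialise differently (e.g. with pickle).

```python
import json
from pathlib import Path

# Illustrative sketch: link every sentence with a caller-supplied
# function (a stand-in for the REL API call) and persist the results.
def collect_rel_predictions(dSentences: dict, rel_end2end_path: str, link_fn) -> None:
    results = {}
    for sent_id, sentence in dSentences.items():
        results[sent_id] = link_fn(sentence)  # one API call per sentence
    Path(rel_end2end_path).write_text(json.dumps(results))
```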

t_res.utils.rel_e2e.match_wikipedia_to_wikidata(wiki_title: str, path_to_db: str) → str

Retrieve the Wikidata ID corresponding to a Wikipedia title.

Parameters:
  • wiki_title (str) – A Wikipedia title in underscore-separated format.

  • path_to_db (str) – The path to your Wikipedia database (e.g. "../resources/wikipedia/index_enwiki-latest.db").

Returns:

The corresponding Wikidata QID for the entity, or "NIL" if not found.

Return type:

str
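The lookup boils down to a single indexed query against the SQLite database, falling back to "NIL" when the title is absent. The table and column names below (`mapping(wiki_title, wikidata_id)`) are assumptions for illustration; the actual schema of `index_enwiki-latest.db` may differ.

```python
import sqlite3

# Sketch of a title-to-QID lookup against an index database.
# Schema (table "mapping" with columns wiki_title, wikidata_id)
# is assumed for illustration.
def lookup_wikidata_id(wiki_title: str, path_to_db: str) -> str:
    with sqlite3.connect(path_to_db) as conn:
        row = conn.execute(
            "SELECT wikidata_id FROM mapping WHERE wiki_title = ?",
            (wiki_title,),
        ).fetchone()
    return row[0] if row else "NIL"  # "NIL" when the title is not found
```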

t_res.utils.rel_e2e.match_ent(pred_ents, start, end, prev_ann, gazetteer_ids)

Find the corresponding string and prediction information returned by REL for a specific gold standard token position in a sentence.

Parameters:
  • pred_ents (list) – A list of lists, where each inner list corresponds to a token.

  • start (int) – The start character offset of the token in the gold standard.

  • end (int) – The end character offset of the token in the gold standard.

  • prev_ann (str) – The entity type of the previous token.

  • gazetteer_ids (set) – A set of entity IDs in the knowledge base.

Returns:

A tuple with three elements:
  1. The entity type.

  2. The entity link.

  3. The entity type of the previous token.

Return type:

tuple
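The core of this matching is a character-offset overlap test between the gold token span and each predicted span. The sketch below shows only that overlap step, assuming each prediction is shaped like `[start, length, mention, entity, ...]` (the usual REL output layout); the real `match_ent` additionally derives B-/I- prefixes from `prev_ann` and filters links against `gazetteer_ids`.

```python
# Illustrative overlap check between a gold token span [start, end)
# and REL predictions shaped as [start, length, mention, entity, ...]
# (an assumed layout).
def find_overlapping_pred(pred_ents, start, end):
    for pred in pred_ents:
        p_start, p_end = pred[0], pred[0] + pred[1]
        if p_start < end and start < p_end:  # character spans overlap
            return pred
    return None  # no prediction covers this token
```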

t_res.utils.rel_e2e.postprocess_rel(rel_preds, dSentences, gold_tokenization, wikigaz_ids)

Retokenize the REL output for each sentence to match the gold standard tokenization.

Parameters:
  • rel_preds (dict) – A dictionary containing the predictions using REL.

  • dSentences (dict) – A dictionary that maps a sentence ID to the text.

  • gold_tokenization (dict) – A dictionary that contains the tokenized sentence with gold standard annotations of entity type and link per sentence.

  • wikigaz_ids (set) – A set of Wikidata IDs of entities in the gazetteer.

Returns:

A dictionary that maps a sentence ID to the REL predictions, retokenized as in the gold standard.

Return type:

dict
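Retokenization means projecting span-level REL predictions onto the gold token sequence: each gold token receives the link of any prediction whose character span overlaps it, with a B-/I- prefix depending on whether the token opens the predicted span. The sketch below illustrates this for one sentence; the input shapes (token triples and `[start, length, mention, entity, ...]` predictions) are assumptions, and the real function also consults the gazetteer IDs.

```python
# Sketch: project span-level predictions onto gold tokens, assigning
# B-/I-prefixed links by character overlap. Input shapes are assumed.
def retokenize_predictions(rel_preds, gold_tokens):
    labelled = []
    for token, start, end in gold_tokens:
        label = "O"
        for p_start, length, mention, entity, *rest in rel_preds:
            p_end = p_start + length
            if p_start < end and start < p_end:  # token inside the span
                prefix = "B-" if start <= p_start else "I-"
                label = prefix + entity
                break
        labelled.append((token, label))
    return labelled
```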

t_res.utils.rel_e2e.store_rel(experiment: Experiment, dREL: dict, approach: str, how_split: str) → None

Store the REL results for a specific experiment, approach, and split, in the format required by the HIPE scorer.

Parameters:
  • experiment (Experiment) – The experiment object containing the results path and dataset.

  • dREL (dict) – A dictionary mapping sentence IDs to REL predictions.

  • approach (str) – The approach used for REL.

  • how_split (str) – The type of split for which to store the results (e.g., originalsplit, Ashton1860).

Returns:

None.

Note

This function saves a TSV file with the results in the CoNLL format required by the scorer.
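A minimal sketch of writing per-token results as a HIPE-style TSV is shown below. The full HIPE CoNLL format has more columns (coarse/fine entity types, metonymic readings, nesting, MISC); only a token column and a link column are filled here for illustration, and the header and comment conventions are simplified assumptions.

```python
# Minimal sketch of a HIPE-style TSV writer: one token per line,
# tab-separated columns, a comment line per sentence. The real file
# written by store_rel contains the full HIPE column set.
def write_hipe_tsv(dREL, out_path):
    with open(out_path, "w", encoding="utf-8") as fw:
        fw.write("TOKEN\tNEL-LIT\n")
        for sent_id, tokens in dREL.items():
            fw.write(f"# sentence_id = {sent_id}\n")
            for token, link in tokens:
                fw.write(f"{token}\t{link}\n")
```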

t_res.utils.rel_e2e.run_rel_experiments(self) → None

Run the end-to-end REL experiments.

Returns:

None.