t_res.utils.rel_e2e module

t_res.utils.rel_e2e.rel_end_to_end(sent: str) → dict

Perform REL end-to-end entity linking using the API.

Parameters:

sent (str) – A sentence in plain text.

Returns:

The output from the REL end-to-end API for the input sentence.

Return type:

dict
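The public REL demo endpoint accepts a JSON payload containing the raw text and a (possibly empty) list of spans; an empty span list asks REL to perform its own mention detection. The sketch below illustrates that payload shape. The helper name `build_rel_payload` and the exact endpoint URL used by `rel_end_to_end` are assumptions for illustration.

```python
import json

# Hypothetical helper illustrating the payload shape the REL
# end-to-end API expects: the sentence text plus an empty "spans"
# list (empty spans let REL detect mentions itself).
def build_rel_payload(sent: str) -> str:
    return json.dumps({"text": sent, "spans": []})

# Sending the request (network call, shown for illustration only):
# import requests
# response = requests.post("https://rel.cs.ru.nl/api",
#                          data=build_rel_payload("London is a city."))
# predictions = response.json()
```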

t_res.utils.rel_e2e.get_rel_from_api(dSentences: dict, rel_end2end_path: str) → None

Use the REL API to perform end-to-end entity linking.

Parameters:
  • dSentences (dict) – A dictionary of sentences, where each key is an article-sentence identifier and each value is the full text of the sentence.

  • rel_end2end_path (str) – The path of the file where the REL results will be stored.

Returns:

None.
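Conceptually, this amounts to calling the API once per sentence and persisting the accumulated predictions to `rel_end2end_path`. The sketch below is illustrative only: the linking call is injected as a parameter (`link_fn`) standing in for the REL API request, and JSON is used for persistence, whereas the real function may batch, cache, or serialise differently (e.g. with pickle).

```python
import json
from pathlib import Path

# Illustrative sketch: link every sentence with a caller-supplied
# function (a stand-in for the REL API call) and persist the results.
def collect_rel_predictions(dSentences: dict, rel_end2end_path: str, link_fn) -> None:
    results = {}
    for sent_id, sentence in dSentences.items():
        results[sent_id] = link_fn(sentence)  # one API call per sentence
    Path(rel_end2end_path).write_text(json.dumps(results))
```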

t_res.utils.rel_e2e.match_wikipedia_to_wikidata(wiki_title: str, path_to_db: str) → str

Retrieve the Wikidata ID corresponding to a Wikipedia title.

Parameters:
  • wiki_title (str) – A Wikipedia title in underscore-separated format.

  • path_to_db (str) – The path to your Wikipedia database (e.g. "../resources/wikipedia/index_enwiki-latest.db").

Returns:

The corresponding Wikidata QID for the entity, or "NIL" if not found.

Return type:

str
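The lookup boils down to a single indexed query against the SQLite database, falling back to "NIL" when the title is absent. The table and column names below (`mapping(wiki_title, wikidata_id)`) are assumptions for illustration; the actual schema of `index_enwiki-latest.db` may differ.

```python
import sqlite3

# Sketch of a title-to-QID lookup against an index database.
# Schema (table "mapping" with columns wiki_title, wikidata_id)
# is assumed for illustration.
def lookup_wikidata_id(wiki_title: str, path_to_db: str) -> str:
    with sqlite3.connect(path_to_db) as conn:
        row = conn.execute(
            "SELECT wikidata_id FROM mapping WHERE wiki_title = ?",
            (wiki_title,),
        ).fetchone()
    return row[0] if row else "NIL"  # "NIL" when the title is not found
```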

t_res.utils.rel_e2e.match_ent(pred_ents, start, end, prev_ann, gazetteer_ids)

Find the corresponding string and prediction information returned by REL for a specific gold standard token position in a sentence.

Parameters:
  • pred_ents (list) – A list of lists, where each inner list corresponds to a token.

  • start (int) – The start character offset of the token in the gold standard.

  • end (int) – The end character offset of the token in the gold standard.

  • prev_ann (str) – The entity type of the previous token.

  • gazetteer_ids (set) – A set of entity IDs in the knowledge base.

Returns:

A tuple with three elements:
  1. The entity type.

  2. The entity link.

  3. The entity type of the previous token.

Return type:

tuple
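The core of this matching is a character-offset overlap test between the gold token span and each predicted span. The sketch below shows only that overlap step, assuming each prediction is shaped like `[start, length, mention, entity, ...]` (the usual REL output layout); the real `match_ent` additionally derives B-/I- prefixes from `prev_ann` and filters links against `gazetteer_ids`.

```python
# Illustrative overlap check between a gold token span [start, end)
# and REL predictions shaped as [start, length, mention, entity, ...]
# (an assumed layout).
def find_overlapping_pred(pred_ents, start, end):
    for pred in pred_ents:
        p_start, p_end = pred[0], pred[0] + pred[1]
        if p_start < end and start < p_end:  # character spans overlap
            return pred
    return None  # no prediction covers this token
```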

t_res.utils.rel_e2e.postprocess_rel(rel_preds, dSentences, gold_tokenization, wikigaz_ids)

Retokenize the REL output for each sentence to match the gold standard tokenization.

Parameters:
  • rel_preds (dict) – A dictionary containing the predictions using REL.

  • dSentences (dict) – A dictionary that maps a sentence ID to the text.

  • gold_tokenization (dict) – A dictionary that contains the tokenized sentence with gold standard annotations of entity type and link per sentence.

  • wikigaz_ids (set) – A set of Wikidata IDs of entities in the gazetteer.

Returns:

A dictionary that maps a sentence ID to the REL predictions, retokenized as in the gold standard.

Return type:

dict
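Retokenization means projecting span-level REL predictions onto the gold token sequence: each gold token receives the link of any prediction whose character span overlaps it, with a B-/I- prefix depending on whether the token opens the predicted span. The sketch below illustrates this for one sentence; the input shapes (token triples and `[start, length, mention, entity, ...]` predictions) are assumptions, and the real function also consults the gazetteer IDs.

```python
# Sketch: project span-level predictions onto gold tokens, assigning
# B-/I-prefixed links by character overlap. Input shapes are assumed.
def retokenize_predictions(rel_preds, gold_tokens):
    labelled = []
    for token, start, end in gold_tokens:
        label = "O"
        for p_start, length, mention, entity, *rest in rel_preds:
            p_end = p_start + length
            if p_start < end and start < p_end:  # token inside the span
                prefix = "B-" if start <= p_start else "I-"
                label = prefix + entity
                break
        labelled.append((token, label))
    return labelled
```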

t_res.utils.rel_e2e.store_rel(experiment: Experiment, dREL: dict, approach: str, how_split: str) → None

Store the REL results for a specific experiment, approach, and split, in the format required by the HIPE scorer.

Parameters:
  • experiment (Experiment) – The experiment object containing the results path and dataset.

  • dREL (dict) – A dictionary mapping sentence IDs to REL predictions.

  • approach (str) – The approach used for REL.

  • how_split (str) – The type of split for which to store the results (e.g., originalsplit, Ashton1860).

Returns:

None.

Note

This function saves a TSV file with the results in the CoNLL format required by the scorer.
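A minimal sketch of writing per-token results as a HIPE-style TSV is shown below. The full HIPE CoNLL format has more columns (coarse/fine entity types, metonymic readings, nesting, MISC); only a token column and a link column are filled here for illustration, and the header and comment conventions are simplified assumptions.

```python
# Minimal sketch of a HIPE-style TSV writer: one token per line,
# tab-separated columns, a comment line per sentence. The real file
# written by store_rel contains the full HIPE column set.
def write_hipe_tsv(dREL, out_path):
    with open(out_path, "w", encoding="utf-8") as fw:
        fw.write("TOKEN\tNEL-LIT\n")
        for sent_id, tokens in dREL.items():
            fw.write(f"# sentence_id = {sent_id}\n")
            for token, link in tokens:
                fw.write(f"{token}\t{link}\n")
```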

t_res.utils.rel_e2e.run_rel_experiments(self) → None

Run the end-to-end REL experiments.

Returns:

None.