`t_res.utils.REL.vocabulary` module

class t_res.utils.REL.vocabulary.Vocabulary

A vocabulary class that maps tokens to integer IDs, used for storing references to embeddings.

Note

Credit:

The code for this class and its methods is taken from the REL: Radboud Entity Linker GitHub repository, Copyright (c) 2020 Johannes Michael van Hulst. See the permission notice and the original script for more information.

Reference:

@inproceedings{vanHulst:2020:REL,
author =    {van Hulst, Johannes M. and Hasibi, Faegheh and Dercksen, Koen and Balog, Krisztian and de Vries, Arjen P.},
title =     {REL: An Entity Linker Standing on the Shoulders of Giants},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
series =    {SIGIR '20},
year =      {2020},
publisher = {ACM}
}

add_to_vocab(token: str) → None

Add the given token to the vocabulary.

Parameters: token (str) – The token to be added to the vocabulary.
Returns: None.

get_id(token: str) → int

Get the ID associated with the given token from the vocabulary.

Parameters: token (str) – The token for which to retrieve the ID.
Returns: The ID of the token in the vocabulary, or the ID of the unknown token if the token is not found.
Return type: int
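
The unknown-token fallback described for `get_id` can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the t_res or REL source: the class name `ToyVocabulary` and its `word2id`, `id2word` and `unk_id` attributes are hypothetical, chosen only to mirror the documented behaviour of `add_to_vocab`, `get_id`, `size` and `unk_token`.

```python
class ToyVocabulary:
    """Hypothetical stand-in that mirrors the documented interface."""

    unk_token = "#UNK#"  # same value as the documented class attribute

    def __init__(self):
        self.word2id = {}  # token -> integer ID (assumed storage layout)
        self.id2word = []  # integer ID -> token (assumed storage layout)
        self.unk_id = 0    # assumed default until the unknown token is added

    def add_to_vocab(self, token: str) -> None:
        # Assumption: the next free ID is the current number of stored tokens.
        new_id = len(self.id2word)
        self.id2word.append(token)
        self.word2id[token] = new_id
        if token == self.unk_token:
            self.unk_id = new_id

    def get_id(self, token: str) -> int:
        # Fall back to the unknown token's ID when the token is not stored.
        return self.word2id.get(token, self.unk_id)

    def size(self) -> int:
        return len(self.id2word)


vocab = ToyVocabulary()
for tok in [ToyVocabulary.unk_token, "london", "paris"]:
    vocab.add_to_vocab(tok)

known_id = vocab.get_id("paris")       # token that was added
missing_id = vocab.get_id("atlantis")  # token that was never added
```

In this sketch, looking up a token that was never added returns the same ID as `unk_token`, so callers always receive a valid integer ID.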

static normalize(token: str, lower: Optional[bool] = False, digit_0: Optional[bool] = False) → str

Normalize the given token according to the specified normalization rules.

Parameters:

token (str) – The token to be normalized.
lower (bool) – Flag indicating whether the token should be converted to lowercase. Defaults to False.
digit_0 (bool) – Flag indicating whether digits should be replaced with '0' during normalization. Defaults to False.

Returns:

The normalized token.

Return type:

str
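
The effect of the two flags can be sketched as follows. This is an illustrative re-implementation of the documented flags only, not the library's code: the function name `normalize_sketch` is hypothetical, and it assumes that each digit is replaced individually by `'0'`. The real method may handle additional cases that are not documented here.

```python
import re


def normalize_sketch(token: str, lower: bool = False, digit_0: bool = False) -> str:
    """Hypothetical helper covering only the two documented flags."""
    if digit_0:
        # Assumption: every single digit character becomes '0'.
        token = re.sub(r"[0-9]", "0", token)
    if lower:
        token = token.lower()
    return token


examples = {
    "default": normalize_sketch("Room101"),
    "lower": normalize_sketch("Room101", lower=True),
    "digit_0": normalize_sketch("Room101", digit_0=True),
    "both": normalize_sketch("Room101", lower=True, digit_0=True),
}
```

With both flags left at their defaults, the sketch returns the token unchanged, which matches the documented default values of False.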

size() → int

Get the size of the vocabulary.

Returns: The number of tokens in the vocabulary.
Return type: int

unk_token = '#UNK#'
