t_res.utils.REL.vocabulary module
- class t_res.utils.REL.vocabulary.Vocabulary
A class representing a vocabulary object used for storing references to embeddings.
Note
Credit:
The code for this class and its methods is taken from the REL: Radboud Entity Linker GitHub repository. Copyright (c) 2020 Johannes Michael van Hulst. See the permission notice and the original script for more information.
Reference: @inproceedings{vanHulst:2020:REL, author = {van Hulst, Johannes M. and Hasibi, Faegheh and Dercksen, Koen and Balog, Krisztian and de Vries, Arjen P.}, title = {REL: An Entity Linker Standing on the Shoulders of Giants}, booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, series = {SIGIR '20}, year = {2020}, publisher = {ACM} }
- add_to_vocab(token: str) → None
Add the given token to the vocabulary.
- Parameters:
token (str) – The token to be added to the vocabulary.
- Returns:
None.
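- static normalize(token: str, lower: Optional[bool] = False, digit_0: Optional[bool] = False) → str
Normalise the given token based on the specified normalisation rules.
- Parameters:
token (str) – The token to be normalised.
lower (Optional[bool]) – Whether to convert the token to lowercase. Defaults to False.
digit_0 (Optional[bool]) – Whether to replace the digits in the token with 0. Defaults to False.
- Returns:
The normalised token.
- Return type:
str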
- size() → int
Get the size of the vocabulary.
- Returns:
The number of words in the vocabulary.
- Return type:
int
- unk_token = '#UNK#'
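
Example: the interface documented above can be illustrated with a small stand-in class. This is a minimal sketch, not the T-Res or REL source. The class name `SketchVocabulary`, the `word2id` and `id2word` attributes, the pass-through of the unknown-token marker, and the duplicate handling in `add_to_vocab` are all assumptions made for illustration; only the method names, signatures, and the `unk_token` value come from this page.

```python
import re
from typing import Dict, List


class SketchVocabulary:
    """Hypothetical stand-in mirroring the documented interface."""

    unk_token = "#UNK#"

    def __init__(self) -> None:
        # Assumed internal storage: token -> index, and index -> token.
        self.word2id: Dict[str, int] = {}
        self.id2word: List[str] = []

    @staticmethod
    def normalize(token: str, lower: bool = False, digit_0: bool = False) -> str:
        # Assumption: the unknown-token marker is returned unchanged.
        if token == SketchVocabulary.unk_token:
            return token
        if digit_0:
            # Assumed meaning of digit_0: replace every digit with "0".
            token = re.sub(r"[0-9]", "0", token)
        return token.lower() if lower else token

    def add_to_vocab(self, token: str) -> None:
        # Assumption: a token already present is not added a second time.
        if token not in self.word2id:
            self.word2id[token] = len(self.id2word)
            self.id2word.append(token)

    def size(self) -> int:
        # Number of distinct tokens stored so far.
        return len(self.id2word)


vocab = SketchVocabulary()
for tok in ["London", "1851", "London"]:
    vocab.add_to_vocab(SketchVocabulary.normalize(tok, lower=True, digit_0=True))
```

In this sketch, normalising before insertion means that surface variants of a token share one vocabulary entry, which is one plausible reason for exposing `normalize` as a static method separate from `add_to_vocab`.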