Why is there more than one token ID for some tokens?

#6 opened by Sev777
# sample code to reproduce the issue
>>> from transformers import LlamaTokenizer
>>> tokenizer = LlamaTokenizer.from_pretrained('huggingface/open_llama_7b')
>>> tokenizer.encode('London')
[1, 2516]
>>> tokenizer.decode(2516)
'London'
>>> tokenizer.decode(20719)
'London'
>>> tokenizer.decode(2516) == tokenizer.decode(20719)
True
