Update tokenizer_config.json

#4

The current chat_template appends an extra EOS token (<|eot_id|>) after the last message when add_generation_prompt=False.
Please replace it with the corrected chat_template to fix this behavior.

from transformers import AutoTokenizer

# Render a single user turn as text, without opening an assistant turn.
message = [{"role": "user", "content": "How are you?"}]
tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-70B-Instruct")
tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False)

The output ends with a duplicated EOS token (<|eot_id|><|eot_id|>):

 <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|><|eot_id|>
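For comparison, here is a minimal pure-Python sketch of what the rendered prompt is expected to look like, assuming the standard Llama 3 Instruct format (each turn is a header, a blank line, the trimmed content, then exactly one <|eot_id|>). The helper names `render_llama3_chat` and `has_duplicate_eot` are hypothetical and for illustration only; this is not the template shipped in the repository or the one proposed in this PR.

```python
# Hypothetical sketch assuming the standard Llama 3 Instruct prompt format.
# It is NOT the chat_template from the repository, only an illustration of
# the expected output (one <|eot_id|> per turn).

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def render_llama3_chat(messages, add_generation_prompt=False):
    """Render chat messages into a Llama 3 style prompt string."""
    parts = [BOS]
    for message in messages:
        # Header, blank line, trimmed content, then exactly one end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}{EOT}"
        )
    if add_generation_prompt:
        # Open an assistant header so the model continues generating from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


def has_duplicate_eot(rendered):
    """Return True if two <|eot_id|> tokens appear back to back (the reported bug)."""
    return EOT + EOT in rendered


if __name__ == "__main__":
    message = [{"role": "user", "content": "How are you?"}]
    rendered = render_llama3_chat(message, add_generation_prompt=False)
    print(repr(rendered))
    print(has_duplicate_eot(rendered))
```

The same `has_duplicate_eot` check can be applied to the string returned by `tame_tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=False)`: it returns True for the output shown above and should return False once the template is fixed.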
minyichen changed pull request title from Upload tokenizer_config.json to Update tokenizer_config.json
yentinglin changed pull request status to merged
