I'm confused: does this chat model use the standard ChatML template? Is the bos_token <|im_start|> or <|startoftext|>? Is the eos_token <|im_end|> or <|endoftext|>?

#2
by songkq - opened

The special-token settings in Yi-1.5-34B-Chat-16K/config.json are not consistent with those in Yi-1.5-34B-Chat-16K/tokenizer_config.json.
Is the bos_token <|im_start|> or <|startoftext|>? Is the eos_token <|im_end|> or <|endoftext|>?
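For reference, the ChatML convention wraps every turn as `<|im_start|>{role}\n...<|im_end|>`, so a chat model trained on it signals the end of its reply by emitting `<|im_end|>`. The sketch below renders such a prompt by hand. The helper `build_chatml_prompt` is hypothetical, and whether Yi-1.5-34B-Chat-16K's own template matches this exactly is an assumption; the authoritative version is the model's `chat_template` in tokenizer_config.json.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in ChatML form (illustrative sketch)."""
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn; a ChatML model is expected to close it with <|im_end|>.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

In this convention `<|im_start|>` opens every turn rather than the whole sequence, which is why it is not normally configured as `bos_token`; a separate sequence-level BOS may still be prepended by the tokenizer.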

As shown in Yi-1.5-34B-Chat-16K/config.json:
"bos_token_id": 1,
"eos_token_id": 2,

As shown in Yi-1.5-34B-Chat-16K/tokenizer_config.json:
"bos_token": "<|startoftext|>",
"eos_token": "<|im_end|>",

"1": {
  "content": "<|startoftext|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
"2": {
  "content": "<|endoftext|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
},
"7": {
  "content": "<|im_end|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
}
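To make the mismatch concrete, the sketch below resolves each special token against the id-to-content entries quoted above, using only the values from this thread (nothing is downloaded; the dictionaries and the helper `check_special_token` are illustrative, not part of any library). BOS agrees (id 1 is `<|startoftext|>` in both files), while EOS does not: config.json points at id 2 (`<|endoftext|>`), but tokenizer_config.json names `<|im_end|>`, which is id 7.

```python
# Values copied from the config.json and tokenizer_config.json excerpts above.
config = {"bos_token_id": 1, "eos_token_id": 2}
tokenizer_config = {"bos_token": "<|startoftext|>", "eos_token": "<|im_end|>"}
added_tokens = {1: "<|startoftext|>", 2: "<|endoftext|>", 7: "<|im_end|>"}


def check_special_token(name):
    """Return (config_id, token_string, id_of_token_string, consistent)."""
    config_id = config[f"{name}_token_id"]
    token = tokenizer_config[f"{name}_token"]
    # Invert the id -> content map to find the id of the tokenizer's token string.
    token_to_id = {content: idx for idx, content in added_tokens.items()}
    token_id = token_to_id[token]
    return config_id, token, token_id, config_id == token_id


for name in ("bos", "eos"):
    config_id, token, token_id, ok = check_special_token(name)
    print(f"{name}: config.json id={config_id} ({added_tokens[config_id]}), "
          f"tokenizer_config.json token={token} (id={token_id}) -> "
          f"{'consistent' if ok else 'MISMATCH'}")
```

When files disagree like this, a common workaround is to stop generation on either id; `transformers`' `generate()` accepts a list for `eos_token_id` (e.g. `[2, 7]`). Whether that matches the maintainers' intended fix is an assumption.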

01-ai org

My apologies for the late reply. We've updated tokenizer.json; did that resolve your issue?
Thanks!
