Could you let me know when the bfloat16 model will be uploaded? I can't run the float32 model!
#5 opened by Cach
Yes, we would like to build a bfloat16-compatible version. In the meantime you can run this model with torch.autocast to save some memory:

```python
with torch.autocast("cuda", enabled=True, dtype=autocast_precision):
    ...
```

We did our evaluations in that setting (float32 weights with autocast enabled).
The current code does not support bfloat16 inference directly, but you can try it with torch.autocast:
```python
import torch
from transformers import GenerationConfig

with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )
```
Note that the weights will still be in float32.
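To see what that means in practice, here is a minimal sketch of how autocast keeps the weights in float32 while the computation runs in bfloat16. It uses a small `nn.Linear` and the CPU autocast backend so it runs without a GPU; the same principle applies with `"cuda"`:

```python
import torch

# Weights of a freshly created layer are float32 by default.
layer = torch.nn.Linear(4, 4)
x = torch.randn(1, 4)

# Under autocast, eligible ops (like linear) run in bfloat16,
# but the stored parameters are not converted.
with torch.autocast("cpu", enabled=True, dtype=torch.bfloat16):
    y = layer(x)

print(layer.weight.dtype)  # torch.float32 -- weights stay in full precision
print(y.dtype)             # torch.bfloat16 -- the compute/output is downcast
```

So autocast trades activation memory and compute precision, but the float32 weights still have to fit in memory, which is why a true bfloat16 checkpoint would still help.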
Will a bfloat16 version still be released at some point in the future, though?