Could you let me know when the bfloat16 model will be uploaded? I can't run the float32 model!

#5
by Cach - opened


Yes, we would like to build a bfloat16-compatible version. In the meantime, you can run this model with torch.autocast to save some memory:

with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):

We did our evaluations in that setting (float32 weights with autocast enabled).

The current code does not support bfloat16 inference directly, but you can try with torch.autocast.

import torch
from transformers import GenerationConfig

# Run generation with autocast so matmuls execute in bfloat16
with torch.autocast("cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )

Note that the weights will still be in float32.
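To see what this means in practice, here is a minimal sketch with a plain torch.nn.Linear (not this model; the layer and shapes are illustrative only). It uses CPU autocast so it runs without a GPU; on GPU the call is torch.autocast("cuda", ...) as above. The weights remain float32, but the output computed under autocast comes out in bfloat16:

```python
import torch

layer = torch.nn.Linear(4, 4)  # parameters are float32 by default
x = torch.randn(2, 4)

# Under autocast, eligible ops (e.g. linear/matmul) run in bfloat16
with torch.autocast("cpu", enabled=True, dtype=torch.bfloat16):
    y = layer(x)

print(layer.weight.dtype)  # torch.float32 -- weights unchanged
print(y.dtype)             # torch.bfloat16 -- compute/output in bf16
```

So autocast saves activation memory and speeds up compute, but the model's parameters still occupy float32 storage, which is why a true bfloat16 checkpoint would still help.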

Will a bfloat16 version still be released at some point in the future, though?
