FP16 vs FP32

#127
by Taylor658 - opened

What are the memory, performance, and accuracy trade-offs between FP16 and FP32 precision when running Whisper-large-v3 on a typical GPU like the NVIDIA A100?

You can get a rough idea of the memory needed to run any model using this formula:

Approx. memory usage = number of parameters × bytes per parameter

In practice the actual figure will be a bit higher (activations that scale with sequence length, loading libraries, CUDA context, etc.).

When we say FP16, this equates to 2 bytes per parameter, and Whisper-large-v3 has ~1.6B parameters.

Therefore the weights alone take about 1.6B × 2 bytes ≈ 3.2 GB in FP16 (and roughly twice that, ≈ 6.4 GB, at 4 bytes per parameter in FP32), so total memory usage will be somewhat over 3.2 GB.
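
As a quick sanity check, here is that arithmetic as a minimal Python sketch (the ~1.6B parameter count is the figure quoted above; real usage will be somewhat higher due to the overheads mentioned):

```python
# Rough weight-memory estimate for Whisper-large-v3, using the
# ~1.6B parameter figure quoted above. Actual usage will be a bit
# higher (activations, CUDA context, loaded libraries, etc.).
NUM_PARAMS = 1.6e9

for precision, bytes_per_param in [("FP32", 4), ("FP16", 2)]:
    gigabytes = NUM_PARAMS * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.1f} GB for the weights alone")

# FP32: ~6.4 GB for the weights alone
# FP16: ~3.2 GB for the weights alone
```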
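
For completeness, a hedged sketch of how one might load the model in either precision with the transformers library (this assumes transformers is installed and a CUDA-capable GPU is available; `openai/whisper-large-v3` is the official checkpoint id). On an A100, FP16 also tends to run faster thanks to its FP16 tensor cores, with typically negligible accuracy loss for Whisper inference:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-large-v3"

# FP16: half the weight memory (~3.2 GB) and usually faster on an
# A100; accuracy loss for Whisper inference is typically negligible.
model_fp16 = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# FP32 (the default dtype): roughly twice the weight memory (~6.4 GB).
# model_fp32 = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to("cuda")

print(next(model_fp16.parameters()).dtype)  # torch.float16
```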

Thanks for the feedback and the formula!

Taylor658 changed discussion status to closed
