FP16 vs FP32

#127
by Taylor658 - opened

What are the memory, performance, and accuracy trade-offs between FP16 and FP32 precision when running Whisper-large-v3 on a typical GPU like the NVIDIA A100?

You can get a rough idea of the memory needed to run any model using this formula:

Approx. memory usage = number of parameters × bytes per parameter

In practice the actual figure will be a bit higher (activations that scale with sequence length, loading libraries, CUDA context, etc.).

When we say FP16, this equates to 2 bytes per parameter, and Whisper-large-v3 has ~1.6B parameters.

Therefore the weights alone take about 1.6B × 2 bytes ≈ 3.2 GB in FP16 (and roughly twice that, ≈ 6.4 GB, at 4 bytes per parameter in FP32), so total memory usage will be somewhat over 3.2 GB.
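
As a quick sanity check, here is that arithmetic as a minimal Python sketch (the ~1.6B parameter count is the figure quoted above; real usage will be somewhat higher due to the overheads mentioned):

```python
# Rough weight-memory estimate for Whisper-large-v3, using the
# ~1.6B parameter figure quoted above. Actual usage will be a bit
# higher (activations, CUDA context, loaded libraries, etc.).
NUM_PARAMS = 1.6e9

for precision, bytes_per_param in [("FP32", 4), ("FP16", 2)]:
    gigabytes = NUM_PARAMS * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.1f} GB for the weights alone")

# FP32: ~6.4 GB for the weights alone
# FP16: ~3.2 GB for the weights alone
```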
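
For completeness, a hedged sketch of how one might load the model in either precision with the transformers library (this assumes transformers is installed and a CUDA-capable GPU is available; `openai/whisper-large-v3` is the official checkpoint id). On an A100, FP16 also tends to run faster thanks to its FP16 tensor cores, with typically negligible accuracy loss for Whisper inference:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-large-v3"

# FP16: half the weight memory (~3.2 GB) and usually faster on an
# A100; accuracy loss for Whisper inference is typically negligible.
model_fp16 = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# FP32 (the default dtype): roughly twice the weight memory (~6.4 GB).
# model_fp32 = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to("cuda")

print(next(model_fp16.parameters()).dtype)  # torch.float16
```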

Thanks for the feedback and the formula!

Taylor658 changed discussion status to closed
