distil-whisper/distil-medium.en

Mar 18

I copied this from your example. It just raises the following error:
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'

brianjking

Mar 22

Seeing the same issue.

sanchit-gandhi

Whisper Distillation org Mar 25

Great catch - fixed in https://huggingface.co/distil-whisper/distil-medium.en/commit/26f298e3a65ea076cbe4498ff70b84d33a8cca32

sanchit-gandhi changed discussion status to closed Mar 25

Owos

Mar 27

> Great catch - fixed in https://huggingface.co/distil-whisper/distil-medium.en/commit/26f298e3a65ea076cbe4498ff70b84d33a8cca32

This does not solve the problem during fine-tuning, @sanchit-gandhi.
I still get the same error whenever my code enters the eval loop during fine-tuning.

thoool

Jun 7

I am facing the same issue when running the evaluation.

reach-vb

Whisper Distillation org Jun 7

Hi @Owos & @thoool - This seems to work for me, here's a repro: https://github.com/Vaibhavs10/scratchpad/blob/main/distil_whisper_medium_repro.ipynb

Can you try upgrading transformers, or otherwise share a reproducible snippet?

reach-vb changed discussion status to open Jun 7

sanchit-gandhi

Whisper Distillation org Jun 7

I also made a bunch of language detection fixes to the Whisper fine-tuning blog post and Colab - could you try using the latest versions to ensure you receive the bug fixes?

Let me know if the issue persists!

thoool

Jun 7

I just upgraded transformers from 4.38.2 to 4.41.2; however, the error persists.

My setup is somewhat different because I have been trying to fine-tune a German version of Distil-Whisper, like so:

accelerate launch run_distillation.py \
  --model_name_or_path "./distil-large-v3-init" \
  --teacher_model_name_or_path "openai/whisper-large-v3" \
  --train_dataset_name "mozilla-foundation/common_voice_17_0" \
  --train_dataset_config_name "de" \
  --train_split_name "train" \
  --text_column_name "sentence" \
  --eval_dataset_name "mozilla-foundation/common_voice_17_0" \
  --eval_dataset_config_name "de" \
  --eval_split_name "validation" \
  --eval_text_column_name "sentence" \
  --eval_steps 1_000 \
  --save_steps 1_000 \
  --warmup_steps 100 \
  --learning_rate 0.0001 \
  --lr_scheduler_type "constant_with_warmup" \
  --timestamp_probability 0.2 \
  --condition_on_prev_probability 0.2 \
  --language "de" \
  --task "transcribe" \
  --logging_steps 25 \
  --save_total_limit 3 \
  --max_steps 100_000 \
  --wer_threshold 20 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --dataloader_num_workers 2 \
  --preprocessing_num_workers 2 \
  --ddp_timeout 7200 \
  --dtype "bfloat16" \
  --attn_implementation "sdpa" \
  --output_dir "./" \
  --do_train \
  --do_eval \
  --gradient_checkpointing \
  --overwrite_output_dir \
  --predict_with_generate \
  --freeze_encoder \
  --freeze_embed_positions \
  --use_pseudo_labels=False

For the evaluation, I am now inside my checkpoint folder when running the following command:

python run_eval.py \
  --model_name_or_path "./" \
  --dataset_name "mozilla-foundation/common_voice_17_0" \
  --dataset_config_name "de" \
  --dataset_split_name "test" \
  --text_column_name "sentence" \
  --batch_size 16 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --language "de" \
  --attn_implementation "sdpa" \
  --streaming

sanchit-gandhi

Whisper Distillation org Jun 13

Hey @thoool - thanks for reporting! Could you provide the full stack trace for the error that you're getting with these scripts? It would be helpful to see where the error occurs in the Distil-Whisper training code (cc @eustlb )

thoool

Jun 17

Sure

Traceback (most recent call last):
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 825, in <module>
    main()
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 572, in main
    language = language_to_id(data_args.language, model.generation_config) if data_args.language else None
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 378, in language_to_id
    if language in generation_config.lang_to_id.keys():
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'

sanchit-gandhi

Whisper Distillation org Jun 18

Are you passing the language argument to run_eval.py when evaluating an English only checkpoint? Note that the language argument should only be passed for multilingual checkpoints. I've opened a PR to throw a better warning here: https://github.com/huggingface/distil-whisper/pull/139

Otherwise, you're likely using a model with an outdated generation config for distillation! Could you update the generation config to match that of the original pre-trained model?

from transformers import GenerationConfig, AutoConfig

# fill me with the hub model id of the checkpoint you're distilling
MODEL_NAME = "sanchit-gandhi/whisper-small-hi"
vocab_size = AutoConfig.from_pretrained(MODEL_NAME).vocab_size

if vocab_size == 51864:
    original_model = "openai/whisper-tiny.en"    
elif vocab_size == 51865:
    original_model = "openai/whisper-tiny"
else:
    original_model = "openai/whisper-large-v3"

# load updated generation config
generation_config = GenerationConfig.from_pretrained(original_model)
# push updated generation config to the Hub
generation_config.push_to_hub(MODEL_NAME)
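
On the first point (passing the language argument when the generation config has no `lang_to_id`), here is a minimal sketch of the kind of early check that gives a readable message instead of the `AttributeError`. The helper names `has_language_map` and `check_language_arg` are hypothetical and not part of `run_eval.py`, raising a `ValueError` is just one possible choice, and the `SimpleNamespace` objects are stand-ins for real `GenerationConfig` instances with illustrative values only.

```python
from types import SimpleNamespace


def has_language_map(generation_config) -> bool:
    """True if the config exposes a non-empty `lang_to_id` mapping."""
    return bool(getattr(generation_config, "lang_to_id", None))


def check_language_arg(language, generation_config):
    """Hypothetical guard: fail early with a readable message instead of an AttributeError."""
    if language is not None and not has_language_map(generation_config):
        raise ValueError(
            f"--language {language!r} was passed, but the generation config has no "
            "`lang_to_id`. Either drop --language (English-only checkpoint) or "
            "update the generation config from the original pre-trained model."
        )
    return language


# Stand-ins for real GenerationConfig objects (values are illustrative only).
multilingual_cfg = SimpleNamespace(lang_to_id={"<|de|>": 50261})
outdated_cfg = SimpleNamespace()  # no lang_to_id attribute at all

print(check_language_arg("de", multilingual_cfg))
print(check_language_arg(None, outdated_cfg))
try:
    check_language_arg("de", outdated_cfg)
except ValueError as err:
    print("would fail early:", err)
```

With a real checkpoint, the same check can be run on the object returned by `GenerationConfig.from_pretrained(...)` for the model directory or Hub id.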

thoool

Jun 20

I am not quite sure if I understand this correctly.

The teacher model I used is --teacher_model_name_or_path "openai/whisper-large-v3", and I set --language "de" while using --train_dataset_name "mozilla-foundation/common_voice_17_0". So I end up with a German distilled version of whisper-large-v3, which is stored locally.

When executing the run_eval.py file, I indeed pass --language "de" just like I did during training. Do you mean I don't have to set language as I now have a German version and no longer a multilingual version of Whisper?

FWIW:

python run_eval.py \
  --model_name_or_path "./" \
  --dataset_name "mozilla-foundation/common_voice_17_0" \
  --dataset_config_name "de" \
  --dataset_split_name "test" \
  --text_column_name "sentence" \
  --batch_size 16 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --attn_implementation "sdpa" \
  --streaming \
  --return_timestamps False

seems to be circumventing the problem. That being said, I now face this error:

Start benchmarking common_voice_17_0/test...
Reading metadata...: 16183it [00:00, 41952.06it/s] | 0/1 [00:00<?, ?it/s]
/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:537: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(...: 1it [00:00,  3.35it/s]
Samples: 16183it [13:45, 19.60it/s]
Datasets:   0%|          | 0/1 [13:45<?, ?it/s]
Traceback (most recent call last):
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 825, in <module>
    main()
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 763, in main
    norm_transcriptions = [normalizer(pred) for pred in transcriptions]
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 763, in <listcomp>
    norm_transcriptions = [normalizer(pred) for pred in transcriptions]
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 587, in __call__
    s = self.standardize_spellings(s)
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 507, in __call__
    return " ".join(self.mapping.get(word, word) for word in s.split())
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 507, in <genexpr>
    return " ".join(self.mapping.get(word, word) for word in s.split())
AttributeError: 'NoneType' object has no attribute 'get'
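
Reading the last frames, `self.mapping` is `None` by the time the spelling step calls `.get` on it. A minimal stand-in reproduces that pattern; the class `SpellingNormalizerSketch` is hypothetical and only mirrors the failing line from the traceback, it is not the transformers implementation.

```python
class SpellingNormalizerSketch:
    """Stand-in that mirrors the failing line of the traceback (not the transformers class)."""

    def __init__(self, mapping):
        # In the traceback above, this attribute is None when __call__ runs.
        self.mapping = mapping

    def __call__(self, s: str) -> str:
        return " ".join(self.mapping.get(word, word) for word in s.split())


# With a real dict, unknown words pass through unchanged.
print(SpellingNormalizerSketch({"colour": "color"})("the colour red"))

# With mapping=None, the same AttributeError as above is raised.
try:
    SpellingNormalizerSketch(None)("das ist ein Test")
except AttributeError as err:
    print(err)
```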

sanchit-gandhi

Whisper Distillation org Jun 21

Ah, I see what's happening! The checkpoint you're evaluating is an intermediate checkpoint (i.e. one saved partway through training with accelerator.save_state). This saves the model weights to checkpoint-35000-epoch-1, but not the config, tokenizer, feature extractor or generation config.

To remedy this, could you copy the corresponding files into this checkpoint dir?

import os

from transformers import GenerationConfig, WhisperConfig, WhisperProcessor

BASE_DIR = "/home/operation/whisper_finetune/distil-whisper/training/"
CHECKPOINT = "checkpoint-35000-epoch-1"
checkpoint_dir = os.path.join(BASE_DIR, CHECKPOINT)

# load the config, processor (tokenizer + feature extractor) and generation config from the training output dir
config = WhisperConfig.from_pretrained(BASE_DIR)
processor = WhisperProcessor.from_pretrained(BASE_DIR)
generation_config = GenerationConfig.from_pretrained(BASE_DIR)

# save them next to the intermediate checkpoint weights
config.save_pretrained(checkpoint_dir)
processor.save_pretrained(checkpoint_dir)
generation_config.save_pretrained(checkpoint_dir)

You should then be able to run evaluation using the scripts you shared above.
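
Before re-running evaluation, a quick sanity check along these lines can confirm the checkpoint dir now holds the non-weight files. This is only a sketch: the helper `missing_files` is hypothetical, and the file names listed are an assumption based on what `save_pretrained` usually writes, not an exhaustive list.

```python
from pathlib import Path

# File names usually written by save_pretrained (assumed; not an exhaustive list).
EXPECTED_FILES = (
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
)


def missing_files(checkpoint_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    ckpt = "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1"
    print(missing_files(ckpt) or "all expected files present")
```

An empty list means the files were copied; anything it returns is still missing from the checkpoint dir.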

What do you think about updating the distillation script to save the config/processor/generation config during intermediate saves, @eustlb? It would be useful for evaluating intermediate checkpoints.

thoool

27 days ago

That worked just fine, thanks @sanchit-gandhi

eustlb

Whisper Distillation org 25 days ago

Agree there @sanchit-gandhi ! I'll update the distillation script.

eustlb

Whisper Distillation org 24 days ago

Done in this PR! :)

Just can't run!