
Whisper Medium ATC short

This model is a fine-tuned version of openai/whisper-medium trained on Czech and English air traffic communication recordings from the Czech airport LKKU.

It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.

Model description

The model uses the Whisper Medium architecture (~764M parameters, F32 weights) and is distributed in the safetensors format.

Usage

import torch
from transformers import pipeline

audio = "path/to/audio.xx"  # path to the recording to transcribe
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model; long recordings are processed in 30-second chunks
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-short", chunk_length_s=30, device=device)

# Force Czech transcription (rather than translation or language auto-detection)
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")

print('Transcription:', transcribe(audio)["text"])
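
Alternatively, newer releases of transformers deprecate setting forced_decoder_ids on the model config in favour of passing generation options at call time. A minimal sketch, assuming a transformers version whose speech-recognition pipeline accepts generate_kwargs:

import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-short", chunk_length_s=30, device=device)

# Language and task are forwarded to model.generate() instead of being fixed on the config
result = transcribe("path/to/audio.xx", generate_kwargs={"language": "czech", "task": "transcribe"})
print('Transcription:', result["text"])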

Dataset

The training dataset consisted of approximately 5 hours of air traffic communication recordings. The recordings were in Czech and English (80:20), with occasional Slovak.

Output format

The model was trained to abbreviate certain information, especially numbers and callsigns. The transcription format of a recording is as follows:

Recording: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů

Transcription: OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů

Note: See also the model BUT-FIT/whisper-ATC-czech-full, which does not abbreviate any values and transcribes recordings word for word.

Results

The model reached a total WER of 20.0 % on unseen Czech and English LKKU recordings. On a test set containing Czech air traffic recordings from other airports, LKPR and LKTB, it achieved a WER of 34.0 %.

The WER on callsigns in LKKU recordings was 7.8 %, while on the LKPR and LKTB dataset the model reached 11.6 %.
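
For orientation, WER figures of this kind can be recomputed with the evaluate library. A minimal sketch with placeholder strings (not the actual test set); evaluate and its jiwer backend are assumed to be installed:

import evaluate

# Word error rate: (substitutions + deletions + insertions) / number of reference words
wer_metric = evaluate.load("wer")

references = ["OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů"]   # ground-truth transcripts
predictions = ["OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů"]  # model outputs (placeholder)

print(f"WER: {100 * wer_metric.compute(predictions=predictions, references=references):.1f} %")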

Training hyperparameters

  • learning_rate: 3e-5
  • per_device_train_batch_size: 2
  • gradient_accumulation_steps: 8
  • warmup_ratio: 0.12
  • fp16: True
  • gradient_checkpointing: True
  • evaluation_strategy: "epoch"
  • save_strategy: "epoch"
  • load_best_model_at_end: True
  • metric_for_best_model: "wer"
  • num_train_epochs: 45
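
These values map onto the transformers Seq2SeqTrainingArguments API commonly used for Whisper fine-tuning. A minimal sketch, where output_dir and greater_is_better are illustrative assumptions not listed above:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-atc-czech-short",  # hypothetical output directory
    learning_rate=3e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    warmup_ratio=0.12,
    fp16=True,
    gradient_checkpointing=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,  # assumption: lower WER is better
    num_train_epochs=45,
)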

Contact

For further information, don't hesitate to contact Veronika Nevarilova (xnevar00@stud.fit.vutbr.cz) or Igor Szoke (szoke@fit.vutbr.cz).
