sandy1990418 committed on
Commit 2243492
1 Parent(s): ccb3a7f

doc: add some description

Files changed (1)
  1. README.md +49 -1
README.md CHANGED
@@ -6,4 +6,52 @@ language:
  base_model:
  - openai/whisper-large-v3-turbo
  pipeline_tag: automatic-speech-recognition
- ---
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This model card describes a fine-tuned version of the Whisper-large-v3-turbo model, optimized for Mandarin automatic speech recognition (ASR). The model was fine-tuned on the Common Voice 13.0 dataset using PEFT with LoRA, which keeps training efficient while maintaining the performance of the original model. It achieves the following word error rates (WER) on the Common Voice 13.0 test split:
+
+ - WER before fine-tuning: 77.08
+ - WER after fine-tuning: 44.93
+
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ ```python
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ from datasets import load_dataset
+
+
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32  # half precision only on GPU
+
+ model_id = "sandy1990418/whisper-large-v3-turbo-chinese"
+
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
+ )
+ model.to(device)
+
+ processor = AutoProcessor.from_pretrained(model_id)  # bundles the tokenizer and feature extractor
+
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     torch_dtype=torch_dtype,
+     device=device,
+ )
+
+ dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")  # English demo clip; swap in Mandarin audio
+ sample = dataset[0]["audio"]
+
+ result = pipe(sample)
+ print(result["text"])
+
+ ```
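
The README reports word error rate but does not say how the Mandarin transcripts were tokenized or normalized. This matters because Mandarin is written without spaces between words. For reference, here is a minimal pure-Python sketch of the standard definition, WER = (S + D + I) / N. In that formula, S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment, and N is the number of reference words. The sketch also includes a character-level variant (CER), which is commonly used for Mandarin. The helpers `edit_distance`, `wer` and `cer` are hypothetical illustrations and are not part of this model's repository. Published evaluations usually rely on a library such as `jiwer` or Hugging Face `evaluate`, often after text normalization, so this sketch is not expected to reproduce the 77.08 / 44.93 figures.

```python
# Hypothetical helpers for illustration only; not taken from the model repository.
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the reference prefix handled so far
    # and the first j hypothesis tokens (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens: (S + D + I) / N."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate with whitespace ignored, often reported for Mandarin."""
    ref_chars = [c for c in reference if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of 6 words
    print(cer("今天天气很好", "今天天气不好"))  # one substitution out of 6 characters
```

Because `wer` splits on whitespace, an unsegmented Mandarin sentence counts as a single word. A single wrong character then makes the whole sentence an error, which is one reason character-level scoring is often reported for Chinese.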