---
datasets:
- japanese-asr/ja_asr.jsut_basic5000
- litagin/Galgame_Speech_ASR_16kHz
language:
- ja
metrics:
- cer
base_model:
- openai/whisper-large-v3-turbo
library_name: transformers
---
# Whisper Large V3 Japanese Phone Accent
This is a Whisper model that transcribes Japanese speech into Katakana with pitch accent annotations. It is built on whisper-large-v3-turbo and has been fine-tuned on a 1/20 subset of the Galgame-Speech dataset as well as the JSUT-5000 dataset.
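The model can be used with the standard `transformers` ASR pipeline. Below is a minimal inference sketch; the repository ID is assumed rather than taken from this card, so replace it with the actual Hub ID of this model.

```python
# Minimal inference sketch using the transformers ASR pipeline.
import torch
from transformers import pipeline

# Assumed repository ID -- replace with this model's actual Hub ID.
model_id = "AkitoP/whisper-large-v3-japanese-phone-accent"

asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Transcribe a local audio file into Katakana with pitch accent annotations.
result = asr("sample.wav", generate_kwargs={"language": "ja", "task": "transcribe"})
print(result["text"])
```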
## Training Data:
- **Stage 1**: Audio from the Galgame-Speech dataset. The transcripts were converted into Katakana sequences with pitch accent annotations using pyopenjtalk (a conversion sketch follows this list).
- **Stage 2**: The JSUT-5000 dataset with its original pitch accent annotations, split into 90% for training and 10% for evaluation.
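As a rough illustration of the Stage 1 preprocessing, the sketch below shows how pyopenjtalk can produce a Katakana reading and the full-context labels that carry accent information. The exact annotation format used for training is not reproduced here, and the example sentence is arbitrary.

```python
# Sketch of pyopenjtalk-based text preprocessing (illustrative only).
import pyopenjtalk

text = "音声認識は面白い"  # arbitrary example sentence

# Katakana reading of the sentence.
kana = pyopenjtalk.g2p(text, kana=True)
print(kana)

# Full-context labels; their A:/F: fields encode pitch accent information,
# which can be parsed to attach accent marks to the Katakana sequence
# (the parsing itself is omitted here).
labels = pyopenjtalk.extract_fullcontext(text)
print(labels[1])
```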
## Evaluation Results:
- The model achieved a CER (Character Error Rate) of approximately 4% on the JSUT-5000 test set, an improvement over pyopenjtalk's 7% CER (a CER computation sketch follows this list).
- Training with Stage 1 alone resulted in a CER of 13%, with errors including specific misreadings and confusion between on'yomi (音読み) and kun'yomi (訓読み) readings; Stage 2 improved on this.
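For reference, a CER figure like the ones above can be computed with the `jiwer` package. The strings below are hypothetical placeholders, not actual references or model outputs; real data would also carry accent marks.

```python
# CER computation sketch using jiwer (hypothetical strings).
import jiwer

references = ["オンセーニンシキワオモシロイ"]  # placeholder reference transcriptions
hypotheses = ["オンセーニンシキワオモシオイ"]  # placeholder model outputs

print(f"CER: {jiwer.cer(references, hypotheses):.3f}")
```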
We are currently looking for Japanese datasets with pitch accent annotations. If you have such data, please reach out!