license: cc-by-4.0
datasets:
  - KTH/hungarian-single-speaker-tts
language:
  - hu
tags:
  - text-to-speech
  - audio

This VITS model was trained on the KTH/hungarian-single-speaker-tts dataset.

CSS10 Hungarian: Single Speaker Speech Dataset

The corpus contains 4515 segments from a single speaker, all extracted from a single LibriVox audiobook. It amounts to about 10 hours of audio data.

Training

The model was trained on a single RTX 3090 GPU. Training took about 1 day to reach the first checkpoint (step 93,000). Based on the quality of this preview model, we are aiming for 250,000 steps.
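
As a rough back-of-the-envelope estimate, the figures above can be extrapolated to the full run. This sketch assumes throughput stays constant at about 93,000 steps per day, which may not hold in practice; the variable names are only illustrative.

```python
# Figures taken from the paragraph above.
steps_first_checkpoint = 93_000   # steps reached after roughly one day
days_first_checkpoint = 1.0       # approximate, as stated
target_steps = 250_000            # planned final step count

# Assumption: training speed stays constant for the whole run.
steps_per_day = steps_first_checkpoint / days_first_checkpoint
estimated_days = target_steps / steps_per_day

print(f"Estimated total training time: {estimated_days:.1f} days")
```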

Usage

The model can be used with JayWalnut's git repo, but you have to modify the text/cleaners.py file so that it contains our hungarian_cleaners method. The necessary files are provided in our repo.
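
To illustrate what such a cleaner might look like, here is a minimal, self-contained sketch. It is not the hungarian_cleaners implementation shipped in our repo: the body below is an assumption that only normalizes Unicode, lowercases, and collapses whitespace, and it deliberately skips ASCII transliteration so that Hungarian accented letters (á, é, ő, ű, ...) survive. Use the file from the repo for actual inference.

```python
import re
import unicodedata

# Matches any run of whitespace (spaces, tabs, newlines).
_whitespace_re = re.compile(r"\s+")


def hungarian_cleaners(text: str) -> str:
    """Hypothetical cleaner sketch; the real one lives in the provided files."""
    # Compose characters so "o" + combining double acute becomes a single "ő".
    text = unicodedata.normalize("NFC", text)
    # Lowercase; Python handles accented Hungarian letters correctly.
    text = text.lower()
    # Collapse whitespace runs into one space and trim the ends.
    text = _whitespace_re.sub(" ", text).strip()
    return text


if __name__ == "__main__":
    print(hungarian_cleaners("  Jó   NAPOT,\tÁrvíztűrő!  "))
```

In the VITS codebase the cleaner is, as far as we can tell, selected by name through the text_cleaners entry of the config file, so the function name in text/cleaners.py has to match the name used in the config.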