kodoqmc committed bd3dad2 (1 parent: 422a7b7)

Update README.md

---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
widget:
- text: "Once when I was six years old I saw a magnificent picture"
---

# ⓍTTS_v2 - The San-Ti Fine-Tuned Model

This repository hosts a fine-tuned version of the ⓍTTS model, trained on 4 minutes of unique voice lines from The San-Ti. The voice lines were sourced from a clip of 3 Body Problem on YouTube, which can be found here:
[The San-Ti Explain how they Stop Science on Earth | 3 Body Problem | Netflix](https://www.youtube.com/watch?v=caxiX38DK68)

![The San-Ti: Illustration](thesanti.jpg)

Listen to a sample of the ⓍTTS_v2 - The San-Ti Fine-Tuned Model:

<audio controls>
<source src="https://huggingface.co/kodoqmc/XTTS-v2_San-Ti/resolve/main/generated.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>

Here's a San-Ti voice line clip from the training data:

<audio controls>
<source src="https://huggingface.co/kodoqmc/XTTS-v2_San-Ti/resolve/main/reference.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>

## Features
- 🎙️ **Voice Cloning**: Realistic voice cloning with just a short audio clip.
- 🌍 **Multi-Lingual Support**: Generates speech in 17 different languages while maintaining The San-Ti's voice.
- 😃 **Emotion & Style Transfer**: Captures the emotional tone and style of the original voice.
- 🔄 **Cross-Language Cloning**: Maintains the unique voice characteristics across different languages.
- 🎧 **High-Quality Audio**: Outputs at a 24 kHz sampling rate for clear, high-fidelity audio.

## Supported Languages
The model supports the following 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi).
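
Because the model keeps The San-Ti's voice across all of these languages, a batch run over several language codes is a quick way to compare them. A minimal sketch using the ⓍTTS API shown later in this card; `MODEL_DIR`, the sample sentences, and the output naming are illustrative placeholders, not part of this repository:

```python
# Sample sentences per XTTS language code (translations are illustrative).
SAMPLES = {
    "en": "Science on your planet has been stopped.",
    "fr": "La science sur votre planète a été arrêtée.",
    "de": "Die Wissenschaft auf eurem Planeten wurde angehalten.",
}

def output_path(lang: str) -> str:
    """Output filename for one language's rendering, e.g. 'san_ti_en.wav'."""
    return f"san_ti_{lang}.wav"

RUN_SYNTHESIS = False  # flip to True once the checkpoint paths below exist

if RUN_SYNTHESIS:
    from TTS.api import TTS  # requires the coqui TTS package

    MODEL_DIR = "/path/to/XTTS-v2_San-Ti/"  # placeholder path
    tts = TTS(model_path=MODEL_DIR,
              config_path=MODEL_DIR + "config.json",
              progress_bar=False, gpu=True)
    for lang, text in SAMPLES.items():
        tts.tts_to_file(text=text,
                        file_path=output_path(lang),
                        speaker_wav="reference.wav",  # the clip in this repo
                        language=lang)
```

Each pass reuses the same `speaker_wav` reference, so only the language (and text) changes between output files.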

## Usage in Roll Cage
🤖💬 Boost your AI experience with this Ollama add-on! Enjoy real-time audio 🎙️ and text 🔍 chats, LaTeX rendering 📜, agent automations ⚙️, workflows 🔄, and text-to-image 📝➡️🖼️, image-to-text 🖼️➡️🔤, and image-to-video 🖼️➡️🎥 transformations. Fine-tune text 📝, voice 🗣️, and image 🖼️ generation. Includes Windows macro controls 🖥️ and DuckDuckGo search.

[ollama_agent_roll_cage (OARC)](https://github.com/Leoleojames1/ollama_agent_roll_cage) is a completely local Python & CMD toolset add-on for the Ollama command line interface. The OARC toolset automates the creation of agents, giving the user more control over the likely output. It provides SYSTEM prompt templates for each ./Modelfile, allowing users to design and deploy custom agents quickly. Users can select which local model file is used in agent construction with the desired system prompt.
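
A SYSTEM-prompt Modelfile of the kind OARC templates might look like the following sketch; the base model name and prompt text are illustrative, not shipped with OARC:

```dockerfile
# Hypothetical Modelfile for a San-Ti styled agent.
FROM llama3
SYSTEM """You are the San-Ti. Answer tersely and literally; you do not understand deception."""
PARAMETER temperature 0.7
```

Building it with `ollama create san-ti -f ./Modelfile` yields a local model that always starts from that system prompt.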

## CoquiTTS and Resources
- 🐸💬 **CoquiTTS**: [Coqui TTS on GitHub](https://github.com/coqui-ai/TTS)
- 📚 **Documentation**: [ReadTheDocs](https://tts.readthedocs.io/en/latest/)
- 👩‍💻 **Questions**: [GitHub Discussions](https://github.com/coqui-ai/TTS/discussions)
- 🗯 **Community**: [Discord](https://discord.gg/5eXr5seRrv)

## License
This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). Read more about the origin story of CPML [here](https://coqui.ai/blog/tts/cpml).

## Contact
Join our 🐸Community on [Discord](https://discord.gg/fBC58unbKE) and follow us on [Twitter](https://twitter.com/coqui_ai). For inquiries, email us at info@coqui.ai.

Using 🐸TTS API:

```python
from TTS.api import TTS

# Load the fine-tuned checkpoint and its config (adjust the paths to wherever
# you downloaded this repository); gpu=True places the model on CUDA.
tts = TTS(model_path="D:/AI/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_San-Ti/",
          config_path="D:/AI/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_San-Ti/config.json",
          progress_bar=False, gpu=True)

# Generate speech by cloning a voice using default settings.
tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
                file_path="output.wav",
                speaker_wav="/path/to/target/speaker.wav",
                language="en")
```

Using 🐸TTS command line:

```console
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "Bugün okula gitmek istemiyorum." \
    --speaker_wav /path/to/target/speaker.wav \
    --language_idx tr \
    --use_cuda true
```

Using the model directly:

```python
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the config and checkpoint, then move the model to the GPU.
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

# gpt_cond_len controls how many seconds of the reference clip condition
# the GPT encoder.
outputs = model.synthesize(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="en",
)
```
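
The direct call returns the raw waveform rather than writing a file (in ⓍTTS the `"wav"` entry of the returned dictionary, at the model's 24 kHz rate). A minimal standard-library sketch of saving such a float waveform as 16-bit PCM; the sine tone and the `save_wav` helper are illustrative stand-ins, not part of the ⓍTTS API:

```python
import math
import struct
import wave

SAMPLE_RATE = 24000  # XTTS outputs 24 kHz audio

def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write an iterable of float samples in [-1, 1] as 16-bit mono PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)          # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        pcm = b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
                       for s in samples)
        f.writeframes(pcm)

# Stand-in for the model's waveform: one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE)]
save_wav(tone, "output.wav")
```

In practice you would pass the model's waveform in place of `tone`; libraries such as torchaudio or soundfile do the same job in one call.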