OpenSound/EzAudio · pretty neat!

Aibecool

9 days ago

•

edited 8 days ago

"big band jazz music, loud drums, fast pace, energetic, studio recording"

Kind of wonky with music.

"synth drums, funky futuristic music"

Never mind, then.
"strings instrument, homemade, wonky sounding"

"Hawaiian rock, with intense drums"

OpenSound

Owner 8 days ago

•

edited 8 days ago

Given that the model hasn't been explicitly trained on music data, I initially expected its music generation quality to be lacking.
However, these examples sound surprisingly good and intriguing!
We're currently developing a text-to-music model with specialized music training, which we believe will further improve its music generation capabilities.

Aibecool

8 days ago

•

edited 8 days ago

I think it would definitely benefit from being bigger. Maybe you should make a general 1.5B-parameter model, "EzAudio-MEGA", trained on many different types of media as an all-in-one model. It could do TTS and music with lyrics; there's so much potential. But then again, where would the data come from?

OpenSound

Owner 8 days ago

We'll definitely explore that in future work. Yes, scaling up is always an option. Sound effects and TTS are more feasible at the moment, but the biggest hurdle for music, especially songs with lyrics, is the lack of large, open-source, license-free, and labeled datasets.

Aibecool

8 days ago

•

edited 7 days ago

Yes, I understand.