---
title: 🎧AudioGen🔊 - 💾Live Multiplayer🎼
emoji: 🎙️🔊🎧
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
---


# Stable Audio Multiplayer Live App

## App Features

- Generate audio using text prompts
- Customize audio generation parameters:
  - Duration
  - Number of diffusion steps
  - Sampler type
  - CFG scale
  - Sigma min and max values
- Share generated audio with the community
- View and listen to audio generated by other users
- Load more community-generated audio on demand

## Code Structure

1. Import necessary libraries
2. Define constants and settings
3. Load the pre-trained model
4. Define the `generate_audio` function
   - Set up text and timing conditioning
   - Generate stereo audio
   - Process and save the generated audio
5. Define utility functions
   - `list_all_outputs`: list all generated audio files
   - `increase_list_size`: increase the number of displayed community-generated audio files
6. Create the Gradio interface
   - Set up the input components (text prompt, parameters)
   - Display the generated audio output
   - Show community-generated audio
   - Provide examples for users to try
7. Load the model and launch the app

## Functions, Inputs, and Outputs

1. `load_model`
   - Purpose: load the pre-trained model and its configuration
   - Inputs: none
   - Outputs: `model` (loaded model), `model_config` (model configuration)
2. `generate_audio`
   - Purpose: generate audio based on the provided text prompt and parameters
   - Inputs:
     - `prompt` (text prompt)
     - `sampler_type_dropdown` (selected sampler type)
     - `seconds_total` (duration in seconds)
     - `steps` (number of diffusion steps)
     - `cfg_scale` (CFG scale value)
     - `sigma_min_slider` (sigma min value)
     - `sigma_max_slider` (sigma max value)
   - Outputs: `unique_filename` (path to the generated audio file)
3. `list_all_outputs`
   - Purpose: list all generated audio files and update the community-generated audio display
   - Inputs: `generation_history` (comma-separated list of previously displayed audio files)
   - Outputs: `updated_history` (updated comma-separated list of audio files), `gr.update(visible=True)` (makes the community-generated audio section visible)
4. `increase_list_size`
   - Purpose: increase the number of displayed community-generated audio files
   - Inputs: `list_size` (current number of displayed audio files)
   - Outputs: `list_size + PAGE_SIZE` (increased number of displayed audio files)
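The pagination helper is the simplest of these. A minimal sketch, assuming `PAGE_SIZE` is a module-level constant (the actual value in the app may differ):

```python
# Number of additional community clips revealed per click (assumed value).
PAGE_SIZE = 10

def increase_list_size(list_size: int) -> int:
    """Grow the number of displayed community clips by one page."""
    return list_size + PAGE_SIZE
```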

## Glossary

| Term | Definition |
| --- | --- |
| Diffusion model | A generative model that learns to denoise data by reversing a gradual noising process |
| Sampler type | The algorithm used to draw audio samples from the diffusion model |
| CFG scale | Classifier-Free Guidance scale; controls the influence of the text prompt on the generated audio |
| Sigma | Noise-level values used in the diffusion process, determining how much noise is added or removed |
| Gradio | A Python library for building web-based interfaces for machine learning models |
| Einops | A library for flexible, readable tensor operations, used for rearranging the generated audio |
| Torchaudio | A PyTorch library for working with audio data, used for saving the generated audio to a file |

The code achieves this functionality through the following functions:

### `generate_audio`

This function generates the audio from the provided prompt and parameters. It saves the generated audio file with a unique filename in the specified directory (`/data/output_{random_uuid}.wav`) and saves the corresponding prompt in a text file with the same unique stem (`/data/output_{random_uuid}.txt`).

### `list_all_outputs`

This function retrieves the list of all generated audio files from the specified directory (`FILE_DIR_PATH`). It sorts the audio files by modification time in descending order, so the most recent files appear first. It updates `generation_history` by prepending the new audio files to the existing history list, then returns the updated history and makes the community list element visible.

### `show_output_list`

This rendering function displays the community generations. It takes `generation_history` and `list_size` as inputs and retrieves the latest `list_size` audio files from the history. For each audio file, it reads the corresponding prompt from the associated text file, then creates a group element that shows the prompt as a markdown heading above an audio player.

Now, let's go through the relevant lines of code for saving and displaying the community history:

In the `generate_audio` function:

The code generates unique filenames for the audio file and the corresponding prompt text file using a random UUID. It saves the generated audio with `torchaudio.save(unique_filename, output, sample_rate)` and writes the prompt to a text file with `with open(unique_textfile, "w") as file: file.write(prompt)`.
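The filename scheme can be sketched roughly as follows. This is a dependency-free sketch: the directory default and helper name are assumptions, and the actual `torchaudio.save(...)` call that writes the audio is noted in a comment rather than executed.

```python
import uuid
from pathlib import Path

FILE_DIR_PATH = "/data"  # output directory used by the app (assumed from the prose)

def make_output_paths(prompt: str, directory: str = FILE_DIR_PATH) -> tuple[str, str]:
    """Build matching .wav/.txt paths from one random UUID and save the prompt.

    The app itself then calls torchaudio.save(unique_filename, output, sample_rate)
    to write the audio tensor; that call is omitted here to keep the sketch
    free of heavy dependencies.
    """
    random_uuid = uuid.uuid4().hex
    unique_filename = str(Path(directory) / f"output_{random_uuid}.wav")
    unique_textfile = str(Path(directory) / f"output_{random_uuid}.txt")
    # Persist the prompt next to the audio so the gallery can show it later.
    with open(unique_textfile, "w") as file:
        file.write(prompt)
    return unique_filename, unique_textfile
```

Sharing one UUID between the `.wav` and `.txt` files is what lets the gallery later recover a clip's prompt with a simple `.replace('.wav', '.txt')`.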

In the `list_all_outputs` function:

It retrieves the list of audio files from the specified directory using `os.listdir(directory_path)`, filters for `.wav` files, and sorts them by modification time with `wav_files.sort(key=lambda x: os.path.getmtime(os.path.join(directory_path, x)), reverse=True)`. It updates the history by prepending the new files with `updated_history = updated_files + history_list` and returns it as a comma-separated string via `','.join(updated_history)`.
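That behaviour can be sketched as below. The real function also returns `gr.update(visible=True)` to reveal the community section; the directory default and the exact de-duplication step are assumptions made to keep the sketch Gradio-free.

```python
import os

def list_all_outputs(generation_history: str, directory_path: str = "/data") -> str:
    """Merge newly found .wav files into the comma-separated history string."""
    history_list = generation_history.split(",") if generation_history else []
    wav_files = [f for f in os.listdir(directory_path) if f.endswith(".wav")]
    # Newest first, by file modification time.
    wav_files.sort(
        key=lambda x: os.path.getmtime(os.path.join(directory_path, x)),
        reverse=True,
    )
    paths = [os.path.join(directory_path, f) for f in wav_files]
    # Prepend only files not already in the history.
    updated_files = [p for p in paths if p not in history_list]
    updated_history = updated_files + history_list
    return ",".join(updated_history)
```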

In the `show_output_list` function:

It retrieves the latest `list_size` audio files from the history using `history_list_latest = history_list[:list_size]`. For each audio file, it derives the prompt file path with `generation_prompt_file = generation.replace('.wav', '.txt')` and reads the prompt via `with open(generation_prompt_file, 'r') as file: generation_prompt = file.read()`. It then opens a group with `with gr.Group():`, shows the prompt as a heading via `gr.Markdown(value=f"### {generation_prompt}")`, and adds an audio player with `gr.Audio(value=generation)`.
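The prompt-lookup half of that function can be sketched without Gradio; the `gr.Group`/`gr.Markdown`/`gr.Audio` rendering is noted in comments, and the helper name and missing-file fallback are assumptions:

```python
import os

def latest_generations(generation_history: str, list_size: int) -> list[tuple[str, str]]:
    """Return (prompt, wav_path) pairs for the newest list_size clips.

    In the app, each pair is rendered inside `with gr.Group():` as
    gr.Markdown(value=f"### {prompt}") followed by gr.Audio(value=wav_path).
    """
    history_list = generation_history.split(",") if generation_history else []
    pairs = []
    for generation in history_list[:list_size]:
        # The .txt file shares the .wav file's UUID stem.
        generation_prompt_file = generation.replace(".wav", ".txt")
        if os.path.exists(generation_prompt_file):
            with open(generation_prompt_file, "r") as file:
                generation_prompt = file.read()
        else:
            generation_prompt = ""  # tolerate a missing prompt file
        pairs.append((generation_prompt, generation))
    return pairs
```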

The code uses the `demo.load` method to call `list_all_outputs` every 2 seconds, which refreshes `generation_history` and the community list element. This ensures the history of saved audio files is continuously updated and displayed to everyone using the Space.

## UI Emojis

Ten emojis used in the UI, which deals with audio recording, sound-effect generation, and file saving:

- 🎙️ Microphone: audio recording or input
- 🔊 Speaker: audio playback or output
- 🎧 Headphones: listening to audio
- 🎼 Musical score: music or sound composition
- 🔍 Magnifying glass: searching or exploring sound effects
- ⚙️ Gear: settings or configuration options for audio generation
- 💾 Floppy disk: saving or storing audio files
- 📂 Folder: the directory where audio files are saved
- ⏱️ Stopwatch: the duration or timing of audio recordings
- 🎛️ Slider: adjusting parameters for sound-effect generation

These emojis make the UI's audio-recording, sound-effect-generation, and file-management elements and actions easier to recognize at a glance.