---
title: 🎧AudioGen🔊 - 💾Live Multiplayer🎼
emoji: 🎙️🔊🎧
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
---


# Stable Audio Multiplayer Live App

## App Features

- Generate audio using text prompts
- Customize audio generation parameters:
  - Duration
  - Number of diffusion steps
  - Sampler type
  - CFG scale
  - Sigma min and max values
- Share generated audio with the community
- View and listen to audio generated by other users
- Load more community-generated audio on demand

## Code Structure

1. Import necessary libraries
2. Define constants and settings
3. Load the pre-trained model
4. Define the `generate_audio` function
   - Set up text and timing conditioning
   - Generate stereo audio
   - Process and save the generated audio
5. Define utility functions
   - `list_all_outputs`: list all generated audio files
   - `increase_list_size`: increase the number of displayed community-generated audio files
6. Create the Gradio interface
   - Set up the input components (text prompt, parameters)
   - Display the generated audio output
   - Show community-generated audio
   - Provide examples for users to try
7. Load the model and launch the app

## Functions, Inputs, and Outputs

1. `load_model`
   - Purpose: load the pre-trained model and its configuration
   - Inputs: none
   - Outputs: `model` (loaded model), `model_config` (model configuration)
2. `generate_audio`
   - Purpose: generate audio based on the provided text prompt and parameters
   - Inputs:
     - `prompt` (text prompt)
     - `sampler_type_dropdown` (selected sampler type)
     - `seconds_total` (duration in seconds)
     - `steps` (number of diffusion steps)
     - `cfg_scale` (CFG scale value)
     - `sigma_min_slider` (sigma min value)
     - `sigma_max_slider` (sigma max value)
   - Outputs: `unique_filename` (path to the generated audio file)
3. `list_all_outputs`
   - Purpose: list all generated audio files and update the community-generated audio display
   - Inputs: `generation_history` (comma-separated list of previously displayed audio files)
   - Outputs: `updated_history` (updated comma-separated list of audio files), `gr.update(visible=True)` (makes the community-generated audio section visible)
4. `increase_list_size`
   - Purpose: increase the number of displayed community-generated audio files
   - Inputs: `list_size` (current number of displayed audio files)
   - Outputs: `list_size + PAGE_SIZE` (increased number of displayed audio files)
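The pagination helper is the simplest of these. A minimal sketch, assuming `PAGE_SIZE` is a module-level constant (the actual value in the app may differ):

```python
# Number of additional community clips revealed per click (assumed value).
PAGE_SIZE = 10

def increase_list_size(list_size: int) -> int:
    """Grow the number of displayed community clips by one page."""
    return list_size + PAGE_SIZE
```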

## Glossary

| Term | Definition |
| --- | --- |
| Diffusion model | A generative model that learns to denoise data by reversing a gradual noising process |
| Sampler type | The algorithm used to draw audio samples from the diffusion model |
| CFG scale | Classifier-Free Guidance scale; controls the influence of the text prompt on the generated audio |
| Sigma | Noise-level values used in the diffusion process, determining how much noise is added or removed |
| Gradio | A Python library for building web-based interfaces for machine learning models |
| Einops | A library for flexible, readable tensor operations, used for rearranging the generated audio |
| Torchaudio | A PyTorch library for working with audio data, used for saving the generated audio to a file |

The code achieves this functionality through the following functions:

### `generate_audio`

This function generates the audio from the provided prompt and parameters. It saves the generated audio file with a unique filename in the specified directory (`/data/output_{random_uuid}.wav`) and saves the corresponding prompt in a text file with the same unique stem (`/data/output_{random_uuid}.txt`).

### `list_all_outputs`

This function retrieves the list of all generated audio files from the specified directory (`FILE_DIR_PATH`). It sorts the audio files by modification time in descending order, so the most recent files appear first. It updates `generation_history` by prepending the new audio files to the existing history list, then returns the updated history and makes the community list element visible.

### `show_output_list`

This rendering function displays the community generations. It takes `generation_history` and `list_size` as inputs and retrieves the latest `list_size` audio files from the history. For each audio file, it reads the corresponding prompt from the associated text file, then creates a group element that shows the prompt as a markdown heading above an audio player.

Now, let's go through the relevant lines of code for saving and displaying the community history:

In the `generate_audio` function:

The code generates unique filenames for the audio file and the corresponding prompt text file using a random UUID. It saves the generated audio with `torchaudio.save(unique_filename, output, sample_rate)` and writes the prompt to a text file with `with open(unique_textfile, "w") as file: file.write(prompt)`.
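The filename scheme can be sketched roughly as follows. This is a dependency-free sketch: the directory default and helper name are assumptions, and the actual `torchaudio.save(...)` call that writes the audio is noted in a comment rather than executed.

```python
import uuid
from pathlib import Path

FILE_DIR_PATH = "/data"  # output directory used by the app (assumed from the prose)

def make_output_paths(prompt: str, directory: str = FILE_DIR_PATH) -> tuple[str, str]:
    """Build matching .wav/.txt paths from one random UUID and save the prompt.

    The app itself then calls torchaudio.save(unique_filename, output, sample_rate)
    to write the audio tensor; that call is omitted here to keep the sketch
    free of heavy dependencies.
    """
    random_uuid = uuid.uuid4().hex
    unique_filename = str(Path(directory) / f"output_{random_uuid}.wav")
    unique_textfile = str(Path(directory) / f"output_{random_uuid}.txt")
    # Persist the prompt next to the audio so the gallery can show it later.
    with open(unique_textfile, "w") as file:
        file.write(prompt)
    return unique_filename, unique_textfile
```

Sharing one UUID between the `.wav` and `.txt` files is what lets the gallery later recover a clip's prompt with a simple `.replace('.wav', '.txt')`.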

In the `list_all_outputs` function:

It retrieves the list of audio files from the specified directory using `os.listdir(directory_path)`, filters for `.wav` files, and sorts them by modification time with `wav_files.sort(key=lambda x: os.path.getmtime(os.path.join(directory_path, x)), reverse=True)`. It updates the history by prepending the new files with `updated_history = updated_files + history_list` and returns it as a comma-separated string via `','.join(updated_history)`.
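That behaviour can be sketched as below. The real function also returns `gr.update(visible=True)` to reveal the community section; the directory default and the exact de-duplication step are assumptions made to keep the sketch Gradio-free.

```python
import os

def list_all_outputs(generation_history: str, directory_path: str = "/data") -> str:
    """Merge newly found .wav files into the comma-separated history string."""
    history_list = generation_history.split(",") if generation_history else []
    wav_files = [f for f in os.listdir(directory_path) if f.endswith(".wav")]
    # Newest first, by file modification time.
    wav_files.sort(
        key=lambda x: os.path.getmtime(os.path.join(directory_path, x)),
        reverse=True,
    )
    paths = [os.path.join(directory_path, f) for f in wav_files]
    # Prepend only files not already in the history.
    updated_files = [p for p in paths if p not in history_list]
    updated_history = updated_files + history_list
    return ",".join(updated_history)
```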

In the `show_output_list` function:

It retrieves the latest `list_size` audio files from the history using `history_list_latest = history_list[:list_size]`. For each audio file, it derives the prompt file path with `generation_prompt_file = generation.replace('.wav', '.txt')` and reads the prompt via `with open(generation_prompt_file, 'r') as file: generation_prompt = file.read()`. It then opens a group with `with gr.Group():`, shows the prompt as a heading via `gr.Markdown(value=f"### {generation_prompt}")`, and adds an audio player with `gr.Audio(value=generation)`.
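The prompt-lookup half of that function can be sketched without Gradio; the `gr.Group`/`gr.Markdown`/`gr.Audio` rendering is noted in comments, and the helper name and missing-file fallback are assumptions:

```python
import os

def latest_generations(generation_history: str, list_size: int) -> list[tuple[str, str]]:
    """Return (prompt, wav_path) pairs for the newest list_size clips.

    In the app, each pair is rendered inside `with gr.Group():` as
    gr.Markdown(value=f"### {prompt}") followed by gr.Audio(value=wav_path).
    """
    history_list = generation_history.split(",") if generation_history else []
    pairs = []
    for generation in history_list[:list_size]:
        # The .txt file shares the .wav file's UUID stem.
        generation_prompt_file = generation.replace(".wav", ".txt")
        if os.path.exists(generation_prompt_file):
            with open(generation_prompt_file, "r") as file:
                generation_prompt = file.read()
        else:
            generation_prompt = ""  # tolerate a missing prompt file
        pairs.append((generation_prompt, generation))
    return pairs
```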

The code uses the `demo.load` method to call `list_all_outputs` every 2 seconds, which refreshes `generation_history` and the community list element. This ensures the history of saved audio files is continuously updated and displayed to everyone using the Space.

## UI Emojis

Ten emojis used in the UI, which deals with audio recording, sound-effect generation, and file saving:

- 🎙️ Microphone: audio recording or input
- 🔊 Speaker: audio playback or output
- 🎧 Headphones: listening to audio
- 🎼 Musical score: music or sound composition
- 🔍 Magnifying glass: searching or exploring sound effects
- ⚙️ Gear: settings or configuration options for audio generation
- 💾 Floppy disk: saving or storing audio files
- 📂 Folder: the directory where audio files are saved
- ⏱️ Stopwatch: the duration or timing of audio recordings
- 🎛️ Slider: adjusting parameters for sound-effect generation

These emojis make the UI's audio-recording, sound-effect-generation, and file-management elements and actions easier to recognize at a glance.