Text-to-Audio AI Tools | AI Toolbox

STAR

STAR can generate audio from speech input while capturing important sounds and scene details.

19.10.25 · Project Page · Code · Text-to-Audio

TangoFlux

TangoFlux can generate 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU.

31.12.24 · Project Page · Code · Demo · Text-to-Audio

MMAudio

MMAudio can generate high-quality audio that matches video and text inputs. It excels in audio quality and audio-visual synchronization, and takes just 1.23 seconds to generate an 8-second clip.

23.12.24 · Project Page · Code · Video-to-Audio · Text-to-Audio
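
The speed figures quoted for TangoFlux and MMAudio are easier to compare once they are converted into a speed-up over real time (seconds of audio produced per second of compute). The sketch below only does that arithmetic. The function name is illustrative, and because this page does not state the GPU behind MMAudio's figure, the two results are not a like-for-like hardware comparison.

```python
def speedup_over_real_time(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio generated per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# (audio length, generation time) in seconds, as quoted on this page.
# MMAudio's hardware is not stated here, so treat the comparison loosely.
REPORTED = {
    "TangoFlux (single A40)": (30.0, 3.7),
    "MMAudio": (8.0, 1.23),
}

for name, (audio_s, wall_s) in REPORTED.items():
    print(f"{name}: {speedup_over_real_time(audio_s, wall_s):.1f}x real time")
```

Run as-is, it prints about 8.1x for TangoFlux and 6.5x for MMAudio.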

STA-V2A

STA-V2A can generate high-quality audio from videos by extracting key video features and using text for guidance. It uses a Latent Diffusion Model for audio generation and introduces a new metric, Audio-Audio Align, to measure how well the generated audio matches the video's timing.

20.08.24 · Project Page · Code · Text-to-Audio

Stable Audio Open

Stable Audio Open can generate up to 47 seconds of stereo audio at 44.1kHz from text prompts. It uses a transformer-based diffusion model for high-quality sound, making it useful for artists and researchers.

10.07.24 · Project Page · Code · Weights · Text-to-Audio
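
The length, sample rate and channel count quoted for Stable Audio Open translate directly into buffer sizes. The sketch below assumes two sample widths, 16-bit integer PCM and 32-bit float, because this page does not say which format the model emits; `raw_audio_bytes` is an illustrative helper, not part of any Stable Audio API.

```python
# Figures quoted for Stable Audio Open above.
SAMPLE_RATE_HZ = 44_100
MAX_SECONDS = 47
CHANNELS = 2  # stereo


def raw_audio_bytes(seconds: float, sample_rate: int, channels: int,
                    bytes_per_sample: int) -> int:
    """Size of an uncompressed, interleaved PCM buffer (no file header)."""
    frames = round(seconds * sample_rate)
    return frames * channels * bytes_per_sample


frames = MAX_SECONDS * SAMPLE_RATE_HZ  # samples per channel
pcm16_bytes = raw_audio_bytes(MAX_SECONDS, SAMPLE_RATE_HZ, CHANNELS, 2)    # assumed 16-bit PCM
float32_bytes = raw_audio_bytes(MAX_SECONDS, SAMPLE_RATE_HZ, CHANNELS, 4)  # assumed 32-bit float

print(f"{frames:,} frames per channel")
print(f"16-bit PCM: {pcm16_bytes / 2**20:.1f} MiB")
print(f"32-bit float: {float32_bytes / 2**20:.1f} MiB")
```

A maximum-length clip is 2,072,700 frames per channel, roughly 7.9 MiB as 16-bit PCM or 15.8 MiB as 32-bit float before any compression.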

PicoAudio

PicoAudio is a temporally controlled audio generation framework. It can generate audio with precise control over event timestamps and occurrence frequency.

03.07.24 · Project Page · Code · Text-to-Audio · Audio Editing
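
One way to make timestamp and occurrence-frequency control concrete is to encode each sound event as a per-frame on/off track. The sketch below is hypothetical: `EventSpec`, `timestamp_matrix`, `occurrence_frequency`, the 10 frames-per-second resolution and the example events are illustrative assumptions, not PicoAudio's actual input format or API.

```python
import math
from dataclasses import dataclass


@dataclass
class EventSpec:
    """Hypothetical control input: a sound event and the intervals it occupies."""
    label: str
    intervals: list[tuple[float, float]]  # (onset, offset) in seconds


def timestamp_matrix(events: list[EventSpec], duration_s: float,
                     frames_per_second: int) -> list[list[int]]:
    """Build an event-by-frame matrix: 1 where the event is active, else 0."""
    n_frames = round(duration_s * frames_per_second)
    matrix = []
    for event in events:
        row = [0] * n_frames
        for onset, offset in event.intervals:
            # Cover every frame the interval touches, clipped to the clip length.
            start = max(0, math.floor(onset * frames_per_second))
            end = min(n_frames, math.ceil(offset * frames_per_second))
            for i in range(start, end):
                row[i] = 1
        matrix.append(row)
    return matrix


def occurrence_frequency(events: list[EventSpec]) -> dict[str, int]:
    """Number of times each event occurs in the clip."""
    return {event.label: len(event.intervals) for event in events}


# Illustrative 10-second clip at an assumed resolution of 10 frames per second.
events = [
    EventSpec("dog barking", [(0.5, 2.0), (5.0, 6.5)]),
    EventSpec("car horn", [(8.0, 9.0)]),
]
matrix = timestamp_matrix(events, duration_s=10.0, frames_per_second=10)
print(occurrence_frequency(events))  # {'dog barking': 2, 'car horn': 1}
print([sum(row) for row in matrix])  # active frames per event: [30, 10]
```

The matrix pins down when each event sounds, while the count per label captures how often it occurs; a real system would feed representations like these to the generator as conditioning.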

FoleyCrafter

FoleyCrafter can generate high-quality sound effects for videos! Results aim to be semantically relevant and temporally synchronized with the input video. It also supports text prompts for finer control over the video-to-audio generation.

01.07.24 · Project Page · Code · Demo · Text-to-Audio

Auffusion

Auffusion is a Text-to-Audio system that generates audio from natural language prompts. It can control various aspects of the audio, such as acoustic environment, material, pitch, and temporal order. It can also generate audio from labels, or be combined with an LLM to generate descriptive audio prompts.

02.01.24 · Project Page · Code · Text-to-Audio · Audio Inpainting

AudioLDM 2

AudioLDM 2 can generate high-quality audio in different forms, such as text-to-audio and image-to-audio. It relies on self-supervised pretraining and reports state-of-the-art or competitive results on major benchmarks.

10.08.23 · Code · Demo · Text-to-Audio · Text-to-Music · Text-to-Speech

WavJourney

WavJourney is a system that uses large language models to generate storyline-driven audio content combining speech, music, and sound effects, guided by text instructions. The demo results are not perfect, but they sound great.

26.07.23 · Project Page · Code · Text-to-Audio