Text-to-Audio
Free text-to-audio AI tools for converting written text into engaging audio content, perfect for podcasts, videos, and multimedia projects.
STA-V2A can generate high-quality audio from videos by extracting important features and using text for guidance. It uses a Latent Diffusion Model for audio creation and a new metric called Audio-Audio Align to measure how well the audio matches the video timing.
Stable Audio Open can generate up to 47 seconds of stereo audio at 44.1 kHz from text prompts. It uses a transformer-based diffusion model for high-quality sound, making it useful for artists and researchers.
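To put the 47-second, 44.1 kHz stereo figure in perspective, here is a rough sketch of how much raw audio data a full-length clip amounts to. The 16-bit PCM sample width is an assumption made for illustration only; the blurb above does not specify an output bit depth, and the function name is hypothetical.

```python
# Back-of-the-envelope size of a full-length clip, using the figures
# quoted above: 47 seconds, stereo, 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
DURATION_S = 47
BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM, not stated in the source


def clip_stats(duration_s=DURATION_S, sample_rate_hz=SAMPLE_RATE_HZ,
               channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Return (frames, samples, size_bytes) for an uncompressed clip."""
    frames = duration_s * sample_rate_hz   # one frame = one sample per channel
    samples = frames * channels            # individual sample values
    size_bytes = samples * bytes_per_sample
    return frames, samples, size_bytes


if __name__ == "__main__":
    frames, samples, size_bytes = clip_stats()
    print(f"frames:  {frames:,}")
    print(f"samples: {samples:,}")
    print(f"size:    {size_bytes / 1_000_000:.1f} MB uncompressed")
```

Under that assumption, a maximum-length clip comes to roughly 8 MB of uncompressed audio.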
PicoAudio is a temporally controlled audio generation framework. The model can generate audio with precise control over timestamps and occurrence frequency.
FoleyCrafter can generate high-quality sound effects for videos! The results aim to be semantically relevant to, and temporally synchronized with, the video. It also supports text prompts for finer control over the video-to-audio generation.
Auffusion is a text-to-audio system that generates audio from natural language prompts. The model can control various aspects of the audio, such as acoustic environment, material, pitch, and temporal order. It can also generate audio from labels or be combined with an LLM to produce descriptive audio prompts.
AudioLDM 2 can generate high-quality audio in different forms, like text-to-audio and image-to-audio. It uses self-supervised pretraining to achieve strong performance on standard benchmarks.
WavJourney is a system that uses large language models to generate audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. The demo results, while not perfect, sound great.