Text-to-Speech

Free text-to-speech AI tools that convert text into lifelike audio for podcasts, presentations, and accessibility.

AnCoGen

AnCoGen can analyze and generate speech by estimating key attributes such as speaker identity, pitch, and loudness. The same unified masked autoencoder model also handles speech denoising, pitch shifting, and voice conversion.

11.03.25 · Project Page · Code · Speech Recognition · Text-to-Speech

Spark-TTS

Spark-TTS can generate customizable voices with control over gender, speaking style, pitch, and rate. It also supports zero-shot voice cloning, allowing smooth language transitions without extra training for each voice.

05.03.25 · Code · Text-to-Speech · Personalized Audio Generation

PeriodWave

PeriodWave can generate high-quality speech waveforms by capturing repeating sound patterns. It uses a period-aware flow matching estimator to outperform other models in text-to-speech tasks and Mel-spectrogram reconstruction.

10.02.25 · Project Page · Code · Text-to-Speech

F5-TTS

F5-TTS is a fast text-to-speech system that generates natural-sounding speech. It supports multiple languages, can switch between them smoothly, and was trained on a 100,000-hour dataset.

18.10.24 · Project Page · Code · Text-to-Speech

Vevo

Vevo can imitate voices without needing specific training data. It can change accents and emotions while keeping output high quality, using a self-supervised method that separates different speech features.

20.02.24 · Project Page · Code · Text-to-Speech · Controllable Audio Generation

AudioLDM 2

AudioLDM 2 can generate high-quality audio in different forms, like text-to-audio and image-to-audio. It uses self-supervised pretraining and reports state-of-the-art or competitive results on standard benchmarks.

10.08.23 · Code · Demo · Text-to-Audio · Text-to-Music · Text-to-Speech

CoMoSpeech

CoMoSpeech can synthesize speech and singing voices in one step with high audio quality. It runs over 150 times faster than real-time on a single NVIDIA A100 GPU, making it practical for text-to-speech and singing applications.

11.05.23 · Project Page · Code · Text-to-Speech
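
To put the "150 times faster than real-time" figure in perspective, the short sketch below converts a speedup into an estimated synthesis time. It is plain arithmetic, not CoMoSpeech code: the function names and the 30-minute example are illustrative assumptions, and since the listing says "over 150 times", the result is best read as an upper bound on a comparable GPU.

```python
def synthesis_seconds(audio_seconds: float, speedup: float = 150.0) -> float:
    """Estimate compute time for a clip, given a speedup over real time.

    `speedup` is how many seconds of audio are produced per second of
    compute (150.0 is the figure quoted in the listing, used here as an
    assumed default).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """Real-time factor (compute time / audio time); below 1 is faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


if __name__ == "__main__":
    episode = 30 * 60  # a hypothetical 30-minute podcast episode, in seconds
    print(f"Estimated synthesis time: {synthesis_seconds(episode):.1f} s")
    print(f"Real-time factor: {real_time_factor(150.0):.4f}")
```

At a 150x speedup, a 30-minute episode works out to roughly 12 seconds of synthesis time, not counting model loading or text preprocessing.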