Text-to-Speech
Free text-to-speech AI tools for converting text into lifelike audio, suited to podcasts, presentations, and accessibility.
PeriodWave generates high-quality speech waveforms by modeling the periodic (repeating) structure of the signal. Its period-aware flow matching estimator outperforms previous models in both text-to-speech and Mel-spectrogram reconstruction.
F5-TTS is a fast text-to-speech system that generates natural-sounding speech. It supports multiple languages, can switch between them smoothly within an utterance, and is trained on a large dataset of 100,000 hours of speech.
Vevo can imitate a voice zero-shot, without training data for the target speaker. It can change accent and emotion while keeping output quality high, using a self-supervised method that separates (disentangles) different speech features.
AudioLDM 2 generates high-quality audio from different kinds of input, with modes such as text-to-audio and image-to-audio. It relies on self-supervised pretraining and reports state-of-the-art or competitive results on standard benchmarks.
CoMoSpeech can synthesize speech and singing voices in one step with high audio quality. It runs over 150 times faster than real-time on a single NVIDIA A100 GPU, making it practical for text-to-speech and singing applications.
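To put the "150 times faster than real-time" figure in concrete terms, here is a minimal sketch of the arithmetic. The function names are hypothetical and are not part of CoMoSpeech; the sketch only assumes that "N times faster than real-time" means N seconds of audio are produced per second of compute.

```python
def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of audio
    when the model runs `speedup` times faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """Real-time factor (compute seconds per second of audio);
    values below 1.0 mean faster than real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# One minute of audio at the 150x lower bound quoted above.
print(synthesis_seconds(60.0, 150.0))  # 0.4
```

At that rate a one-minute clip takes under half a second of compute, which corresponds to a real-time factor of roughly 0.0067 on the quoted hardware.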