Text-to-Video
Free text-to-video AI tools for creating engaging video content from scripts, perfect for filmmakers, marketers, and content creators.
RepVideo can improve video generation by enhancing the visual quality of individual frames and ensuring smooth transitions between them.
Kinetic Typography Diffusion Model can generate kinetic typography videos with legible and artistic letter motions based on text prompts.
TransPixar can generate RGBA videos, enabling the creation of transparent elements like smoke and reflections that blend seamlessly into scenes.
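What makes an RGBA output useful is that it drops straight into standard "over" compositing. Here is a minimal NumPy sketch; the smoke layer is a stand-in for a model's output, not TransPixar's API:

```python
import numpy as np

def composite_over(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Standard "over" alpha compositing of an RGBA frame onto a background.
    fg_rgba: (H, W, 4) floats in [0, 1]; bg_rgb: (H, W, 3) floats in [0, 1]."""
    rgb, alpha = fg_rgba[..., :3], fg_rgba[..., 3:]
    return rgb * alpha + bg_rgb * (1.0 - alpha)

# Toy usage: a translucent grey "smoke" layer blended over a random scene.
smoke = np.zeros((64, 64, 4))
smoke[..., :3] = 0.8   # light grey color
smoke[..., 3] = 0.3    # 30% opacity
scene = np.random.rand(64, 64, 3)
blended = composite_over(smoke, scene)  # (64, 64, 3)
```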
DiTCtrl can generate multi-prompt videos with smooth transitions and consistent object motion.
CustomCrafter can generate high-quality videos from text prompts and reference images. It improves motion generation with a Dynamic Weighted Video Sampling Strategy and enables better concept combinations without extra video data or fine-tuning.
SynCamMaster can generate synchronized videos from different viewpoints while keeping appearance and geometry consistent across views. It adapts text-to-video models for multi-camera use and allows re-rendering from new viewpoints.
Customizing Motion can learn motion patterns from input videos and generalize them to new, unseen contexts.
VideoRepair can improve text-to-video generation by finding and fixing small mismatches between text prompts and videos.
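The blurb describes an evaluate-then-refine loop. A hypothetical skeleton of that control flow (all four callables are placeholders, not VideoRepair's actual interface) might look like:

```python
def repair(generate, evaluate, refine, prompt, max_rounds=3, min_score=0.9):
    """Hypothetical evaluate-then-refine loop: generate a video, score its
    alignment with the prompt, and regenerate only the mismatched parts
    until the score is acceptable. All callables are placeholder stubs."""
    video = generate(prompt)
    for _ in range(max_rounds):
        score, mismatches = evaluate(video, prompt)  # e.g. per-object alignment
        if score >= min_score:
            break
        video = refine(video, prompt, mismatches)    # fix only flagged regions
    return video
```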
Adaptive Caching can speed up video generation with Diffusion Transformers by caching important calculations. It can achieve up to 4.7 times faster video creation at 720p without losing quality.
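The entry doesn't spell out the caching schedule, but the core trick, reusing a transformer block's residual across adjacent denoising steps when its input barely changes, can be sketched in PyTorch. The block layout, `tol` threshold, and change metric below are assumptions, not AdaCache's actual design:

```python
import torch

class CachedBlock(torch.nn.Module):
    """Toy residual block that reuses its last output when its input has barely
    changed between denoising steps. The layout, threshold, and change metric
    are illustrative assumptions, not AdaCache's actual schedule."""

    def __init__(self, dim: int, tol: float = 1e-2):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.LayerNorm(dim),
            torch.nn.Linear(dim, dim),
            torch.nn.GELU(),
            torch.nn.Linear(dim, dim),
        )
        self.tol = tol
        self._last_x = None
        self._last_res = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._last_x is not None and x.shape == self._last_x.shape:
            change = (x - self._last_x).abs().mean() / (self._last_x.abs().mean() + 1e-8)
            if change < self.tol:          # input ~unchanged: reuse cached residual
                return x + self._last_res
        res = self.body(x)                 # the expensive computation being cached
        self._last_x, self._last_res = x.detach(), res.detach()
        return x + res
```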
VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, without any fine-tuning.
Pyramidal Flow Matching can generate high-quality 5 to 10-second videos at 768p resolution and 24 FPS. It uses a unified pyramidal flow matching algorithm to link flows across different stages, making video creation more efficient.
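As a rough picture of what "linking flows across stages" means, here is a conceptual coarse-to-fine sampler. `model(x, t, stage)` is a hypothetical velocity predictor, and the re-noising factor is made up for illustration:

```python
import torch
import torch.nn.functional as F

def pyramidal_sample(model, stages=3, steps_per_stage=8):
    """Conceptual coarse-to-fine flow sampler: integrate a velocity field at a
    low resolution, upsample, lightly re-noise, and continue at the next scale.
    `model(x, t, stage)` is a hypothetical velocity predictor, not the paper's API."""
    x = torch.randn(1, 3, 4, 8, 8)                 # (batch, C, frames, H, W), coarsest scale
    for stage in range(stages):
        for i in range(steps_per_stage):           # simple Euler integration of the flow
            t = torch.full((1,), i / steps_per_stage)
            x = x + model(x, t, stage) / steps_per_stage
        if stage < stages - 1:
            x = F.interpolate(x, scale_factor=(1, 2, 2), mode="trilinear")
            x = x + 0.1 * torch.randn_like(x)      # re-noise so the next stage has flow left to match (made-up factor)
    return x
```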
ViewCrafter can generate high-quality 3D views from single or few images using a video diffusion model. It allows for precise camera control and is useful for real-time rendering and turning text into 3D scenes.
Matryoshka Diffusion Models can generate high-quality images and videos using a NestedUNet architecture that denoises inputs at multiple resolutions jointly. This method delivers strong performance at resolutions up to 1024x1024 pixels and generalizes well zero-shot.
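The nesting idea can be illustrated with a two-level toy denoiser, where an inner network handles a downsampled copy of the input and its estimate is fused back at full resolution. The layer choices and fusion rule are assumptions, not the paper's NestedUNet:

```python
import torch
import torch.nn.functional as F

class NestedDenoiser(torch.nn.Module):
    """Illustrative nesting: an inner denoiser handles a downsampled copy of the
    input, and its output is fused back at full resolution. Layer sizes and the
    fusion rule are assumptions, not the paper's NestedUNet."""

    def __init__(self, ch: int = 3, inner=None):
        super().__init__()
        self.inner = inner                      # smaller denoiser for the next level down
        self.conv = torch.nn.Conv2d(ch * (2 if inner else 1), ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.inner is not None:
            small = F.avg_pool2d(x, 2)          # low-resolution copy of the noisy input
            up = F.interpolate(self.inner(small), scale_factor=2)
            x = torch.cat([x, up], dim=1)       # fuse coarse estimate with fine input
        return self.conv(x)

# Two nested levels: 64x64 outer, 32x32 inner.
net = NestedDenoiser(inner=NestedDenoiser())
out = net(torch.randn(1, 3, 64, 64))            # (1, 3, 64, 64)
```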
SparseCtrl is an image-to-video method with some cool new capabilities. With its RGB, depth, and sketch encoders and one or a few input images, it can animate images, interpolate between keyframes, extend videos, and guide video generation with only depth maps or a few sketches. I'm especially in love with how the scene transitions look.
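To make "one or a few input images" concrete: a sparse condition can be packed as zero-filled frames plus a binary mask channel marking which frames actually carry a condition. This layout is an illustrative guess, not SparseCtrl's exact tensor format:

```python
import torch

def build_sparse_condition(frames: dict, video_len: int, c=3, h=64, w=64):
    """Pack conditions given at a few frame indices (RGB, depth, or sketch maps)
    into a dense tensor: zeros for unconditioned frames, plus a binary mask
    channel saying which frames actually carry a condition. The layout is an
    illustrative guess, not SparseCtrl's exact tensor format."""
    cond = torch.zeros(video_len, c, h, w)
    mask = torch.zeros(video_len, 1, h, w)
    for idx, frame in frames.items():
        cond[idx] = frame
        mask[idx] = 1.0
    return torch.cat([cond, mask], dim=1)   # (T, C+1, H, W)

# Keyframe interpolation setup: condition only the first and last frames.
keyframes = {0: torch.rand(3, 64, 64), 15: torch.rand(3, 64, 64)}
cond = build_sparse_condition(keyframes, video_len=16)   # (16, 4, 64, 64)
```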
Text-Animator can accurately depict the structure of visual text in generated videos. It supports camera control and text refinement to improve the stability of the generated visual text.
MotionBooth can generate videos of customized subjects from a few images and a text prompt with precise control over both object and camera movements.
Mora can enable generalist video generation through a multi-agent framework. It supports text-to-video generation, video editing, and digital world simulation, achieving performance similar to the Sora model.
Slicedit can edit videos from a simple text prompt, retaining the structure and motion of the original video while adhering to the target text.
FIFO-Diffusion can generate infinitely long videos from text without extra training. It uses a unique method that keeps memory use constant, no matter the video length, and works well on multiple GPUs.
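The constant-memory claim follows from the queue structure. Here's a conceptual sketch of the queue-based ("diagonal") denoising loop, where `denoise_step(frame, level)` is a hypothetical per-level denoiser rather than the paper's actual sampler:

```python
from collections import deque
import torch

def fifo_generate(denoise_step, num_levels: int, total_frames: int,
                  shape=(3, 64, 64)):
    """Conceptual queue-based denoising: the queue holds `num_levels` frames,
    each one noise level apart (head is least noisy). Every iteration denoises
    the whole queue by one level, pops the now-clean head frame, and pushes
    fresh noise at the tail -- so memory stays constant for any video length.
    `denoise_step(frame, level)` is a hypothetical per-level denoiser."""
    queue = deque(torch.randn(*shape) for _ in range(num_levels))
    out = []
    while len(out) < total_frames:
        # The frame at position i sits at noise level i; denoise each by one level.
        queue = deque(denoise_step(f, lvl) for lvl, f in enumerate(queue))
        out.append(queue.popleft())             # head frame is fully denoised
        queue.append(torch.randn(*shape))       # keep the queue length constant
    return torch.stack(out)
```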
SignLLM is the first multilingual Sign Language Production (SLP) model. It can generate sign language gestures from input text or prompts and achieve state-of-the-art performance on SLP tasks across eight sign languages.