Text-to-Video
Free text-to-video AI tools for creating engaging video content from scripts, perfect for filmmakers, marketers, and content creators.
Slicedit can edit videos with a simple text prompt, retaining the structure and motion of the original video while adhering to the target text.
FIFO-Diffusion can generate infinitely long videos from text without extra training. It denoises a fixed-length, first-in-first-out queue of frames, which keeps memory use constant regardless of the video length, and it works well on multiple GPUs.
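The constant-memory claim follows from only ever holding a fixed-size queue of frames. Below is a minimal Python sketch of that first-in-first-out idea under assumed values; `denoise_step`, the queue length, and the latent shapes are illustrative placeholders, not FIFO-Diffusion's actual API.

```python
from collections import deque
import torch

QUEUE_LEN = 16           # frames kept in memory at once (assumed value)
NUM_OUTPUT_FRAMES = 256  # total frames to generate

def denoise_step(frames: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for one diffusion denoising pass over the queue."""
    return frames - 0.01 * torch.randn_like(frames)  # placeholder update

# The queue holds frames at progressively higher noise levels; the head frame
# has been through the most passes and is (approximately) clean.
queue = deque(torch.randn(1, 4, 64, 64) for _ in range(QUEUE_LEN))
video = []

for _ in range(NUM_OUTPUT_FRAMES):
    denoised = denoise_step(torch.stack(list(queue)))
    queue = deque(denoised.unbind(0))
    video.append(queue.popleft())             # dequeue the finished frame
    queue.append(torch.randn(1, 4, 64, 64))   # enqueue a fresh noisy frame

print(len(video), "frames generated with constant memory")
```

Because the queue never grows, peak memory is set by `QUEUE_LEN`, not by the number of output frames.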
SignLLM is the first multilingual Sign Language Production (SLP) model. It can generate sign language gestures from input text or prompts and achieves state-of-the-art performance on SLP tasks across eight sign languages.
StoryDiffusion can generate long-range image sequences and videos that maintain consistent content across the generated frames. The method can convert a text-based story into a video with smooth transitions and consistent subjects.
AniClipart can turn static clipart images into high-quality animations. It uses Bézier curves for smooth motion and aligns movements with text prompts, improving how well the animation matches the text and maintains visual style.
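For reference, the Bézier curves mentioned above are simple parametric paths: a handful of control points define a smooth trajectory that an animation can sample per frame. The sketch below evaluates a cubic Bézier curve; the control points and usage are made up for illustration and are not taken from AniClipart's code.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Illustrative motion path for a single clipart anchor point.
p0, p1, p2, p3 = map(np.array, ([0.0, 0.0], [0.2, 0.8], [0.8, 0.8], [1.0, 0.0]))
frames = cubic_bezier(p0, p1, p2, p3, np.linspace(0.0, 1.0, 24))
print(frames.shape)  # (24, 2): one (x, y) offset per frame of a 24-frame clip
```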
CameraCtrl can control camera angles and movements in text-to-video generation. It improves video storytelling by adding a camera module to existing video diffusion models, making it easier to create dynamic scenes from text and camera inputs.
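As a rough illustration of the "camera module" idea, the toy sketch below embeds per-frame camera extrinsics into features that a video diffusion backbone could condition on. The module name, dimensions, and injection point are assumptions for illustration, not CameraCtrl's actual architecture.

```python
import torch
import torch.nn as nn

class CameraPoseEncoder(nn.Module):
    """Toy camera module: embeds per-frame camera extrinsics."""
    def __init__(self, pose_dim: int = 12, feat_dim: int = 320):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pose_dim, feat_dim), nn.SiLU(), nn.Linear(feat_dim, feat_dim)
        )

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, frames, 12) flattened 3x4 camera extrinsic matrices
        return self.mlp(poses)  # (batch, frames, feat_dim)

# The per-frame camera features would be added to the video model's temporal
# layers so that generated motion follows the requested camera trajectory.
poses = torch.randn(1, 16, 12)        # a 16-frame camera path
camera_features = CameraPoseEncoder()(poses)
print(camera_features.shape)          # torch.Size([1, 16, 320])
```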
StreamingT2V enables long text-to-video generations featuring rich motion dynamics without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Videos can be up to 1200 frames, spanning 2 minutes, and can be extended for even longer durations.
VideoElevator is a training-free, plug-and-play method that enhances the temporal consistency of text-to-video models and adds more photorealistic detail by leveraging text-to-image models.
UniCtrl can improve the quality and consistency of videos made by text-to-video models. It enhances how frames connect and move together without needing extra training, making videos look better and more diverse in motion.
Video-LaVIT is a multimodal video-language method that can comprehend and generate both image and video content, and it supports long video generation.
VideoCrafter2 can generate high-quality videos from text prompts. It uses low-quality video data and high-quality images to improve visual quality and motion, overcoming data limitations of earlier models.
FreeInit can improve the quality of videos made by diffusion models without extra training. It bridges the gap between the noise seen at training time and at inference time, making videos look better and more temporally consistent.
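A minimal sketch of the kind of frequency-domain noise re-initialization this hints at: keep the low-frequency structure of one noise tensor and take the high frequencies from fresh noise. The cutoff, filter shape, and latent sizes below are illustrative assumptions, not FreeInit's exact filter.

```python
import torch

def mix_noise_frequencies(structured: torch.Tensor,
                          fresh: torch.Tensor,
                          cutoff: float = 0.25) -> torch.Tensor:
    """Keep low frequencies of `structured` noise, high frequencies of `fresh` noise."""
    freq_a = torch.fft.fftn(structured, dim=(-3, -2, -1))
    freq_b = torch.fft.fftn(fresh, dim=(-3, -2, -1))

    # Simple low-pass mask over the (frames, height, width) dimensions.
    shape = structured.shape[-3:]
    grids = torch.meshgrid(*[torch.fft.fftfreq(s) for s in shape], indexing="ij")
    low_pass = (sum(g ** 2 for g in grids).sqrt() < cutoff).to(structured.dtype)

    mixed = freq_a * low_pass + freq_b * (1.0 - low_pass)
    return torch.fft.ifftn(mixed, dim=(-3, -2, -1)).real

# Example: refine the initial latent noise for a 16-frame, 64x64 latent video.
refined = mix_noise_frequencies(torch.randn(1, 4, 16, 64, 64),
                                torch.randn(1, 4, 16, 64, 64))
print(refined.shape)
```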
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation can generate realistic and stable videos by separating spatial and temporal factors. It improves video quality by extracting motion and appearance cues, allowing for flexible content variations and better understanding of scenes.
Given one or more style references, StyleCrafter can generate images and videos in those styles.
Diffusion Motion Transfer can translate a video to the target described by a text prompt while maintaining the input video's motion and scene layout.
LiveSketch can automatically add motion to a single-subject sketch from a text prompt indicating the desired motion. The outputs are short SVG animations that can be easily edited.
VideoDreamer is a framework that can generate videos containing the given subjects while simultaneously conforming to text prompts.
SEINE is a short-to-long video diffusion model that focuses on generative transitions and predictions. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of clips. The model can also be used for image-to-video animation and autoregressive video prediction.
FreeNoise is a method that can generate longer videos with up to 512 frames from multiple text prompts. That's about 21 seconds of video at 24 fps. The method doesn't require any additional fine-tuning of the video diffusion model and only takes about 20% more time than the original diffusion process.
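The sketch below illustrates one way a fixed block of initial noise frames can be extended to a long sequence with local shuffling, so that distant frames stay correlated without retraining. The window size, stride, and shuffling scheme are illustrative assumptions, not FreeNoise's exact noise rescheduling.

```python
import torch

def reschedule_noise(base_window: torch.Tensor, total_frames: int,
                     shuffle_stride: int = 4) -> torch.Tensor:
    """Extend a base window of noise frames by repeating it with local shuffles."""
    window = base_window.shape[0]
    chunks = []
    while sum(c.shape[0] for c in chunks) < total_frames:
        perm = torch.arange(window)
        # Shuffle within small strides so repeats are correlated but not identical.
        for start in range(0, window, shuffle_stride):
            idx = perm[start:start + shuffle_stride]
            perm[start:start + shuffle_stride] = idx[torch.randperm(idx.shape[0])]
        chunks.append(base_window[perm])
    return torch.cat(chunks)[:total_frames]

# 16 base noise frames extended to 512 frames (~21 s at 24 fps).
long_noise = reschedule_noise(torch.randn(16, 4, 64, 64), total_frames=512)
print(long_noise.shape)  # torch.Size([512, 4, 64, 64])
```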
MotionDirector is a method that can train text-to-video diffusion models to generate videos with the desired motions from a reference video.