Text-to-Video
Free text-to-video AI tools for creating engaging video content from scripts, perfect for filmmakers, marketers, and content creators.
RepVideo can improve video generation by enhancing the visual quality of individual frames and ensuring smooth transitions between them.
Kinetic Typography Diffusion Model can generate kinetic typography videos with legible and artistic letter motions based on text prompts.
TransPixar can generate RGBA videos, enabling the creation of transparent elements like smoke and reflections that blend seamlessly into scenes.
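What makes an RGBA output useful is that it drops straight into standard "over" compositing. Here is a minimal NumPy sketch; the smoke layer is a stand-in for a model's output, not TransPixar's API:

```python
import numpy as np

def composite_over(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Standard "over" alpha compositing of an RGBA frame onto a background.
    fg_rgba: (H, W, 4) floats in [0, 1]; bg_rgb: (H, W, 3) floats in [0, 1]."""
    rgb, alpha = fg_rgba[..., :3], fg_rgba[..., 3:]
    return rgb * alpha + bg_rgb * (1.0 - alpha)

# Toy usage: a translucent grey "smoke" layer blended over a random scene.
smoke = np.zeros((64, 64, 4))
smoke[..., :3] = 0.8   # light grey color
smoke[..., 3] = 0.3    # 30% opacity
scene = np.random.rand(64, 64, 3)
blended = composite_over(smoke, scene)  # (64, 64, 3)
```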
DiTCtrl can generate multi-prompt videos with smooth transitions and consistent object motion.
CustomCrafter can generate high-quality videos from text prompts and reference images. It improves motion generation with a Dynamic Weighted Video Sampling Strategy and enables better concept combinations without extra video data or fine-tuning.
SynCamMaster can generate synchronized videos from different viewpoints while keeping appearance and geometry consistent across views. It adapts text-to-video models for multi-camera use and allows re-rendering from new viewpoints.
Customizing Motion can learn motion patterns from input videos and generalize them to new, unseen contexts.
VideoRepair can improve text-to-video generation by finding and fixing small mismatches between text prompts and videos.
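The blurb describes an evaluate-then-refine loop. A hypothetical skeleton of that control flow (all four callables are placeholders, not VideoRepair's actual interface) might look like:

```python
def repair(generate, evaluate, refine, prompt, max_rounds=3, min_score=0.9):
    """Hypothetical evaluate-then-refine loop: generate a video, score its
    alignment with the prompt, and regenerate only the mismatched parts
    until the score is acceptable. All callables are placeholder stubs."""
    video = generate(prompt)
    for _ in range(max_rounds):
        score, mismatches = evaluate(video, prompt)  # e.g. per-object alignment
        if score >= min_score:
            break
        video = refine(video, prompt, mismatches)    # fix only flagged regions
    return video
```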
Adaptive Caching can speed up video generation with Diffusion Transformers by caching important calculations. It can achieve up to 4.7 times faster video creation at 720p without losing quality.
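The entry doesn't spell out the caching schedule, but the core trick, reusing a transformer block's residual across adjacent denoising steps when its input barely changes, can be sketched in PyTorch. The block layout, `tol` threshold, and change metric below are assumptions, not AdaCache's actual design:

```python
import torch

class CachedBlock(torch.nn.Module):
    """Toy residual block that reuses its last output when its input has barely
    changed between denoising steps. The layout, threshold, and change metric
    are illustrative assumptions, not AdaCache's actual schedule."""

    def __init__(self, dim: int, tol: float = 1e-2):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.LayerNorm(dim),
            torch.nn.Linear(dim, dim),
            torch.nn.GELU(),
            torch.nn.Linear(dim, dim),
        )
        self.tol = tol
        self._last_x = None
        self._last_res = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._last_x is not None and x.shape == self._last_x.shape:
            change = (x - self._last_x).abs().mean() / (self._last_x.abs().mean() + 1e-8)
            if change < self.tol:          # input ~unchanged: reuse cached residual
                return x + self._last_res
        res = self.body(x)                 # the expensive computation being cached
        self._last_x, self._last_res = x.detach(), res.detach()
        return x + res
```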
VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, without any fine-tuning.
Pyramidal Flow Matching can generate high-quality 5 to 10-second videos at 768p resolution and 24 FPS. It uses a unified pyramidal flow matching algorithm to link flows across different stages, making video creation more efficient.
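As a rough picture of what "linking flows across stages" means, here is a conceptual coarse-to-fine sampler. `model(x, t, stage)` is a hypothetical velocity predictor, and the re-noising factor is made up for illustration:

```python
import torch
import torch.nn.functional as F

def pyramidal_sample(model, stages=3, steps_per_stage=8):
    """Conceptual coarse-to-fine flow sampler: integrate a velocity field at a
    low resolution, upsample, lightly re-noise, and continue at the next scale.
    `model(x, t, stage)` is a hypothetical velocity predictor, not the paper's API."""
    x = torch.randn(1, 3, 4, 8, 8)                 # (batch, C, frames, H, W), coarsest scale
    for stage in range(stages):
        for i in range(steps_per_stage):           # simple Euler integration of the flow
            t = torch.full((1,), i / steps_per_stage)
            x = x + model(x, t, stage) / steps_per_stage
        if stage < stages - 1:
            x = F.interpolate(x, scale_factor=(1, 2, 2), mode="trilinear")
            x = x + 0.1 * torch.randn_like(x)      # re-noise so the next stage has flow left to match (made-up factor)
    return x
```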
ViewCrafter can generate high-quality 3D views from single or few images using a video diffusion model. It allows for precise camera control and is useful for real-time rendering and turning text into 3D scenes.
Matryoshka Diffusion Models can generate high-quality images and videos using a NestedUNet architecture that denoises inputs at multiple resolutions jointly. This method delivers strong performance at resolutions up to 1024x1024 pixels and generalizes well zero-shot.
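The nesting idea can be illustrated with a two-level toy denoiser, where an inner network handles a downsampled copy of the input and its estimate is fused back at full resolution. The layer choices and fusion rule are assumptions, not the paper's NestedUNet:

```python
import torch
import torch.nn.functional as F

class NestedDenoiser(torch.nn.Module):
    """Illustrative nesting: an inner denoiser handles a downsampled copy of the
    input, and its output is fused back at full resolution. Layer sizes and the
    fusion rule are assumptions, not the paper's NestedUNet."""

    def __init__(self, ch: int = 3, inner=None):
        super().__init__()
        self.inner = inner                      # smaller denoiser for the next level down
        self.conv = torch.nn.Conv2d(ch * (2 if inner else 1), ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.inner is not None:
            small = F.avg_pool2d(x, 2)          # low-resolution copy of the noisy input
            up = F.interpolate(self.inner(small), scale_factor=2)
            x = torch.cat([x, up], dim=1)       # fuse coarse estimate with fine input
        return self.conv(x)

# Two nested levels: 64x64 outer, 32x32 inner.
net = NestedDenoiser(inner=NestedDenoiser())
out = net(torch.randn(1, 3, 64, 64))            # (1, 3, 64, 64)
```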
SparseCtrl is an image-to-video method with some cool new capabilities. With its RGB, depth, and sketch encoders and one or a few input images, it can animate images, interpolate between keyframes, extend videos, and guide video generation with only depth maps or a few sketches. I'm especially in love with how the scene transitions look.
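To make "one or a few input images" concrete: a sparse condition can be packed as zero-filled frames plus a binary mask channel marking which frames actually carry a condition. This layout is an illustrative guess, not SparseCtrl's exact tensor format:

```python
import torch

def build_sparse_condition(frames: dict, video_len: int, c=3, h=64, w=64):
    """Pack conditions given at a few frame indices (RGB, depth, or sketch maps)
    into a dense tensor: zeros for unconditioned frames, plus a binary mask
    channel saying which frames actually carry a condition. The layout is an
    illustrative guess, not SparseCtrl's exact tensor format."""
    cond = torch.zeros(video_len, c, h, w)
    mask = torch.zeros(video_len, 1, h, w)
    for idx, frame in frames.items():
        cond[idx] = frame
        mask[idx] = 1.0
    return torch.cat([cond, mask], dim=1)   # (T, C+1, H, W)

# Keyframe interpolation setup: condition only the first and last frames.
keyframes = {0: torch.rand(3, 64, 64), 15: torch.rand(3, 64, 64)}
cond = build_sparse_condition(keyframes, video_len=16)   # (16, 4, 64, 64)
```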
Text-Animator can accurately depict the structure of visual text in generated videos. It supports camera control and text refinement to improve the stability of the generated visual text.
MotionBooth can generate videos of customized subjects from a few images and a text prompt with precise control over both object and camera movements.
Mora can enable generalist video generation through a multi-agent framework. It supports text-to-video generation, video editing, and digital world simulation, achieving performance similar to the Sora model.
Slicedit can edit videos from a simple text prompt, retaining the structure and motion of the original video while adhering to the target text.
FIFO-Diffusion can generate infinitely long videos from text without extra training. It uses a unique method that keeps memory use constant, no matter the video length, and works well on multiple GPUs.
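The constant-memory claim follows from the queue structure. Here's a conceptual sketch of the queue-based ("diagonal") denoising loop, where `denoise_step(frame, level)` is a hypothetical per-level denoiser rather than the paper's actual sampler:

```python
from collections import deque
import torch

def fifo_generate(denoise_step, num_levels: int, total_frames: int,
                  shape=(3, 64, 64)):
    """Conceptual queue-based denoising: the queue holds `num_levels` frames,
    each one noise level apart (head is least noisy). Every iteration denoises
    the whole queue by one level, pops the now-clean head frame, and pushes
    fresh noise at the tail -- so memory stays constant for any video length.
    `denoise_step(frame, level)` is a hypothetical per-level denoiser."""
    queue = deque(torch.randn(*shape) for _ in range(num_levels))
    out = []
    while len(out) < total_frames:
        # The frame at position i sits at noise level i; denoise each by one level.
        queue = deque(denoise_step(f, lvl) for lvl, f in enumerate(queue))
        out.append(queue.popleft())             # head frame is fully denoised
        queue.append(torch.randn(*shape))       # keep the queue length constant
    return torch.stack(out)
```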
SignLLM is the first multilingual Sign Language Production (SLP) model. It can generate sign language gestures from input text or prompts and achieve state-of-the-art performance on SLP tasks across eight sign languages.