Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
FRESCO combines ControlNet with Ebsynth for zero-shot video translation that focuses on preserving the spatial and temporal consistency of the input frames.
AnimateDiff-Lightning can generate videos over ten times faster than AnimateDiff. It uses progressive adversarial diffusion distillation to combine multiple diffusion models into one motion module, improving style compatibility and achieving top performance in few-step video generation.
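To give a feel for what few-step generation looks like in practice, here is a minimal sketch using the diffusers AnimateDiffPipeline with the distilled motion module loaded as a MotionAdapter. The checkpoint and base-model names are assumptions based on the model card, so double-check them before running.

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerDiscreteScheduler
from diffusers.utils import export_to_gif
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

device = "cuda"
# Assumed repo/file/base names -- verify against the official model card.
repo = "ByteDance/AnimateDiff-Lightning"
ckpt = "animatediff_lightning_4step_diffusers.safetensors"
base = "emilianJR/epiCRealism"  # any Stable Diffusion 1.5 base model

# Load the distilled motion module into a MotionAdapter.
adapter = MotionAdapter().to(device, torch.float16)
adapter.load_state_dict(load_file(hf_hub_download(repo, ckpt), device=device))

# Build the AnimateDiff pipeline around the SD 1.5 base.
pipe = AnimateDiffPipeline.from_pretrained(
    base, motion_adapter=adapter, torch_dtype=torch.float16
).to(device)
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear"
)

# The distillation is what makes 4 steps (instead of ~25) enough.
frames = pipe("a corgi running on the beach", guidance_scale=1.0,
              num_inference_steps=4).frames[0]
export_to_gif(frames, "corgi.gif")
```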
DragAnything can control the motion of any object in videos by letting users draw trajectory lines. It allows for separate motion control of multiple objects, including backgrounds.
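As a rough illustration of trajectory-based control (this is not DragAnything's actual API, just the general preprocessing idea), the sketch below resamples a user-drawn polyline into one target position per frame and rasterizes each position into a Gaussian heatmap that a generator could condition on.

```python
import torch

def trajectory_to_heatmaps(points, num_frames, height, width, sigma=8.0):
    """Resample a drawn polyline to one (x, y) position per frame and
    rasterize each position as a Gaussian heatmap. Illustrative only;
    not DragAnything's real interface.

    points: (N, 2) tensor of (x, y) pixel coordinates along the drawn line.
    returns: (num_frames, height, width) tensor of conditioning maps.
    """
    points = points.float()
    # Linearly interpolate the polyline so every frame gets a target position.
    t = torch.linspace(0, points.shape[0] - 1, num_frames)
    lo = t.floor().long().clamp(max=points.shape[0] - 1)
    hi = t.ceil().long().clamp(max=points.shape[0] - 1)
    frac = (t - lo.float()).unsqueeze(1)
    per_frame = points[lo] * (1 - frac) + points[hi] * frac  # (num_frames, 2)

    ys = torch.arange(height, dtype=torch.float32).view(1, height, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, 1, width)
    dx = xs - per_frame[:, 0].view(-1, 1, 1)
    dy = ys - per_frame[:, 1].view(-1, 1, 1)
    return torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))

# Example: drag an object from the top-left toward the center over 16 frames.
line = torch.tensor([[20.0, 20.0], [120.0, 90.0], [256.0, 256.0]])
maps = trajectory_to_heatmaps(line, num_frames=16, height=512, width=512)
print(maps.shape)  # torch.Size([16, 512, 512])
```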
VideoElevator is a training-free, plug-and-play method that uses text-to-image models to enhance the temporal consistency of text-to-video models and add more photo-realistic detail to their output.
UniCtrl can improve the quality and consistency of videos made by text-to-video models. It enhances how frames connect and move together without needing extra training, making videos look better and more diverse in motion.
Magic-Me can generate identity-specific videos from a few reference images while keeping the person’s features clear.
ConsistI2V is an image-to-video method with enhanced visual consistency. Compared to other methods, it better maintains the subject, background, and style from the first frame and ensures a fluid, logical progression, while also supporting long video generation and camera motion control.
Direct-a-Video can individually or jointly control camera movement and object motion in text-to-video generations. This means you can generate a video and tell the model to move the camera from left to right, zoom in or out, and move objects around in the scene.
Video-LaVIT is a multi-modal video-language method that can comprehend and generate image and video content and supports long video generation.
Last year we got real-time diffusion for images, this year we’ll get it for video! AnimateLCM can generate high-fidelity videos with minimal steps. The model also supports image-to-video as well as adapters like ControlNet. It’s not available yet, but once it hits, expect way more AI-generated video content.
Motion-I2V can generate videos from images with clear and controlled motion. It uses a two-stage process with a motion field predictor and temporal attention, allowing for precise control over how things move and enabling video-to-video translation without needing extra training.
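To make the two-stage split concrete, the sketch below applies a dense per-frame motion field to a reference frame with backward warping; in the actual method the predicted fields feed into motion-augmented temporal attention rather than being used to warp pixels, so treat this purely as an illustration of stage one's output.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(frame, flow):
    """Warp an image by a dense displacement field using backward sampling.

    frame: (1, C, H, W) reference image.
    flow:  (1, 2, H, W) displacement in pixels (dx, dy) telling each output
           pixel where to sample from in the reference frame.
    """
    _, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    grid_x = xs + flow[:, 0]
    grid_y = ys + flow[:, 1]
    # Normalize to [-1, 1] as expected by grid_sample.
    grid = torch.stack(((grid_x / (w - 1)) * 2 - 1,
                        (grid_y / (h - 1)) * 2 - 1), dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)

# Example: a constant rightward motion of 5 pixels applied to a dummy frame.
frame = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)
flow[:, 0] = 5.0
next_frame = warp_by_flow(frame, flow)
print(next_frame.shape)  # torch.Size([1, 3, 64, 64])
```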
Language-Driven Video Inpainting can guide the video inpainting process using natural language instructions, which removes the need for manual mask labeling.
VideoCrafter2 can generate high-quality videos from text prompts. It uses low-quality video data and high-quality images to improve visual quality and motion, overcoming data limitations of earlier models.
FMA-Net can turn blurry, low-quality videos into clear, high-quality ones by jointly predicting the degradation and restoration processes, using learned motion patterns to account for movement in the video.
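The building block behind this kind of restoration, spatially varying (dynamic) filtering, can be sketched in a few lines of PyTorch. In FMA-Net the per-pixel kernels are predicted by the network and guided by estimated motion; here they are simply passed in, so this is only illustrative.

```python
import torch
import torch.nn.functional as F

def dynamic_filter(frame, kernels, k=3):
    """Apply a spatially varying (per-pixel) filter to a frame.

    frame:   (B, C, H, W) input frame.
    kernels: (B, k*k, H, W) one kxk kernel per output pixel
             (softmax-normalized along the k*k dimension).
    """
    b, c, h, w = frame.shape
    # Extract kxk neighborhoods around every pixel: (B, C*k*k, H*W).
    patches = F.unfold(frame, kernel_size=k, padding=k // 2)
    patches = patches.view(b, c, k * k, h, w)
    weights = kernels.view(b, 1, k * k, h, w)
    # Weighted sum over each neighborhood gives the filtered pixel.
    return (patches * weights).sum(dim=2)

frame = torch.rand(1, 3, 64, 64)
kernels = torch.softmax(torch.randn(1, 9, 64, 64), dim=1)  # random 3x3 kernels
restored = dynamic_filter(frame, kernels)
print(restored.shape)  # torch.Size([1, 3, 64, 64])
```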
MagicDriveDiT can generate high-resolution street scene videos for self-driving cars.
MoonShot is a video generation model that can condition on both image and text inputs. The model is also able to integrate with pre-trained image ControlNet modules for geometry visual conditions, making it possible to generate videos with specific visual appearances and structures.
VidToMe can edit videos with a text prompt, custom models, and ControlNet guidance while achieving great temporal consistency. The key idea is to merge similar tokens across multiple frames in the self-attention modules, which keeps the generated frames consistent over time.
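A toy version of that cross-frame token merging might look like the following: current-frame tokens that closely match a reference-frame token are averaged with it before self-attention. The actual method does this within local and global windows inside the attention blocks, so this is just the core idea.

```python
import torch
import torch.nn.functional as F

def merge_similar_tokens(ref_tokens, cur_tokens, threshold=0.9):
    """Merge current-frame tokens into their most similar reference-frame
    token when cosine similarity exceeds a threshold. Simplified sketch.

    ref_tokens: (N, D) tokens from the reference frame.
    cur_tokens: (M, D) tokens from the current frame.
    """
    sim = F.normalize(cur_tokens, dim=-1) @ F.normalize(ref_tokens, dim=-1).T
    best_sim, best_idx = sim.max(dim=-1)        # nearest reference token
    merged = cur_tokens.clone()
    mask = best_sim > threshold
    # Average the matched pairs so both frames share one representation.
    merged[mask] = 0.5 * (cur_tokens[mask] + ref_tokens[best_idx[mask]])
    return merged, mask

ref = torch.randn(256, 64)
cur = ref + 0.01 * torch.randn(256, 64)          # nearly identical frame
merged, mask = merge_similar_tokens(ref, cur)
print(mask.float().mean())                       # most tokens get merged
```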
FreeInit can improve the quality of videos made by diffusion models without extra training. It bridges the gap between the initial noise seen at training time and at inference time, making videos look better and more temporally consistent.
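The underlying trick is to re-initialize the starting noise by keeping the low spatio-temporal frequencies of a re-noised earlier result and refreshing the high frequencies with new Gaussian noise. Below is a hedged PyTorch sketch of that frequency mixing, using a simplified hard low-pass mask where the paper uses a smoother filter.

```python
import torch

def reinit_noise(prev_latent_noised, freq_cutoff=0.25):
    """Keep low spatio-temporal frequencies of the re-noised previous result
    and replace high frequencies with fresh Gaussian noise. Illustrative:
    the cube-shaped low-pass mask below is a simplification.

    prev_latent_noised: (B, C, T, H, W) latent after adding noise back to the
    previously generated video.
    """
    fresh = torch.randn_like(prev_latent_noised)
    dims = (-3, -2, -1)
    freq_prev = torch.fft.fftshift(torch.fft.fftn(prev_latent_noised, dim=dims), dim=dims)
    freq_new = torch.fft.fftshift(torch.fft.fftn(fresh, dim=dims), dim=dims)

    # Build a centered low-pass mask over the temporal and spatial axes.
    b, c, t, h, w = prev_latent_noised.shape
    mask = torch.zeros(t, h, w)
    ct, ch, cw = t // 2, h // 2, w // 2
    rt, rh, rw = max(1, int(t * freq_cutoff)), int(h * freq_cutoff), int(w * freq_cutoff)
    mask[ct - rt:ct + rt, ch - rh:ch + rh, cw - rw:cw + rw] = 1.0

    mixed = freq_prev * mask + freq_new * (1.0 - mask)
    mixed = torch.fft.ifftshift(mixed, dim=dims)
    return torch.fft.ifftn(mixed, dim=dims).real

latent = torch.randn(1, 4, 16, 32, 32)  # dummy noised latent
new_init = reinit_noise(latent)
print(new_init.shape)                   # torch.Size([1, 4, 16, 32, 32])
```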
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation can generate realistic and stable videos by separating spatial and temporal factors. It improves video quality by extracting motion and appearance cues, allowing for flexible content variations and better understanding of scenes.
MotionCtrl is a flexible motion controller that can manage both camera and object motion in generated videos and can be used with VideoCrafter1, AnimateDiff, and Stable Video Diffusion.