Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
Video-LaVIT is a multi-modal video-language method that can both comprehend and generate image and video content, and it supports long video generation.
Last year we got real-time diffusion for images, this year we’ll get it for video! AnimateLCM can generate high-fidelity videos in a minimal number of steps. The model also supports image-to-video generation as well as adapters like ControlNet. It’s not available yet, but once it hits, expect way more AI-generated video content.
Motion-I2V can generate videos from images with clear and controlled motion. It uses a two-stage process with a motion field predictor and temporal attention, allowing for precise control over how things move and enabling video-to-video translation without needing extra training.
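To make that two-stage idea concrete, here is a minimal sketch with placeholder functions (`predict_motion_field` and `render_frames` are hypothetical names, not Motion-I2V's actual API): stage one predicts a dense per-frame motion field from the image and prompt, stage two renders frames that follow it.

```python
# Hypothetical sketch of a two-stage image-to-video pipeline in the spirit of
# Motion-I2V; the real model replaces both placeholders with learned networks.
import numpy as np

def predict_motion_field(image: np.ndarray, prompt: str, num_frames: int) -> np.ndarray:
    """Stage 1 (placeholder): return per-frame (dx, dy) displacement maps."""
    h, w, _ = image.shape
    return np.zeros((num_frames, h, w, 2), dtype=np.float32)

def render_frames(image: np.ndarray, motion: np.ndarray) -> list[np.ndarray]:
    """Stage 2 (placeholder): produce frames that follow the motion field."""
    h, w, _ = image.shape
    frames = []
    for flow in motion:
        # A real model combines warping with temporal attention; here we just
        # shift pixels by the rounded flow as an illustration.
        ys, xs = np.mgrid[0:h, 0:w]
        src_y = np.clip(ys - flow[..., 1].round().astype(int), 0, h - 1)
        src_x = np.clip(xs - flow[..., 0].round().astype(int), 0, w - 1)
        frames.append(image[src_y, src_x])
    return frames

image = np.zeros((64, 64, 3), dtype=np.uint8)
motion = predict_motion_field(image, "the boat drifts to the right", num_frames=16)
video = render_frames(image, motion)  # 16 frames driven by the predicted motion
```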
Language-Driven Video Inpainting can guide the video inpainting process using natural language instructions, which removes the need for manual mask labeling.
VideoCrafter2 can generate high-quality videos from text prompts. It uses low-quality video data and high-quality images to improve visual quality and motion, overcoming data limitations of earlier models.
FMA-Net can turn blurry, low-quality videos into clear, high-quality ones by accurately predicting the degradation and restoration processes while accounting for motion in the video through learned motion patterns.
MagicDriveDiT can generate high-resolution street scene videos for self-driving cars.
MoonShot is a video generation model that can condition on both image and text inputs. It can also integrate with pre-trained image ControlNet modules for geometric visual conditions, making it possible to generate videos with specific visual appearances and structures.
VidToMe can edit videos with a text prompt, custom models, and ControlNet guidance while maintaining strong temporal consistency. The key idea is to merge similar tokens across multiple frames in the self-attention modules, which keeps the generated video temporally consistent.
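A minimal sketch of that token-merging idea, under the simplifying assumption that tokens are matched per position against a reference frame (VidToMe's actual matching is more sophisticated; `merge_similar_tokens` is a hypothetical helper, not the project's code):

```python
# Tokens from several frames that are highly similar get replaced by one
# shared (averaged) token, so self-attention sees the same representation
# across those frames and the output stays temporally consistent.
import numpy as np

def merge_similar_tokens(tokens: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """tokens: (num_frames, num_tokens, dim). Merge per-position tokens whose
    cosine similarity to frame 0 exceeds the threshold by averaging them."""
    normed = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    sims = np.einsum("ftd,td->ft", normed, normed[0])  # similarity to frame 0
    merged = tokens.copy()
    for t in range(tokens.shape[1]):
        similar_frames = np.where(sims[:, t] > threshold)[0]
        shared = tokens[similar_frames, t].mean(axis=0)
        merged[similar_frames, t] = shared              # same token in all matched frames
    return merged

frames_tokens = np.random.randn(8, 256, 64).astype(np.float32)
out = merge_similar_tokens(frames_tokens)
```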
FreeInit can improve the quality of videos made by diffusion models without extra training. It fixes a mismatch between training and inference, making videos look better and more temporally consistent.
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation can generate realistic and stable videos by separating spatial and temporal factors. It improves video quality by extracting motion and appearance cues, allowing for flexible content variations and better understanding of scenes.
MotionCtrl is a flexible motion controller that is able to manage both camera and object motions in the generated videos and can be used with VideoCrafter1, AnimateDiff, and Stable Video Diffusion.
Given one or more style references, StyleCrafter can generate images and videos in those styles.
Diffusion Motion Transfer is able to translate videos with a text prompt while maintaining the input video’s motion and scene layout.
Sketch Video Synthesis can turn videos into SVG sketches using frame-wise Bézier curves. It allows for impressive visual effects like resizing, color filling, and adding doodles to the original footage while maintaining a smooth flow between frames.
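As a toy illustration of the representation (not the paper's code), each frame can be stored as a handful of cubic Bézier curves and written out as an SVG path, which is what makes edits like recoloring or resizing trivial:

```python
# Represent a sketch frame as cubic Bezier curves and write it as an SVG file.
def bezier_path(curves):
    """curves: list of ((x0,y0), (c1x,c1y), (c2x,c2y), (x1,y1)) control points."""
    parts = []
    for p0, c1, c2, p1 in curves:
        parts.append(f"M {p0[0]} {p0[1]} C {c1[0]} {c1[1]}, {c2[0]} {c2[1]}, {p1[0]} {p1[1]}")
    return " ".join(parts)

def write_svg_frame(curves, path, size=(256, 256)):
    d = bezier_path(curves)
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{size[0]}" height="{size[1]}">'
           f'<path d="{d}" fill="none" stroke="black" stroke-width="2"/></svg>')
    with open(path, "w") as f:
        f.write(svg)

# One curve per frame, shifted slightly each time to mimic per-frame strokes.
for i in range(4):
    curve = ((10 + i, 200), (80, 40 + i * 5), (180, 40 + i * 5), (246, 200))
    write_svg_frame([curve], f"frame_{i:02d}.svg")
```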
LiveSketch can automatically add motion to a single-subject sketch based on a text prompt describing the desired motion. The outputs are short SVG animations that can be easily edited.
InterpAny-Clearer is a video frame interpolation method that is able to generate clearer and sharper frames compared to existing methods. Additionally, it introduces the ability to manipulate the interpolation of objects in a video independently, which could be useful for video editing tasks.
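For context, the naive baseline such interpolators improve on is plain linear blending of neighboring frames, which ghosts moving objects instead of placing them sharply at an intermediate position; a quick sketch:

```python
# Naive frame interpolation by blending; a learned interpolator instead
# estimates motion and synthesizes a sharp frame at time t.
import numpy as np

def naive_interpolate(frame_a: np.ndarray, frame_b: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Linearly blend two frames at time t in [0, 1]."""
    return ((1.0 - t) * frame_a.astype(np.float32)
            + t * frame_b.astype(np.float32)).astype(np.uint8)

a = np.zeros((64, 64, 3), dtype=np.uint8)
b = np.full((64, 64, 3), 255, dtype=np.uint8)
mid = naive_interpolate(a, b)  # uniform gray: averaging, not true motion interpolation
```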
I2VGen-XL can generate high-quality videos from static images using a cascaded diffusion model. It achieves a resolution of 1280x720 and improves the flow of movement in videos through a two-stage process that separates detail enhancement from overall coherence.
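A hedged sketch of that cascaded idea, with placeholder stages rather than the released model's API: a base stage establishes coherent low-resolution motion, and a refinement stage upsamples to 1280x720 and adds detail.

```python
# Two-stage cascade sketch: coherence first at low resolution, detail second.
import numpy as np

def base_stage(image: np.ndarray, prompt: str, num_frames: int = 16) -> np.ndarray:
    """Placeholder: produce a coherent low-resolution clip (num_frames, 180, 320, 3)."""
    small = image[::4, ::4]  # crude downscale standing in for latent-space generation
    return np.repeat(small[None], num_frames, axis=0)

def refinement_stage(low_res_video: np.ndarray) -> np.ndarray:
    """Placeholder: upsample each frame back to 1280x720 where detail would be added."""
    return low_res_video.repeat(4, axis=1).repeat(4, axis=2)

image = np.zeros((720, 1280, 3), dtype=np.uint8)
coarse = base_stage(image, "a sailboat crossing a calm bay")
video = refinement_stage(coarse)  # (16, 720, 1280, 3)
```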
VideoDreamer is a framework that can generate videos containing the given subjects while conforming to text prompts.
SEINE is a short-to-long video diffusion model that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth, creative transitions between scenes and clips of varying lengths. The model can also be used for image-to-video animation and autoregressive video prediction.