Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
CAVIS performs instance segmentation on videos. By tracking objects more reliably and improving instance matching accuracy, it produces more accurate and stable segmentations across frames.
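The instance-matching step can be pictured as an assignment problem between adjacent frames. Below is a minimal sketch, not CAVIS's actual pipeline: hypothetical per-instance embeddings are matched with the Hungarian algorithm on cosine distance.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical per-instance appearance embeddings for two adjacent
# frames (stand-ins for the context-aware features a model like
# CAVIS would produce).
rng = np.random.default_rng(0)
prev = rng.normal(size=(3, 32))   # 3 tracked instances in frame t
curr = rng.normal(size=(4, 32))   # 4 detections in frame t+1

norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
cost = 1.0 - norm(prev) @ norm(curr).T  # cosine distance matrix

# Optimal one-to-one assignment; unmatched detections start new tracks.
rows, cols = linear_sum_assignment(cost)
for r, c in zip(rows, cols):
    print(f"track {r} -> detection {c} (cost {cost[r, c]:.2f})")
```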
VideoRepair can improve text-to-video generation by detecting and correcting fine-grained mismatches between text prompts and the generated videos.
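Conceptually this is an evaluate-localize-regenerate loop. The sketch below shows that control flow only; all three helpers are hypothetical stubs standing in for an alignment scorer, a grounding module, and a region-conditioned generator, not the project's API.

```python
# Evaluate-localize-regenerate control flow in the spirit of VideoRepair.

def alignment_score(video, prompt):
    # Stub: pretend each repair recovers part of the prompt.
    return 0.6 + 0.2 * sum("patched" in f for f in video)

def localize_mismatch(video, prompt):
    # Stub: a grounding module would return the offending region/element.
    return {"region": (0.2, 0.2, 0.5, 0.5), "missing": "red balloon"}

def regenerate_region(video, prompt, issue):
    # Stub: a region-conditioned generator would redraw only that area.
    return video + [f"patched:{issue['missing']}"]

def repair(video, prompt, threshold=0.9, max_rounds=3):
    for _ in range(max_rounds):
        if alignment_score(video, prompt) >= threshold:
            break
        video = regenerate_region(video, prompt, localize_mismatch(video, prompt))
    return video

print(repair(["frame0"], "a red balloon drifting over a beach"))
```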
Inverse Painting can generate time-lapse videos of the painting process from a target artwork. It uses a diffusion-based renderer to learn from real artists’ techniques, producing realistic results across different artistic styles.
CAT4D can create dynamic 4D scenes from a single video. It uses a multi-view video diffusion model to generate the scene from novel viewpoints, enabling robust 4D reconstruction with high image quality.
SAMURAI combines the state-of-the-art visual tracking of SAM 2 with motion-aware memory, making tracking more robust to occlusion and crowded scenes.
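The motion-aware part can be approximated with a simple motion prior: predict where the tracked box should be, then prefer mask candidates that agree with that prediction. A simplified sketch, with constant-velocity prediction standing in for SAMURAI's Kalman filter and made-up boxes and scores:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

# Constant-velocity motion prediction (a simplified stand-in for a
# Kalman filter on the tracked box).
prev_box = np.array([40., 42., 90., 92.])
velocity = np.array([5., 5., 5., 5.])
pred_box = prev_box + velocity

# Hypothetical candidate masks from the segmenter, each reduced to a
# bounding box plus the model's own confidence score.
candidates = [
    (np.array([48., 50., 98., 100.]), 0.71),
    (np.array([10., 12., 60., 62.]), 0.78),  # confident but off-motion
]

# Blend mask confidence with agreement to the motion prediction, so a
# distractor with high confidence but implausible motion is rejected.
alpha = 0.5
best = max(candidates, key=lambda c: alpha * c[1] + (1 - alpha) * iou(c[0], pred_box))
print("selected box:", best[0])
```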
StableV2V can maintain shape consistency in video-to-video editing by decomposing the editing process into steps aligned with user prompts. It supports text-based editing, image-based editing, and video inpainting.
CamI2V is a method that generates videos from images, guided by text prompts with precise control over camera movement.
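Camera-controlled video diffusion methods commonly encode the trajectory as per-pixel ray (Plücker) embeddings fed to the model as conditioning; whether CamI2V uses exactly this parameterization is an assumption here. A minimal sketch of computing that embedding for one camera:

```python
import numpy as np

def plucker_rays(K, R, t, h, w):
    """Per-pixel Plücker embeddings (direction, moment) for one camera.

    K: 3x3 intrinsics; R, t: world-to-camera extrinsics. A common way
    to hand a camera trajectory to a video diffusion model, one such
    map per frame.
    """
    ys, xs = np.mgrid[0:h, 0:w] + 0.5
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
    dirs = (pix @ np.linalg.inv(K).T) @ R   # rotate rays into world frame
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    origin = -R.T @ t                        # camera center in world coords
    moment = np.cross(np.broadcast_to(origin, dirs.shape), dirs)
    return np.concatenate([dirs, moment], axis=1).reshape(h, w, 6)

K = np.array([[64., 0, 32.], [0, 64., 32.], [0, 0, 1.]])
emb = plucker_rays(K, np.eye(3), np.zeros(3), 64, 64)
print(emb.shape)  # (64, 64, 6) per-frame camera conditioning
```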
JoyVASA can generate high-quality lip-sync videos of human and animal faces from a single image and speech clip.
CHANGER can integrate an actor’s head onto a target body in digital content. It uses chroma keying for clear backgrounds and enhances blending quality with Head shape and long Hair augmentation (H2 augmentation) and a Foreground Predictive Attention Transformer (FPAT).
DAWN can generate talking head videos from a single portrait and audio clip. It produces lip movements and head poses quickly, making it effective for creating long video sequences.
DimensionX can generate photorealistic 3D and 4D scenes from a single image using controllable video diffusion.
SG-I2V can control object and camera motion in image-to-video generation using bounding boxes and trajectories.
GIMM is a video frame interpolation method that uses implicit motion modeling to predict the motion between frames at arbitrary timesteps.
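The "implicit" part means flow is a continuous function of space and time rather than a fixed grid. The sketch below shows only that interface: a tiny random MLP stands in for GIMM's learned network, so the weights are meaningless but the ability to query motion at any in-between timestep is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random weights standing in for a trained implicit motion network
# mapping a normalized space-time query (x, y, t) to a 2D flow vector.
W1, b1 = rng.normal(size=(3, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.normal(size=(32, 2)), np.zeros(2)

def motion(x, y, t):
    h = np.tanh(np.array([x, y, t]) @ W1 + b1)
    return h @ W2 + b2

# Query a dense flow field at an arbitrary in-between timestep,
# not just the midpoint between two frames.
t = 0.37
flow = np.array([[motion(x / 31, y / 31, t) for x in range(32)]
                 for y in range(32)])
print(flow.shape)  # (32, 32, 2)
```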
AutoVFX can automatically add realistic visual effects to a video from natural language instructions.
Adaptive Caching can speed up video generation with Diffusion Transformers by caching and reusing computations across denoising steps. It achieves up to 4.7 times faster video creation at 720p without losing quality.
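The core trick is a cache-or-recompute decision per block: while a block's input has barely moved between steps, its cached output is reused. A greatly simplified sketch; the stand-in block, the toy solver update, and the distance threshold are all hypothetical, and Adaptive Caching's actual metric and per-block schedule differ.

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(64, 64))  # frozen weights of a stand-in block

def block(x):
    # Stand-in for one expensive transformer block.
    return np.tanh(x @ W)

def denoise(x, steps=50, tol=0.05):
    cached_in, cached_out = None, None
    for _ in range(steps):
        stale = (
            cached_in is None
            or np.linalg.norm(x - cached_in) / np.linalg.norm(cached_in) >= tol
        )
        if stale:
            # Input drifted too far: recompute and refresh the cache.
            cached_in, cached_out = x.copy(), block(x)
        # Otherwise cached_out is reused, skipping the expensive block.
        x = x + 0.1 * cached_out  # toy update standing in for a solver step
    return x

print(denoise(rng.normal(size=(4, 64))).shape)
```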
Self-Supervised Any-Point Tracking by Contrastive Random Walks can track any point in a video using a self-supervised global matching transformer.
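The underlying idea is a random walk on a space-time graph of patch features: walk forward through the frames and back again, and train the encoder so the round trip returns to its starting point. A toy sketch with random features in place of learned embeddings; the loss computation mirrors the cycle-consistency objective, but nothing here is trained.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, N, D = 4, 8, 16
feats = rng.normal(size=(T, N, D))  # N candidate points in each of T frames

def transition(f_a, f_b, temp=0.1):
    # Softmax over pairwise cosine similarities defines the random
    # walk's transition probabilities between consecutive frames.
    na = f_a / np.linalg.norm(f_a, axis=1, keepdims=True)
    nb = f_b / np.linalg.norm(f_b, axis=1, keepdims=True)
    return softmax(na @ nb.T / temp, axis=1)

# Walk forward through the frames, then backward to the start.
walk = np.eye(N)
for t in range(T - 1):
    walk = walk @ transition(feats[t], feats[t + 1])
for t in range(T - 1, 0, -1):
    walk = walk @ transition(feats[t], feats[t - 1])

# Training pushes the round-trip matrix toward the identity, which is
# what makes the learned features useful for tracking any point.
cycle_loss = -np.log(np.diag(walk) + 1e-9).mean()
print(f"cycle-consistency loss: {cycle_loss:.3f}")
```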
MOFT is a training-free video motion interpreter and controller. It extracts motion information from video diffusion features and uses it to guide the motion of generated videos without retraining.
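A sketch of the extraction idea: subtracting the per-video temporal mean from intermediate diffusion features cancels appearance shared across frames, leaving motion-correlated structure. The features here are random stand-ins, and the variance-based channel selection is a heuristic assumption, not necessarily MOFT's exact criterion.

```python
import numpy as np

# Hypothetical intermediate diffusion features for T frames,
# shaped (T, C, H, W), standing in for real U-Net/DiT activations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16, 4, 4))

# Content debiasing: removing the temporal mean cancels appearance
# that is constant across frames; the residual correlates with motion.
motion_feat = feats - feats.mean(axis=0, keepdims=True)

# Motion channel filtering: keep the channels whose residual varies
# most over time (illustrative selection rule).
var_per_channel = motion_feat.var(axis=(0, 2, 3))
top = np.argsort(var_per_channel)[-4:]
print("selected motion channels:", top)
```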
TANGO can generate high-quality body-gesture videos that match speech audio from a single video. It improves realism and synchronization by fixing audio-motion misalignment and using a diffusion model for smooth transitions.
MonST3R can estimate 3D shapes from videos over time, creating a dynamic point cloud and tracking camera positions. This method improves video depth estimation and separates moving from still objects more effectively than previous techniques.
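The output can be pictured as a time-stamped point cloud: each frame contributes a 3D pointmap, and camera poses are estimated against the aligned cloud. A minimal sketch of lifting per-frame depth to such a cloud; the flat toy depths are made up, and MonST3R itself predicts pointmaps directly rather than unprojecting depth like this.

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map to a 3D pointmap in camera coordinates."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w] + 0.5
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    return pix @ np.linalg.inv(K).T * depth[..., None]

# Toy constant-depth frames; stacking their pointmaps with a time
# index gives the dynamic point cloud.
K = np.array([[32., 0, 16.], [0, 32., 16.], [0, 0, 1.]])
frames = [np.full((32, 32), d) for d in (2.0, 2.1, 2.2)]
cloud = np.concatenate(
    [np.concatenate([unproject(d, K).reshape(-1, 3),
                     np.full((32 * 32, 1), t)], axis=1)
     for t, d in enumerate(frames)]
)
print(cloud.shape)  # (3072, 4): x, y, z, t
```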
MimicTalk can generate personalized 3D talking faces in under 15 minutes. It mimics a person's talking style using an audio-to-motion model that captures the speaker's style, resulting in high-quality videos.