Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
SG-I2V can control object and camera motion in image-to-video generation using bounding boxes and trajectories.
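To make the control scheme concrete, here is a minimal sketch of how box-and-trajectory guidance might be specified; the class and field names are hypothetical stand-ins, not SG-I2V's actual API.

    from dataclasses import dataclass

    @dataclass
    class MotionControl:
        box: tuple[int, int, int, int]     # (x0, y0, x1, y1) region to move
        trajectory: list[tuple[int, int]]  # per-frame centre of the box

    # Slide an object 16 px right per frame; a box covering the whole frame
    # would express camera motion in the same way.
    controls = [MotionControl(box=(40, 60, 120, 180),
                              trajectory=[(80, 120), (96, 120), (112, 120)])]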
GIMM is a new video frame interpolation method that uses generalizable implicit motion modelling to predict the motion between frames at arbitrary timesteps.
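For context, here is a bare-bones version of flow-based interpolation, the family GIMM belongs to; GIMM's actual contribution is an implicit network that predicts the flow at any continuous time t, which the precomputed flow_01 stands in for in this sketch.

    import numpy as np

    def warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
        """Backward-warp a (H, W, C) frame by a (H, W, 2) flow field."""
        h, w = frame.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        xs = np.clip(xs + flow[..., 0], 0, w - 1).round().astype(int)
        ys = np.clip(ys + flow[..., 1], 0, h - 1).round().astype(int)
        return frame[ys, xs]

    def interpolate(f0, f1, flow_01, t=0.5):
        # Scale the flow to time t, warp from both sides, then blend.
        mid_from_f0 = warp(f0, flow_01 * t)
        mid_from_f1 = warp(f1, -flow_01 * (1 - t))
        return (1 - t) * mid_from_f0 + t * mid_from_f1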
AutoVFX can automatically create realistic visual effects in a video from natural-language editing instructions.
Adaptive Caching can speed up video generation with Diffusion Transformers by caching and reusing computations across denoising steps. It can achieve up to 4.7 times faster video creation at 720p without losing quality.
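A toy sketch of the caching idea, assuming a residual transformer block: if the block's input barely changed since the last denoising step, reuse the cached residual instead of recomputing it. The change metric and threshold below are illustrative, not AdaCache's actual schedule.

    import torch

    class CachedBlock(torch.nn.Module):
        def __init__(self, block: torch.nn.Module, tol: float = 1e-2):
            super().__init__()
            self.block, self.tol = block, tol
            self.prev_x = self.prev_residual = None

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.prev_x is not None:
                change = (x - self.prev_x).norm() / self.prev_x.norm()
                if change < self.tol:              # input barely moved:
                    return x + self.prev_residual  # reuse cached residual
            residual = self.block(x)               # otherwise recompute
            self.prev_x, self.prev_residual = x.detach(), residual.detach()
            return x + residual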
Self-Supervised Any-Point Tracking by Contrastive Random Walks can track any point in a video using a self-supervised global matching transformer.
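The contrastive random walk objective itself is compact enough to sketch: walk from one frame to the next and back through softmax affinities, and train the features so every point returns to itself (the round-trip transition matrix should be near identity).

    import torch
    import torch.nn.functional as F

    def crw_loss(feat_a: torch.Tensor, feat_b: torch.Tensor, tau: float = 0.07):
        """feat_a, feat_b: (N, D) L2-normalized point features of two frames."""
        sim = feat_a @ feat_b.T / tau          # (N, N) affinities
        a_to_b = F.softmax(sim, dim=1)         # walk forward
        b_to_a = F.softmax(sim.T, dim=1)       # walk back
        round_trip = a_to_b @ b_to_a           # rows are return distributions
        targets = torch.arange(feat_a.size(0))
        return F.nll_loss(round_trip.clamp_min(1e-9).log(), targets)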
MOFT is a training-free video motion interpreter and controller. It extracts motion information from video diffusion features and uses it to guide the motion of generated videos, all without retraining.
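The guidance step can be sketched generically as feature matching on the latent; extract_motion stands in for MOFT's motion-feature extractor and is an assumption, not its real interface.

    import torch

    def guide(latent, extract_motion, ref_motion, lr=0.1, steps=3):
        """Nudge a latent so its motion features match a reference."""
        latent = latent.detach().requires_grad_(True)
        for _ in range(steps):
            loss = (extract_motion(latent) - ref_motion).pow(2).mean()
            (grad,) = torch.autograd.grad(loss, latent)
            latent = (latent - lr * grad).detach().requires_grad_(True)
        return latent.detach()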
TANGO can generate high-quality body-gesture videos that match speech audio from a single video. It improves realism and synchronization by fixing audio-motion misalignment and using a diffusion model for smooth transitions.
MonST3R can estimate 3D shapes from videos over time, creating a dynamic point cloud and tracking camera positions. This method improves video depth estimation and separates moving from still objects more effectively than previous techniques.
MimicTalk can generate personalized 3D talking faces in under 15 minutes. It mimics a person’s talking style with an in-context stylized audio-to-motion model, resulting in high-quality videos.
Tex4D can generate 4D textures for untextured mesh sequences from a text prompt. It combines 3D geometry with video diffusion models to ensure the textures are consistent across different views and frames.
Depth Any Video can generate high-resolution depth maps for videos. It uses a large dataset of 40,000 annotated clips to improve accuracy and includes a method for better depth inference across sequences of up to 150 frames.
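Covering sequences that long typically means overlapping-window inference; this generic sketch averages depth predictions in the overlaps and is an assumption about the stitching, not the paper's exact method.

    import numpy as np

    def infer_long(frames, predict, win=32, overlap=8):
        """frames: (N, H, W, 3); predict maps a clip to (T, H, W) depths."""
        n = len(frames)
        step = win - overlap
        starts = list(range(0, max(n - win, 0) + 1, step))
        if starts[-1] + win < n:            # make sure the tail is covered
            starts.append(n - win)
        depth = np.zeros(frames.shape[:3])
        count = np.zeros(n)
        for s in starts:
            depth[s:s + win] += predict(frames[s:s + win])
            count[s:s + win] += 1
        return depth / count[:, None, None]  # average overlapping windows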
TweedieMix can generate images and videos that combine multiple personalized concepts.
FreeLong can generate 128-frame videos from short video diffusion models trained on 16-frame videos, without requiring additional training. It’s not SOTA, but has just the right amount of cursedness 👌
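FreeLong's trick is blending global (all-frame) and local (windowed) temporal attention features in the frequency domain; this sketch shows just that blending step on toy (T, D) feature tensors, with an illustrative cutoff.

    import torch

    def spectral_blend(global_feat, local_feat, cutoff=4):
        """Low temporal frequencies from global, high ones from local."""
        g = torch.fft.rfft(global_feat, dim=0)
        loc = torch.fft.rfft(local_feat, dim=0)
        mask = torch.zeros_like(g)
        mask[:cutoff] = 1.0                  # keep low frequencies of global
        blended = g * mask + loc * (1 - mask)
        return torch.fft.irfft(blended, n=global_feat.size(0), dim=0)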
VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, without any fine-tuning.
Hallo2 can create long, high-resolution (4K) animations of portrait images driven by audio. It allows users to adjust facial expressions with text labels, improving control and reducing issues like appearance drift and temporal artifacts.
Pyramidal Flow Matching can generate high-quality 5 to 10-second videos at 768p resolution and 24 FPS. It uses a unified pyramidal flow matching algorithm to link flows across different stages, making video creation more efficient.
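The pyramid idea in schematic form: early, noisy timesteps run on downsampled latents and later stages work at full resolution, so most denoising happens where it is cheap. The stage boundaries below are illustrative, not the paper's schedule.

    stages = [
        {"t_range": (1.0, 0.7), "scale": 1 / 4},  # coarse: 1/4 resolution
        {"t_range": (0.7, 0.3), "scale": 1 / 2},
        {"t_range": (0.3, 0.0), "scale": 1},      # fine: full resolution
    ]
    for s in stages:
        hi, lo = s["t_range"]
        print(f"denoise t in [{lo}, {hi}] at {s['scale']:g}x spatial scale")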
TCAN can animate characters of various styles from a pose guidance video.
GAGAvatar can create 3D head avatars from a single image and enable real-time facial expression reenactment.
Time Reversal can generate the in-between frames of two input images. In particular, this enables looping cinemagraphs as well as camera and subject motion videos.
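The underlying trick, sketched under assumptions: denoise one video conditioned on the start frame and another conditioned on the end frame, reverse the second in time, and fuse the two paths at each step so the result hits both endpoints. Here i2v_denoise_step is a hypothetical image-to-video denoising call, not the paper's actual interface.

    import torch

    def fuse_step(lat_fwd, lat_bwd, i2v_denoise_step, frame_a, frame_b, t):
        fwd = i2v_denoise_step(lat_fwd, cond=frame_a, t=t)
        bwd = i2v_denoise_step(lat_bwd, cond=frame_b, t=t)
        fused = (fwd + torch.flip(bwd, dims=[0])) / 2  # average with reversal
        return fused, torch.flip(fused, dims=[0])      # feed back to both paths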
MotionMaster can extract camera motions from one or more source videos and apply them to new videos. This enables flexible, controllable camera motion, including variable-speed zoom, left and right pans, dolly zoom in and out, and more.