Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
Ground-A-Video can edit multiple attributes of a video using pre-trained text-to-image models without any training. It maintains consistency across frames and accurately preserves non-target areas, making it more effective than other editing methods.
LLM-grounded Video Diffusion Models can generate realistic videos from complex text prompts. They first create dynamic scene layouts with a large language model, which helps guide the video creation process, resulting in better accuracy for object movements and actions.
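The layout-grounding idea is easy to prototype: ask an LLM for per-frame bounding boxes, then rasterize them into guidance masks that tell the diffusion model where each object should appear. The minimal sketch below covers that second step; the JSON layout is a hardcoded stand-in for an LLM response, and none of the names come from the actual LLM-grounded Video Diffusion codebase.

```python
import json
import numpy as np

# Hypothetical LLM output: per-frame bounding boxes for each object,
# in normalized [x0, y0, x1, y1] coordinates (stand-in for a real LLM call).
llm_layout = json.loads("""
{
  "frames": [
    {"boxes": [{"label": "a red ball", "xyxy": [0.10, 0.60, 0.30, 0.80]}]},
    {"boxes": [{"label": "a red ball", "xyxy": [0.40, 0.45, 0.60, 0.65]}]},
    {"boxes": [{"label": "a red ball", "xyxy": [0.70, 0.30, 0.90, 0.50]}]}
  ]
}
""")

def rasterize(frame_layout, height=64, width=64):
    """Turn one frame's bounding boxes into a binary guidance mask."""
    mask = np.zeros((height, width), dtype=np.float32)
    for box in frame_layout["boxes"]:
        x0, y0, x1, y1 = box["xyxy"]
        mask[int(y0 * height):int(y1 * height),
             int(x0 * width):int(x1 * width)] = 1.0
    return mask

# One mask per frame; a layout-guided video model would use these to bias
# where the denoiser places the described object in each frame.
masks = np.stack([rasterize(f) for f in llm_layout["frames"]])
print(masks.shape, masks.sum(axis=(1, 2)))  # (3, 64, 64) and per-frame box areas
```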
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation can generate diverse and realistic videos that match natural audio samples. It uses a lightweight adaptor network to improve alignment and visual quality compared to other methods.
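The adaptor idea itself is compact: keep the text-to-video model frozen and train only a small module that maps audio features into extra conditioning tokens. The PyTorch sketch below illustrates that shape of solution; the dimensions and module names are made up and not taken from the paper.

```python
import torch
import torch.nn as nn

class AudioAdaptor(nn.Module):
    """Small trainable mapper from audio features to conditioning tokens
    for a frozen video diffusion model (illustrative dimensions only)."""
    def __init__(self, audio_dim=768, cond_dim=1024, n_tokens=8):
        super().__init__()
        self.n_tokens = n_tokens
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim * n_tokens),
        )

    def forward(self, audio_features):
        # audio_features: (batch, audio_dim), e.g. pooled embeddings from
        # a pretrained audio encoder.
        batch = audio_features.shape[0]
        tokens = self.proj(audio_features).view(batch, self.n_tokens, -1)
        return tokens  # (batch, n_tokens, cond_dim), fed to the frozen model as extra context

adaptor = AudioAdaptor()
dummy_audio = torch.randn(2, 768)
print(adaptor(dummy_audio).shape)  # torch.Size([2, 8, 1024])
```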
Show-1 can generate high-quality videos with accurate text-video alignment. It uses only 15 GB of GPU memory during inference, much less than the 72 GB needed by traditional models.
ProPainter is a new video inpainting method that is able to remove objects, complete masked videos, remove watermarks and even expand the view of a video.
Another video synthesis model that caught my eye this week is Reuse and Diffuse. This text-to-video framework can extend an initial video clip with additional frames by reusing and iterating over the original latent features. Can’t wait to give this one a try.
Hierarchical Masked 3D Diffusion Model for Video Outpainting can fill in missing parts at the edges of video frames while keeping the motion smooth. Its hierarchical coarse-to-fine approach reduces error accumulation by using multiple frames as guidance.
While ZeroScope, Gen-2, PikaLabs and others have brought us high-resolution text- and image-to-video, they all suffer from choppy transitions, crude motion, and disordered action sequencing. The new Dysen-VDM tries to tackle these issues and, while nowhere near perfect, delivers some promising results.
StableVideo is yet another vid2vid method. This one is not just a style transfer, though: the method can differentiate between foreground and background when editing a video, making it possible to reimagine the subject within an entirely different landscape.
CoDeF can process videos consistently by using a canonical content field to gather static content and a temporal deformation field to track changes over time. This allows it to perform tasks like video-to-video translation and track moving objects, such as water and smog, without needing extra training.
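The canonical-plus-deformation split is the interesting part: a single canonical image holds the static content, and a per-frame deformation field says where each output pixel samples from. The toy below reconstructs frames with grid_sample under a hand-written shift; it is a conceptual sketch of the idea, not CoDeF's implementation, which learns the deformation field with a network.

```python
import torch
import torch.nn.functional as F

# Toy setup: one canonical image and a tiny per-frame deformation field.
T, C, H, W = 4, 3, 32, 32
canonical = torch.rand(1, C, H, W)           # static content, shared by all frames

# Base sampling grid in normalized [-1, 1] coordinates, shape (H, W, 2).
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
base_grid = torch.stack([xs, ys], dim=-1)

# Per-frame deformation: here just a horizontal shift that grows over time;
# the real method learns this field conditioned on position and time.
frames = []
for t in range(T):
    offset = torch.zeros_like(base_grid)
    offset[..., 0] = 0.1 * t                  # shift the x sampling location
    grid = (base_grid + offset).unsqueeze(0)  # (1, H, W, 2)
    frame = F.grid_sample(canonical, grid, mode="bilinear",
                          padding_mode="border", align_corners=True)
    frames.append(frame)

video = torch.cat(frames, dim=0)              # (T, C, H, W)
print(video.shape)
# Editing the single canonical image (e.g. with an image model) and replaying
# the deformations is what lets edits propagate consistently across frames.
```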
TokenFlow is a new video-to-video method for temporal coherent video editing with text. We’ve seen a lot of them, but this one looks extremely good with almost no flickering and requires no fine-tuning whatsoever.
VideoComposer can generate videos with control over how they look and move using text, sketches, and motion vectors. It improves video quality by ensuring frames match well, allowing for flexible video creation and editing.
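Composing heterogeneous controls usually comes down to encoding each signal into a shared space and letting the denoiser attend to the resulting tokens. A minimal sketch of that pattern, with made-up dimensions and none of VideoComposer's real module names:

```python
import torch
import torch.nn as nn

class ConditionFusion(nn.Module):
    """Project text, sketch, and motion-vector features into one shared
    conditioning space (illustrative of the composition idea only)."""
    def __init__(self, dim=512):
        super().__init__()
        self.text_proj = nn.Linear(768, dim)      # e.g. pooled text-encoder features
        self.sketch_proj = nn.Conv2d(1, dim, kernel_size=8, stride=8)
        self.motion_proj = nn.Conv2d(2, dim, kernel_size=8, stride=8)

    def forward(self, text_emb, sketch, motion):
        # text_emb: (B, 768); sketch: (B, 1, 64, 64); motion: (B, 2, 64, 64)
        t = self.text_proj(text_emb).unsqueeze(1)                # (B, 1, dim)
        s = self.sketch_proj(sketch).flatten(2).transpose(1, 2)  # (B, 64, dim)
        m = self.motion_proj(motion).flatten(2).transpose(1, 2)  # (B, 64, dim)
        return torch.cat([t, s, m], dim=1)  # token sequence the denoiser attends to

fusion = ConditionFusion()
tokens = fusion(torch.randn(1, 768),
                torch.randn(1, 1, 64, 64),
                torch.randn(1, 2, 64, 64))
print(tokens.shape)  # torch.Size([1, 129, 512])
```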
Make-Your-Video can generate customized videos from text and depth information for better control over content. It uses a Latent Diffusion Model to improve video quality and reduce the need for computing power.
Control-A-Video can generate controllable text-to-video content using diffusion models. It allows for fine-tuned customization with edge and depth maps, ensuring high quality and consistency in the videos.
Make-A-Protagonist can edit videos by changing the protagonist, background, and style using text and images. It allows for detailed control over video content, helping users create unique and personalized videos.
HumanRF can capture high-quality full-body human motion from multiple video angles. It allows playback from new viewpoints at 12 megapixels and uses a 4D dynamic neural scene representation for smooth and realistic motion, making it great for film and gaming.
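A 4D dynamic scene representation is, at its core, a function from space-time coordinates to color and density; novel-view playback then comes from volume rendering that function along camera rays. The toy MLP below only shows that query interface and is far simpler than HumanRF's actual decomposed 4D feature grid.

```python
import torch
import torch.nn as nn

class Tiny4DField(nn.Module):
    """Maps (x, y, z, t) to (RGB, density) -- a toy stand-in for a
    4D dynamic scene representation."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),   # 3 color channels + 1 density
        )

    def forward(self, xyzt):
        out = self.mlp(xyzt)
        rgb = torch.sigmoid(out[..., :3])
        density = torch.relu(out[..., 3:])
        return rgb, density

field = Tiny4DField()
# Query the same 3D point at two different times: the motion lives in t.
points = torch.tensor([[0.1, 0.2, 0.3, 0.0],
                       [0.1, 0.2, 0.3, 0.5]])
rgb, density = field(points)
print(rgb.shape, density.shape)  # torch.Size([2, 3]) torch.Size([2, 1])
```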
Sketching the Future can generate high-quality videos from sketched frames using zero-shot text-to-video generation and ControlNet. It smoothly fills in frames between sketches to create consistent video content that matches the user’s intended motion.
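The in-betweening step amounts to interpolating the control signal: given sketched keyframes, build a ControlNet-style conditioning image for every intermediate frame. A minimal NumPy sketch of that interpolation, with made-up frame counts and no ties to the project's code:

```python
import numpy as np

def interpolate_controls(keyframes, key_indices, total_frames):
    """Linearly blend sketch keyframes into one control image per frame.
    keyframes: list of (H, W) arrays; key_indices: frame index of each keyframe."""
    h, w = keyframes[0].shape
    controls = np.zeros((total_frames, h, w), dtype=np.float32)
    for f in range(total_frames):
        # Clamp to the last keyframe once we pass it.
        f_clamped = min(f, key_indices[-1])
        nxt = next(i for i, k in enumerate(key_indices) if k >= f_clamped)
        prv = max(nxt - 1, 0)
        span = key_indices[nxt] - key_indices[prv]
        alpha = 0.0 if span == 0 else (f_clamped - key_indices[prv]) / span
        controls[f] = (1 - alpha) * keyframes[prv] + alpha * keyframes[nxt]
    return controls  # each slice would condition one generated frame via ControlNet

# Two sketched keyframes at frames 0 and 15, interpolated over 16 frames.
k0, k1 = np.zeros((64, 64)), np.ones((64, 64))
controls = interpolate_controls([k0, k1], [0, 15], 16)
print(controls.shape, controls[8].mean())  # (16, 64, 64), roughly 0.53
```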
Total-Recon can render scenes from monocular RGBD videos from different camera angles, like first-person and third-person views. It creates realistic 3D videos of moving objects and allows for 3D filters that add virtual items to people in the scene.
DreamPose can generate animated fashion videos from a single image and a sequence of human body poses. The method is able to capture both human and fabric motion and supports a variety of clothing styles and poses.
Follow Your Pose can generate character videos that match specific poses from text descriptions. It uses a two-stage training process with pre-trained text-to-image models, allowing for continuous pose control and editing.