Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
HumanVid can generate videos from a character photo while allowing users to control both human and camera motions. It introduces a large-scale dataset that combines high-quality real-world and synthetic data, achieving state-of-the-art performance in camera-controllable human image animation.
Follow-Your-Canvas can outpaint videos to higher resolutions, for example from 512x512 up to 1152x2048.
KEEP performs video face super-resolution while maintaining consistency across frames. It uses Kalman filtering to improve facial details, working well on both synthetic and real-world videos.
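KEEP's actual pipeline operates on learned facial features, but the core intuition — using a Kalman filter so each frame's estimate blends the previous frame's prediction with the current observation — can be shown with a minimal 1D sketch (the parameters here are illustrative, not from the paper):

```python
import numpy as np

def kalman_smooth(measurements, process_var=1e-3, meas_var=1e-1):
    """Smooth a noisy per-frame signal with a 1D Kalman filter.

    Each frame's estimate blends the prediction carried over from
    the previous frame with the current noisy observation, which is
    the basic mechanism behind temporally consistent restoration.
    """
    x = measurements[0]  # state estimate
    p = 1.0              # estimate variance
    out = []
    for z in measurements:
        p += process_var              # predict: uncertainty grows
        k = p / (p + meas_var)        # Kalman gain
        x = x + k * (z - x)           # update: blend with measurement
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

# A noisy per-frame "facial detail" signal
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 3, 60)) + rng.normal(0, 0.3, 60)
smoothed = kalman_smooth(signal)
```

The smoothed trajectory jitters far less from frame to frame than the raw signal, which is exactly the kind of temporal stability you want in restored faces.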
TVG can create smooth transition videos between two images without needing training. It uses diffusion models and Gaussian Process Regression for high-quality results and adds controls for better timing.
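TVG's Gaussian Process Regression step operates on diffusion latents; as a rough illustration, here is a hand-rolled GP posterior mean that interpolates between two endpoint vectors over time (the RBF kernel and its hyperparameters are assumptions for this sketch, not TVG's actual settings):

```python
import numpy as np

def rbf(a, b, length=0.5):
    """RBF kernel between two 1D arrays of timestamps."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_interpolate(z0, z1, num_frames, length=0.5, noise=1e-6):
    """GP posterior mean conditioned on the two endpoint latents.

    Returns an array of shape (num_frames, dim) that passes
    (near-)exactly through z0 at t=0 and z1 at t=1, with a smooth
    transition in between.
    """
    t_train = np.array([0.0, 1.0])
    t_query = np.linspace(0.0, 1.0, num_frames)
    K = rbf(t_train, t_train, length) + noise * np.eye(2)
    K_s = rbf(t_query, t_train, length)
    return K_s @ np.linalg.solve(K, np.stack([z0, z1]))

z0 = np.zeros(4)          # stand-in for the first image's latent
z1 = np.ones(4)           # stand-in for the second image's latent
frames = gp_interpolate(z0, z1, num_frames=8)
```

Compared with plain linear interpolation, the kernel's length scale gives you a knob for how quickly the transition happens — which is the kind of timing control the blurb above refers to.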
Matryoshka Diffusion Models can generate high-quality images and videos using a NestedUNet architecture that denoises inputs at multiple resolutions jointly. This enables strong performance at resolutions up to 1024x1024 pixels and effective end-to-end training without cascaded or progressive training stages.
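A NestedUNet consumes the same input at several resolutions at once; here is a toy sketch of building such a resolution pyramid with average pooling (the real model denoises all levels jointly inside one network — this only shows the multi-scale input structure):

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Progressively 2x average-pool an image into a resolution pyramid.

    Mirrors how nested multi-resolution architectures operate on
    coarse and fine versions of the same input simultaneously.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape
        pooled = pyramid[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = build_pyramid(img)  # shapes: (8, 8), (4, 4), (2, 2)
```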
Puppet-Master can create realistic motion in videos from a single image using simple drag controls. It uses a fine-tuned video diffusion model and all-to-first attention method to make high-quality videos.
Generative Camera Dolly can regenerate a video from any chosen perspective. Still very early, but imagine being able to change any shot or angle in a video after it’s been recorded!
AniTalker is another talking-head generator that can animate a face from a single portrait and input audio, with naturally flowing movements and diverse outcomes.
Audio-Synchronized Visual Animation can animate static images using audio clips to create synchronized visual animations. It uses the AVSync15 dataset and the AVSyncD diffusion model to produce high-quality animations across different audio types.
Shape of Motion can reconstruct 3D scenes from a single video. The method is able to capture the full 3D motion of a scene and can handle occlusions and disocclusions.
SparseCtrl is an image-to-video method with some cool new capabilities. With its RGB, depth, and sketch encoders and one or a few input images, it can animate images, interpolate between keyframes, extend videos, and guide video generation with only depth maps or a few sketches. Especially love how the scene transitions look.
Noise Calibration can improve video quality while keeping the original content structure. It uses a noise optimization strategy with pre-trained diffusion models to enhance visuals and ensure consistency between original and enhanced videos.
ST-AVSR can enhance video resolution to arbitrary scales while keeping details clear and motion smooth. It uses a pre-trained VGG network to improve visual quality and inference speed over prior methods.
Live2Diff can translate live video streams using a special attention method in video diffusion models. It maintains smooth motion by linking each frame to previous ones and can achieve 16 frames per second on an RTX 4090 GPU, making it great for real-time use.
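Live2Diff's uni-directional temporal attention can be pictured as a mask that lets each frame attend only to itself and a window of preceding frames, so new frames never wait on future ones — roughly like this (the window size is illustrative):

```python
import numpy as np

def causal_temporal_mask(num_frames, window=4):
    """Boolean attention mask for streaming video models.

    mask[i, j] is True when frame i may attend to frame j:
    only itself and up to `window` preceding frames, never
    future frames — which is what makes frame-by-frame
    (live) inference possible.
    """
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for i in range(num_frames):
        lo = max(0, i - window)
        mask[i, lo:i + 1] = True
    return mask

mask = causal_temporal_mask(6, window=2)
```

A bidirectional video diffusion model would instead use an all-True mask, which is why it cannot run on a live stream: every frame would need to see frames that do not exist yet.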
LivePortrait can animate a single source image with motion from a driving video. The method generates high-quality videos at 60fps and can retarget the motion to other characters.
AniPortrait can generate high-quality portrait animations driven by audio and a reference portrait image. It also supports face reenactment from a reference video.
DiffIR2VR-Zero is a zero-shot video restoration method that can be used with any 2D image restoration diffusion model. The method can perform 8x super-resolution and video denoising at high noise levels.
MimicMotion can generate high-quality videos of arbitrary length mimicking specific motion guidance. The method is able to produce videos of up to 10,000 frames with acceptable resource consumption.
Text-Animator can accurately render the structure of visual text in generated videos. It supports camera control and text refinement to improve the stability of the generated text.
MotionBooth can generate videos of customized subjects from a few images and a text prompt with precise control over both object and camera movements.