Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
Uni3C is a video generation method that adds unified support for both camera control and human motion control.
MoCha can generate talking character animations from speech and text, allowing for multi-character conversations with turn-based dialogue.
RealisDance-DiT can generate high-quality character animations from images and pose sequences. It handles challenges like character-object interactions and complex gestures with only minimal changes to the Wan-2.1 video model, and it is used as part of the Uni3C method.
RealCam-I2V can generate high-quality videos from real-world images, with camera controls specified in consistent real-world parameters.
HunyuanPortrait can animate characters from a single portrait image by using facial expressions and head poses from video clips. It achieves lifelike animations with high consistency and control, effectively separating appearance and motion.
MTVCrafter can generate high-quality human image animations from 3D motion sequences.
PA-VDM can generate high-quality videos up to 1 minute long at 24 frames per second.
HunyuanCustom can generate customized videos with specific subjects while keeping their identity consistent across frames. It supports various inputs like images, audio, video, and text, and it excels in realism and matching text to video.
FlexiAct can transfer actions from a reference video to a target image, preserving the subject's identity while adapting to different layouts and viewpoints.
KeySync can achieve strong lip synchronization for videos. It addresses issues like timing, facial expressions, and occluded faces, using a masking strategy and a new leakage metric called LipLeak to improve visual quality.
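KeySync's exact masking code isn't published alongside this blurb, but the general trick of hiding the mouth region of the input frames, so the model can't copy lip motion from them (the kind of leakage LipLeak measures), can be sketched like this. The 68-point landmark indexing and the padding factor are illustrative assumptions, not KeySync's actual implementation:

```python
import numpy as np

def mask_mouth_region(frame: np.ndarray, landmarks: np.ndarray,
                      pad: float = 0.15) -> np.ndarray:
    """Zero out the mouth region of a conditioning frame.

    frame:     (H, W, 3) uint8 video frame
    landmarks: (N, 2) facial landmark coordinates; the mouth indices
               below assume the common 68-point convention and will
               differ for other detectors.
    pad:       fractional padding around the mouth bounding box
    """
    mouth = landmarks[48:68]          # 68-point convention: mouth outline
    x0, y0 = mouth.min(axis=0)
    x1, y1 = mouth.max(axis=0)
    # Expand the box so jaw motion can't leak around the edges.
    w, h = x1 - x0, y1 - y0
    x0, x1 = int(x0 - pad * w), int(x1 + pad * w)
    y0, y1 = int(y0 - pad * h), int(y1 + pad * h)
    masked = frame.copy()
    masked[max(y0, 0):y1, max(x0, 0):x1] = 0  # lips must come from audio
    return masked
```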
FantasyTalking can generate talking portraits from a single image, making them look realistic with accurate lip movements and facial expressions. It uses a two-step process to align audio and video, allowing users to control how expressions and body motions appear.
Phantom can generate videos that keep the subject’s identity from images while matching them with text prompts.
SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.
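The Diffusion Forcing part is the interesting bit: instead of one noise level for the whole clip, each frame gets its own timestep during training, which is what later lets the model denoise new frames conditioned on already-clean history. Here is a minimal sketch of that training step, with a toy cosine schedule and a placeholder denoiser standing in for SkyReels-V2's actual model:

```python
import torch

def diffusion_forcing_step(denoiser, frames, num_timesteps=1000):
    """One training step with per-frame noise levels (Diffusion Forcing).

    frames: (B, T, C, H, W) clean video latents
    """
    B, T = frames.shape[:2]
    # Key difference from standard video diffusion: each frame gets its
    # own timestep, so the model learns to denoise partially clean
    # histories -- exactly the setting it faces during autoregressive rollout.
    t = torch.randint(0, num_timesteps, (B, T), device=frames.device)
    alpha_bar = torch.cos(t / num_timesteps * torch.pi / 2) ** 2  # toy schedule
    a = alpha_bar.view(B, T, 1, 1, 1)
    noise = torch.randn_like(frames)
    noisy = a.sqrt() * frames + (1 - a).sqrt() * noise
    pred = denoiser(noisy, t)  # denoiser must accept per-frame timesteps
    return torch.nn.functional.mse_loss(pred, noise)
```

At inference, already-generated frames can be pinned to timestep 0 while only the new frames carry high noise, which is what makes the infinite-length rollout possible.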
Ev-DeblurVSR can turn blurry, low-resolution videos into sharp, high-resolution ones, using event camera signals to recover details lost to motion blur.
FramePack aims to make video generation feel like image generation. It can generate single video frames in 1.5 seconds with a 13B model on an RTX 4090, and it also supports full 30 fps generation with 13B models on a 6 GB laptop GPU, just noticeably slower.
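The trick that makes this possible is packing the frame history into a roughly fixed-length context: recent frames keep full detail while older frames are compressed progressively harder, so memory stays bounded no matter how long the video gets. A rough illustration of that packing idea follows; the pooling schedule here is made up for the sketch, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def pack_frame_context(history: list[torch.Tensor]) -> torch.Tensor:
    """Compress older frames harder so total context length stays bounded.

    history: list of (C, H, W) frame latents, oldest first.
    The newest frame keeps full resolution; each step further back is
    average-pooled more aggressively, shrinking its token count geometrically.
    """
    packed = []
    for age, frame in enumerate(reversed(history)):   # age 0 = newest
        k = min(2 ** age, *frame.shape[-2:])          # illustrative schedule
        pooled = F.avg_pool2d(frame.unsqueeze(0), kernel_size=k).squeeze(0)
        packed.append(pooled.flatten(1).T)            # (tokens, C)
    return torch.cat(packed, dim=0)  # bounded-length context for the model
```

Because tokens per frame shrink by 4x per step of age, the total context converges to a small constant multiple of a single frame's tokens instead of growing with video length.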
UniAnimate-DiT can generate high-quality animations from human images. It uses the Wan2.1 model and a lightweight pose encoder to create smooth and visually appealing results, while also upscaling animations from 480p to 720p.
ReCamMaster can re-capture videos from new camera angles.
NormalCrafter can generate consistent surface normals from video sequences. It uses video diffusion models and Semantic Feature Regularization to ensure accurate normal estimation while keeping details clear across frames.
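The Semantic Feature Regularization boils down to keeping the diffusion model's internal features aligned with a frozen semantic encoder, so normal predictions stay anchored to scene content across frames. A minimal version of such an alignment loss might look like the following; the encoder choice (e.g. DINO) and the learned projection are assumptions on my part, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(diffusion_feats, semantic_feats, proj):
    """Encourage diffusion features to agree with a frozen semantic encoder.

    diffusion_feats: (B, N, C1) tokens from the video diffusion backbone
    semantic_feats:  (B, N, C2) tokens from a frozen encoder (e.g. DINO),
                     resampled to the same token grid
    proj:            learned linear layer mapping C1 -> C2
    """
    pred = F.normalize(proj(diffusion_feats), dim=-1)
    target = F.normalize(semantic_feats.detach(), dim=-1)
    # 1 - cosine similarity, averaged over all tokens
    return (1 - (pred * target).sum(-1)).mean()
```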
TTT-Video can create coherent one-minute videos from text storyboards. As the paper's title says, it uses test-time training layers in place of self-attention to keep multi-scene stories consistent, which is quite the achievement. The paper is worth a read.
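For the curious: the core of a test-time-training layer is that its "state" is a tiny inner model whose weights get a gradient update on a self-supervised loss as each token streams through. Below is a stripped-down, single-head sketch of that mechanism, heavily simplified from the paper's actual TTT layers:

```python
import torch

class TTTLinear(torch.nn.Module):
    """Minimal test-time-training layer: the per-sequence state is a
    linear map W, updated online by gradient descent on a reconstruction
    loss, then used to produce each output."""

    def __init__(self, dim: int, lr: float = 0.1):
        super().__init__()
        self.key = torch.nn.Linear(dim, dim)
        self.value = torch.nn.Linear(dim, dim)
        self.query = torch.nn.Linear(dim, dim)
        self.lr = lr
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, dim), processed one token at a time for clarity
        W = torch.zeros(self.dim, self.dim, device=x.device)  # fast weights
        outputs = []
        for t in range(x.shape[0]):
            k, v, q = self.key(x[t]), self.value(x[t]), self.query(x[t])
            # Inner-loop "training": one gradient step so W maps k -> v
            # (gradient of 0.5 * ||W k - v||^2 w.r.t. W is outer(err, k)).
            err = W @ k - v
            W = W - self.lr * torch.outer(err, k)
            outputs.append(W @ q)  # read out with the updated state
        return torch.stack(outputs)
```

In the paper the inner model is richer than a single linear map, but the loop above is the essential difference from self-attention: the state is updated by learning, not by attending over a growing context.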
VACE basically adds ControlNet-style support to video models like Wan and LTX. It handles various video tasks like generating videos from references, video inpainting, pose control, sketch-to-video, and more.
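Since VACE funnels all of these tasks through one conditioning interface, it helps to see how the different tasks reduce to the same few inputs. The dataclass, field names, and placeholder tensors below are invented for illustration, not VACE's published API:

```python
# Hypothetical unified interface in the spirit of VACE's all-in-one
# conditioning; names and shapes are made up for illustration.
from dataclasses import dataclass
import torch

@dataclass
class VideoCondition:
    prompt: str
    reference_images: list[torch.Tensor] | None = None  # subject references
    control_video: torch.Tensor | None = None           # pose/depth/sketch frames
    mask: torch.Tensor | None = None                    # region to regenerate

# Placeholder inputs so the examples below are self-contained.
corgi_img = torch.zeros(3, 480, 832)         # reference image
clip = torch.zeros(49, 3, 480, 832)          # input video
car_mask = torch.zeros(49, 1, 480, 832)      # binary inpainting mask
pose_frames = torch.zeros(49, 3, 480, 832)   # rendered pose sequence

# Reference-to-video: keep a subject consistent across generated frames.
r2v = VideoCondition(prompt="a corgi surfing", reference_images=[corgi_img])

# Video inpainting: regenerate only the masked region of an input clip.
inpaint = VideoCondition(prompt="replace the car with a bicycle",
                         control_video=clip, mask=car_mask)

# Pose control: drive generation with an extracted pose sequence.
pose = VideoCondition(prompt="a dancer on stage", control_video=pose_frames)
```

All three calls would go through the same pipeline; only which fields are filled changes, which is the all-in-one aspect the project is going for.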