Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
Phantom can generate videos that preserve a subject's identity from reference images while following text prompts.
SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.
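For the curious, the Diffusion Forcing idea is simple to sketch: each frame gets its own independent noise level during training, so the model learns to denoise new frames conditioned on cleaner past ones and can keep sliding the window forward at sampling time. A minimal toy version (not SkyReels-V2's actual code; `model` and the noise schedule are placeholders):

```python
import torch

# Minimal sketch of the Diffusion Forcing idea (not SkyReels-V2's real code).
# Each frame in a window gets its own independent noise level, so the model
# learns to denoise new frames conditioned on cleaner past frames. At sampling
# time you slide the window forward, which is what allows unbounded length.

def train_step(model, frames, num_noise_levels=1000):
    # frames: (batch, time, channels, height, width)
    b, t = frames.shape[:2]
    # Independent noise level per frame: the core Diffusion Forcing trick.
    levels = torch.randint(0, num_noise_levels, (b, t), device=frames.device)
    alphas = (1.0 - levels.float() / num_noise_levels).view(b, t, 1, 1, 1)
    noise = torch.randn_like(frames)
    noisy = alphas.sqrt() * frames + (1 - alphas).sqrt() * noise
    pred = model(noisy, levels)          # model predicts the added noise
    return torch.nn.functional.mse_loss(pred, noise)
```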
Ev-DeblurVSR can turn blurry, low-resolution videos into sharp, high-resolution ones with the help of event camera data.
FramePack aims to make video generation feel like image generation. It can generate single video frames in 1.5 seconds with 13B models on an RTX 4090, and it also supports full 30 fps generation with 13B models on a 6GB laptop GPU, just more slowly.
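The trick behind that small memory footprint is worth a sketch: older context frames get compressed with progressively larger pooling kernels, so the total token count stays bounded however long the video runs. A toy illustration, with an invented kernel schedule and shapes (not FramePack's real ones):

```python
import torch
import torch.nn.functional as F

# Hedged sketch of FramePack's core idea: older context frames are compressed
# with progressively larger pooling kernels, so the total number of context
# tokens stays roughly constant no matter how long the video gets.

def pack_context(history, base_kernel=2):
    # history: list of frame latents, most recent last, each (C, H, W)
    packed = []
    for age, frame in enumerate(reversed(history)):
        k = base_kernel ** min(age, 4)   # older frames get a bigger kernel
        if k > 1:
            frame = F.avg_pool2d(frame.unsqueeze(0), k).squeeze(0)
        packed.append(frame.flatten(1).T)  # (H*W / k^2, C) tokens
    return torch.cat(packed, dim=0)        # bounded token budget
```

Since each older frame contributes only 1/k² as many tokens, the context forms a geometric series that converges instead of growing with video length.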
UniAnimate-DiT can generate high-quality animations from human images. It uses the Wan2.1 model and a lightweight pose encoder to create smooth and visually appealing results, while also upscaling animations from 480p to 720p.
ReCamMaster can re-capture videos from new camera angles.
NormalCrafter can generate consistent surface normals from video sequences. It uses video diffusion models and Semantic Feature Regularization to ensure accurate normal estimation while keeping details clear across frames.
TTT-Video can create coherent one-minute videos from text storyboards. As the paper's title says, it uses test-time training (TTT) layers in place of full self-attention to produce consistent multi-scene videos, which is quite the achievement. The paper is worth a read.
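For a rough intuition of what a TTT layer does: its hidden state is the weights of a tiny inner model, which gets trained by gradient steps on a self-supervised loss while the sequence streams through. A heavily simplified sketch (the inner model, loss, and learning rate are illustrative assumptions, not the paper's exact setup):

```python
import torch

# Illustrative sketch of a test-time-training (TTT) layer, the mechanism used
# where self-attention would normally sit. The "hidden state" is the weight
# matrix W of a tiny inner model, updated by a gradient step on a
# self-supervised reconstruction loss for every token it reads.

def ttt_layer(tokens, dim, inner_lr=0.1):
    # tokens: (seq_len, dim); W is the layer's fast, per-sequence state.
    W = torch.zeros(dim, dim)
    outputs = []
    for x in tokens:
        # Inner-loop "training": nudge W to reconstruct the current token.
        pred = x @ W
        grad = torch.outer(x, pred - x)   # grad of 0.5*||xW - x||^2 wrt W
        W = W - inner_lr * grad
        outputs.append(x @ W)             # "query" the updated state
    return torch.stack(outputs)
```

Because the state is a fixed-size weight matrix rather than a growing attention cache, cost scales linearly with sequence length, which is what makes minute-long contexts tractable.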
VACE basically adds ControlNet-style support to video models like Wan and LTX. It handles various video tasks such as generating videos from references, video inpainting, pose control, sketch-to-video, and more.
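The underlying ControlNet recipe carries over to video models in the obvious way: clone a backbone block, feed the clone the control signal, and merge its output back through a zero-initialized projection so training starts as a no-op. A generic sketch (module names are placeholders, not VACE's API):

```python
import copy
import torch.nn as nn

# Generic sketch of the ControlNet recipe applied to a video transformer:
# clone a backbone block, feed it the control signal (pose, sketch, masked
# video...), and add its output back through a zero-initialized linear layer
# so training starts as a no-op. Names are placeholders, not VACE's API.

class ControlledBlock(nn.Module):
    def __init__(self, backbone_block, dim):
        super().__init__()
        self.backbone = backbone_block                 # frozen pretrained block
        self.control = copy.deepcopy(backbone_block)   # trainable copy
        self.zero_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.zero_proj.weight)          # output starts at zero
        nn.init.zeros_(self.zero_proj.bias)
        for p in self.backbone.parameters():
            p.requires_grad_(False)

    def forward(self, hidden, control_tokens):
        out = self.backbone(hidden)
        return out + self.zero_proj(self.control(hidden + control_tokens))
```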
Perception-as-Control can achieve fine-grained motion control for image animation by creating a 3D motion representation from a reference image.
SegAnyMo can segment moving objects in videos without needing human labels.
AccVideo can speed up video diffusion models by reducing the number of denoising steps needed for video creation. It achieves 8.5x faster generation than HunyuanVideo, producing high-quality videos at 720x1280 resolution and 24 fps, which makes text-to-video generation far more efficient.
CausVid can generate high-quality videos at 9.4 frames per second on a single GPU. It supports text-to-video, image-to-video, and dynamic prompting while reducing latency with a causal transformer architecture.
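The causal part is easy to picture: a block-causal attention mask lets tokens within a frame attend to each other while frames only look backwards in time, so frames can stream out one by one with a KV cache. A toy mask builder (sizes invented):

```python
import torch

# Sketch of the block-causal attention mask behind streaming generators like
# CausVid: tokens within a frame attend to each other, but frames only attend
# backwards in time. This is what lets frames stream out one by one with a
# KV cache instead of denoising the whole clip at once.

def block_causal_mask(num_frames, tokens_per_frame):
    n = num_frames * tokens_per_frame
    frame_id = torch.arange(n) // tokens_per_frame
    # True where attention is allowed: the key's frame <= the query's frame.
    return frame_id.unsqueeze(1) >= frame_id.unsqueeze(0)

mask = block_causal_mask(num_frames=4, tokens_per_frame=3)
print(mask.int())   # lower block-triangular pattern
```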
FloVD can generate camera-controllable videos, using optical flow maps to represent camera and scene motion.
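Camera-induced flow like this can be computed in closed form: unproject each pixel with its depth, apply the relative camera pose, reproject, and take the displacement. A small sketch with dummy inputs (not FloVD's code):

```python
import numpy as np

# Hedged sketch of turning a camera move into a dense optical-flow map, the
# kind of conditioning signal FloVD works with. Given per-pixel depth,
# intrinsics K, and a relative pose (R, t): unproject each pixel to 3D, move
# the camera, reproject, and take the pixel displacement as flow.

def camera_flow(depth, K, R, t):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # unproject
    pts2 = R @ pts + t[:, None]                           # relative motion
    proj = K @ pts2
    proj = proj[:2] / proj[2:]                            # reproject
    return (proj - pix[:2]).T.reshape(h, w, 2)            # displacement

depth = np.full((4, 4), 2.0)                  # dummy flat scene 2m away
K = np.array([[100., 0, 2], [0, 100., 2], [0, 0, 1]])
flow = camera_flow(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```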
MotionMatcher can customize text-to-video diffusion models using a reference video to transfer motion and camera framing to different scenes.
LayerAnimate can animate single anime frames from text prompts or interpolate between two frames with or without sketch-guidance. It allows users to adjust foreground and background elements separately.
StyleMaster can stylize videos by transferring artistic styles from images while keeping the original content clear.
PP-VCtrl can turn text-to-video models into customizable video generators. It uses control signals like Canny edges and segmentation masks to improve video quality and control without retraining the models, making it great for character animation and video editing.
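Extracting such a control signal is straightforward; here is a minimal OpenCV example that pulls per-frame Canny edge maps from a clip (the file name is a placeholder, and how the maps are fed to the model is up to the pipeline):

```python
import cv2

# Minimal example of producing one control signal of the kind PP-VCtrl
# consumes: a per-frame Canny edge map. This only shows signal extraction.

cap = cv2.VideoCapture("input.mp4")      # placeholder local file
edges = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges.append(cv2.Canny(gray, 100, 200))   # common default thresholds
cap.release()
```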
MagicMotion can animate objects in videos by controlling their paths with masks, bounding boxes, and sparse boxes.
KDTalker can generate high-quality talking portraits from a single image and audio input. It captures fine facial details and achieves excellent lip synchronization using a 3D keypoint-based approach and a spatiotemporal diffusion model.