Image-to-Video
Free image-to-video AI tools for quickly transforming images into dynamic videos, perfect for content creators and filmmakers.
Pusa V1.0 can generate high-quality videos from images and text prompts. It achieves a VBench-I2V score of 87.32% with only $500 in training costs and supports features like video transitions and extensions.
Matrix-Game can generate high-quality interactive game worlds in Minecraft.
AnchorCrafter can generate high-quality 2D videos of people interacting with a reference product.
Synergizing Motion and Appearance can generate high-quality talking head videos by combining facial identity from a source image with motion from a driving video.
RealCam-I2V can generate high-quality videos from real-world images with consistent camera parameter controls.
HunyuanPortrait can animate characters from a single portrait image by using facial expressions and head poses from video clips. It achieves lifelike animations with high consistency and control, effectively separating appearance and motion.
MTVCrafter can generate high-quality human image animations from 3D motion sequences.
Skyeyes can generate photorealistic sequences of ground view images from aerial view inputs. It ensures that the images are consistent and realistic, even when there are large gaps in views.
Phantom can generate videos that preserve the subject’s identity from reference images while following text prompts.
SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.
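Diffusion Forcing roughly means each frame is trained with its own independent noise level instead of one shared level for the whole clip, which is what makes open-ended, frame-by-frame rollout possible. A minimal sketch of that per-frame noising idea (my own simplified schedule, not SkyReels-V2’s actual code):

```python
import torch

# Diffusion Forcing-style noising (illustrative only): every frame in the clip
# gets its own noise level, rather than one level shared across all frames.
latents = torch.randn(8, 4, 32, 32)                  # (frames, channels, H, W) latents
t = torch.rand(latents.shape[0]).view(-1, 1, 1, 1)   # independent noise level per frame
noise = torch.randn_like(latents)
noisy_latents = (1.0 - t) * latents + t * noise      # simple linear noising for illustration
```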
FramePack aims to make video generation feel like image generation. It can generate single video frames in about 1.5 seconds with 13B models on an RTX 4090, and it also runs full 30-fps generation with 13B models on a 6GB laptop GPU, just noticeably slower.
UniAnimate-DiT can generate high-quality animations from human images. It uses the Wan2.1 model and a lightweight pose encoder to create smooth and visually appealing results, while also upscaling animations from 480p to 720p.
VACE basically adds ControlNet support to video models like Wan and LTX. It handles various video tasks like generating videos from references, video inpainting, pose control, sketch-to-video and more.
Perception-as-Control can achieve fine-grained motion control for image animation by creating a 3D motion representation from a reference image.
CausVid can generate high-quality videos at 9.4 frames per second on a single GPU. It supports text-to-video, image-to-video, and dynamic prompting while reducing latency with a causal transformer architecture.
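The latency win comes from causal attention: each frame only attends to earlier frames, so frames can be streamed out one by one instead of waiting for the whole clip. A tiny block-causal mask sketch of that idea (my simplification, not CausVid’s code):

```python
import torch

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """Boolean mask: token i may attend to token j only if j's frame is not later than i's."""
    frame_id = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    return frame_id.unsqueeze(1) >= frame_id.unsqueeze(0)

# Frames become available one at a time, which is what enables streaming
# generation instead of waiting for the full clip to be denoised.
mask = block_causal_mask(num_frames=4, tokens_per_frame=3)
print(mask.int())
```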
LayerAnimate can animate single anime frames from text prompts or interpolate between two frames, with or without sketch guidance. It allows users to adjust foreground and background elements separately.
PP-VCtrl can turn text-to-video models into customizable video generators. It uses control signals like Canny edges and segmentation masks to improve video quality and control without retraining the models, making it great for character animation and video editing.
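For a rough idea of what such a control signal looks like, here’s a generic OpenCV sketch that turns a video into per-frame Canny edge maps; the thresholds and output format are my own choices, not PP-VCtrl’s actual preprocessing.

```python
import cv2
import numpy as np

def canny_control_frames(video_path: str, low: int = 100, high: int = 200) -> np.ndarray:
    """Extract per-frame Canny edge maps from a video as a (T, H, W) uint8 array."""
    cap = cv2.VideoCapture(video_path)
    edges = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges.append(cv2.Canny(gray, low, high))
    cap.release()
    return np.stack(edges)

# Hypothetical usage: the edge sequence would be fed to the control module
# alongside the text prompt.
control = canny_control_frames("input.mp4")
print(control.shape)  # (num_frames, height, width)
```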
Magic 1-For-1 can generate one-minute video clips in just one minute.
Go-with-the-Flow can control motion patterns in video diffusion models using real-time warped noise from optical flow fields. It lets users manipulate object movements and camera motion while preserving image quality, without requiring changes to existing models. A rough sketch of the warped-noise idea follows below.
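To make the idea concrete, here’s a rough PyTorch sketch of warping a noise tensor along an optical flow field, so the motion ends up baked into the noise the diffusion model starts from. This only illustrates the concept; the actual method takes care to keep the warped noise properly distributed, which this naive resampling glosses over, and all names here are mine.

```python
import torch
import torch.nn.functional as F

def warp_noise_with_flow(noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a noise tensor along an optical-flow field.

    noise: (C, H, W) Gaussian noise from the previous frame.
    flow:  (2, H, W) flow in pixels mapping each target pixel back to its source.
    Returns the noise resampled along the flow, so motion is carried by the noise.
    """
    _, h, w = noise.shape
    # Base pixel grid.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=noise.dtype),
        torch.arange(w, dtype=noise.dtype),
        indexing="ij",
    )
    # Shift the grid by the flow, then normalize to [-1, 1] for grid_sample.
    x_src = (xs + flow[0]) / (w - 1) * 2 - 1
    y_src = (ys + flow[1]) / (h - 1) * 2 - 1
    grid = torch.stack((x_src, y_src), dim=-1).unsqueeze(0)  # (1, H, W, 2)
    warped = F.grid_sample(
        noise.unsqueeze(0), grid, mode="nearest",
        padding_mode="border", align_corners=True,
    )
    return warped.squeeze(0)

# Example: carry one noise field through a trivial rightward flow.
noise_t = torch.randn(4, 64, 64)               # latent-channel noise
flow = torch.zeros(2, 64, 64); flow[0] = 2.0   # 2-pixel shift in x
noise_t1 = warp_noise_with_flow(noise_t, flow)
```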
UniVG is yet another video generation system. The highlight of UniVG is its ability to use image inputs for guidance while also steering the generation with additional text prompts. Haven’t seen other video models do this yet.