Image-to-Video
Free image-to-video AI tools for quickly transforming images into dynamic videos, perfect for content creators and filmmakers.
Phantom can generate videos that preserve a subject’s identity from reference images while following text prompts.
SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.
FramePack aims to make video generation feel like image generation. It can generate single video frames in 1.5 seconds with 13B models on an RTX 4090, and also supports full 30-fps generation with 13B models on a 6GB laptop GPU, though noticeably slower.
UniAnimate-DiT can generate high-quality animations from human images. It uses the Wan2.1 model and a lightweight pose encoder to create smooth and visually appealing results, while also upscaling animations from 480p to 720p.
VACE basically adds ControlNet support to video models like Wan and LTX. It handles various video tasks like generating videos from references, video inpainting, pose control, sketch-to-video and more.
Perception-as-Control can achieve fine-grained motion control for image animation by creating a 3D motion representation from a reference image.
CausVid can generate high-quality videos at 9.4 frames per second on a single GPU. It supports text-to-video, image-to-video, and dynamic prompting while reducing latency with a causal transformer architecture.
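The "causal" part refers to attention that only looks at the current and earlier frames, which is what lets frames be generated and streamed one at a time. A minimal sketch of that block-causal masking idea in PyTorch (illustrative only, not CausVid's actual code; frame and token counts are made up):

```python
import torch
import torch.nn.functional as F

def causal_frame_attention(q, k, v, tokens_per_frame):
    """Attention where each frame's tokens may only attend to tokens
    of the same or earlier frames (block-causal mask).
    q, k, v: (batch, seq_len, dim), seq_len = num_frames * tokens_per_frame."""
    seq_len = q.shape[1]
    frame_idx = torch.arange(seq_len, device=q.device) // tokens_per_frame
    # True where a query token is allowed to attend to a key token.
    allowed = frame_idx[:, None] >= frame_idx[None, :]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 4 frames of 8 tokens each, 64-dim.
x = torch.randn(1, 4 * 8, 64)
out = causal_frame_attention(x, x, x, tokens_per_frame=8)
print(out.shape)  # torch.Size([1, 32, 64])
```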
LayerAnimate can animate single anime frames from text prompts or interpolate between two frames with or without sketch-guidance. It allows users to adjust foreground and background elements separately.
PP-VCtrl can turn text-to-video models into customizable video generators. It uses control signals like Canny edges and segmentation masks to improve video quality and control without retraining the models, making it great for character animation and video editing.
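Extracting such a control signal is itself simple; a sketch of pulling per-frame Canny edge maps from a clip with OpenCV (the file path and thresholds are placeholder example values, not anything from PP-VCtrl):

```python
import cv2

def canny_control_frames(video_path, low=100, high=200):
    """Read a video and return a per-frame Canny edge map, the kind of
    signal a control branch conditions on. Thresholds are arbitrary."""
    cap = cv2.VideoCapture(video_path)
    edges = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges.append(cv2.Canny(gray, low, high))
    cap.release()
    return edges

control = canny_control_frames("input.mp4")  # hypothetical input clip
```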
Magic 1-For-1 can generate one-minute video clips in just one minute.
Go-with-the-Flow can control motion patterns in video diffusion models using real-time warped noise from optical flow fields. It allows users to manipulate object movements and camera motions while preserving image quality, without requiring changes to existing models.
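The core trick can be illustrated in a few lines: estimate optical flow between two frames, then advect a Gaussian noise field along that flow so the noise "moves" with the scene. A rough sketch using OpenCV's Farneback flow (not the paper's actual warping algorithm, which also takes care to preserve the noise distribution):

```python
import cv2
import numpy as np

def warp_noise_with_flow(prev_gray, next_gray, noise):
    """Advect a noise field along the optical flow between two frames.
    prev_gray, next_gray: uint8 grayscale frames; noise: float32 (H, W)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = noise.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Pull-back warp: sample noise where the flow says each pixel came from.
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    return cv2.remap(noise, map_x, map_y, cv2.INTER_LINEAR)

h, w = 256, 256
noise = np.random.randn(h, w).astype(np.float32)
f0 = np.random.randint(0, 255, (h, w), dtype=np.uint8)  # stand-in frames
f1 = np.roll(f0, 4, axis=1)
warped = warp_noise_with_flow(f0, f1, noise)
```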
DisPose can generate high-quality human image animations from sparse skeleton pose guidance.
ObjCtrl-2.5D enables object control in image-to-video generation using 3D trajectories from 2D inputs with depth information.
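The "2.5D" lifting step is essentially pinhole-camera unprojection: a 2D trajectory point plus its depth value yields a 3D point. A small NumPy sketch of that step (the intrinsics and depth values are made-up examples, not from the paper):

```python
import numpy as np

def lift_trajectory(points_2d, depths, fx, fy, cx, cy):
    """Unproject 2D pixel trajectory points to 3D camera coordinates
    using per-point depth and pinhole intrinsics (fx, fy, cx, cy)."""
    u, v = points_2d[:, 0], points_2d[:, 1]
    z = depths
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy example: a straight 2D drag, with depths from a depth estimator.
traj = np.array([[100.0, 120.0], [140.0, 120.0], [180.0, 120.0]])
depths = np.array([2.0, 2.1, 2.3])
print(lift_trajectory(traj, depths, fx=500, fy=500, cx=128, cy=128))
```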
Inverse Painting can generate time-lapse videos of the painting process from a target artwork. It uses a diffusion-based renderer to learn from real artists’ techniques, producing realistic results across different artistic styles.
CamI2V can generate videos from images with precise control over camera movements and text prompts.
JoyVASA can generate high-quality lip-sync videos of human and animal faces from a single image and speech clip.
DimensionX can generate photorealistic 3D and 4D scenes from a single image using controllable video diffusion.
SG-I2V can control object and camera motion in image-to-video generation using bounding boxes and trajectories.
Pyramidal Flow Matching can generate high-quality 5- to 10-second videos at 768p resolution and 24 FPS. It uses a unified pyramidal flow matching algorithm to link flows across different stages, making video creation more efficient.
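Plain flow matching itself is simple to write down: sample a point on the straight line between noise and data, and train the network to predict the constant velocity between them. A bare-bones PyTorch sketch of one training step (the pyramidal multi-stage linking is the paper's contribution and is omitted here; the model is a toy stand-in):

```python
import torch

def flow_matching_loss(model, x1):
    """One flow-matching step on the linear path x_t = (1 - t) x0 + t x1;
    the regression target is the constant velocity x1 - x0."""
    x0 = torch.randn_like(x1)       # noise endpoint
    t = torch.rand(x1.shape[0], 1)  # one timestep per sample
    xt = (1 - t) * x0 + t * x1      # point on the path
    v_target = x1 - x0              # velocity to predict
    v_pred = model(xt, t)
    return ((v_pred - v_target) ** 2).mean()

# Toy velocity network over 2-D "data": takes (x_t, t), predicts velocity.
class TinyVelocityNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

model = TinyVelocityNet()
loss = flow_matching_loss(model, torch.randn(16, 2))
loss.backward()
```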
TCAN can animate characters of various styles from a pose guidance video.