Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
MotionCtrl is a flexible motion controller that can manage both camera and object motion in generated videos. It can be used with VideoCrafter1, AnimateDiff, and Stable Video Diffusion.
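To make the camera-motion half of that concrete, here is a generic sketch (not MotionCtrl's actual conditioning format) of a camera trajectory expressed as one pose matrix per frame:

```python
# Generic illustration of a camera trajectory: one 4x4 extrinsic matrix per frame.
# This is NOT MotionCtrl's input format, just a sketch of the underlying concept.
import numpy as np

def dolly_forward_trajectory(num_frames: int, step: float = 0.05) -> np.ndarray:
    """Return an array of shape (num_frames, 4, 4): a camera moving along +z."""
    poses = np.tile(np.eye(4), (num_frames, 1, 1))
    poses[:, 2, 3] = np.arange(num_frames) * step  # translate along z each frame
    return poses

trajectory = dolly_forward_trajectory(16)
print(trajectory.shape)  # (16, 4, 4)
```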
Given one or more style references, StyleCrafter can generate images and videos in those referenced styles.
Diffusion Motion Transfer can translate a video according to a text prompt while maintaining the input video’s motion and scene layout.
Sketch Video Synthesis can turn videos into SVG sketches built from frame-wise Bézier curves. This enables impressive visual effects like resizing, color filling, and adding doodles to the original footage while maintaining a smooth flow between frames.
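As a minimal sketch of what that representation looks like, the snippet below writes a single frame as an SVG made of cubic Bézier strokes; the control points and file name are made up, and it illustrates only the output format, not the fitting procedure:

```python
# One "frame" as an SVG of cubic Bezier strokes (illustration only; the stroke
# coordinates and output file name are invented for this example).
strokes = [
    ((10, 80), (35, 10), (65, 10), (90, 80)),   # control points of one curve
    ((10, 20), (40, 60), (60, 60), (90, 20)),
]

paths = "\n".join(
    f'  <path d="M {p0[0]},{p0[1]} C {p1[0]},{p1[1]} {p2[0]},{p2[1]} {p3[0]},{p3[1]}" '
    'fill="none" stroke="black" stroke-width="2"/>'
    for p0, p1, p2, p3 in strokes
)

svg = f'<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">\n{paths}\n</svg>'

with open("frame_000.svg", "w") as f:
    f.write(svg)
```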
LiveSketch can automatically add motion to a single-subject sketch, given a text prompt describing the desired motion. The output is a short SVG animation that can be easily edited.
InterpAny-Clearer is a video frame interpolation method that generates clearer, sharper frames than existing methods. It also introduces the ability to manipulate the interpolation of objects in a video independently, which could be useful for video editing tasks.
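For context, the crudest possible interpolator is a linear cross-fade between two frames, which is exactly what produces blurry in-betweens; methods like InterpAny-Clearer model motion instead. The sketch below shows only this naive baseline, not the paper's approach:

```python
# Naive frame interpolation baseline: blend two frames linearly.
# This produces ghosting rather than sharp motion, which is the problem
# dedicated interpolators are built to solve.
import numpy as np

def linear_interpolate(frame_a: np.ndarray, frame_b: np.ndarray, t: float) -> np.ndarray:
    """Return the frame at time t in [0, 1] as a simple cross-fade."""
    blended = (1.0 - t) * frame_a.astype(np.float32) + t * frame_b.astype(np.float32)
    return blended.astype(frame_a.dtype)

frame_a = np.zeros((64, 64, 3), dtype=np.uint8)
frame_b = np.full((64, 64, 3), 255, dtype=np.uint8)
middle = linear_interpolate(frame_a, frame_b, 0.5)  # halfway cross-fade
```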
I2VGen-XL can generate high-quality videos from static images using a cascaded diffusion model. It achieves a resolution of 1280x720 and improves the flow of movement in videos through a two-stage process that separates detail enhancement from overall coherence.
VideoDreamer is a framework that generates videos containing the given subjects while simultaneously conforming to text prompts.
SEINE is a short-to-long video diffusion model that focuses on generative transitions and prediction. The goal is to generate high-quality long videos with smooth, creative transitions between scenes and clips of varying lengths. The model can also be used for image-to-video animation and autoregressive video prediction.
FreeNoise is a method that can generate longer videos, up to 512 frames, from multiple text prompts. That’s about 21 seconds at 24 fps. It requires no additional fine-tuning of the video diffusion model and adds only about 20% to the original sampling time.
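The quoted numbers check out with simple arithmetic:

```python
# Sanity check of the figures above: 512 frames at 24 fps.
frames, fps = 512, 24
duration_s = frames / fps          # 21.33... seconds
extra_time = 0.20                  # reported sampling-time overhead
print(f"{duration_s:.1f} s of video at ~{1 + extra_time:.0%} of the base sampling time")
```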
MotionDirector is a method that can train text-to-video diffusion models to generate videos with the desired motions from a reference video.
SadTalker can generate talking head videos from a single image and audio. It creates realistic head movements and expressions by linking audio to 3D motion, improving video quality and coherence.
FLATTEN can improve the temporal consistency of edited videos by using optical flow in the diffusion model. It enhances the coherence of video frames without needing extra training.
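FLATTEN applies flow inside the diffusion model itself, but the underlying building block, estimating dense optical flow and warping one frame toward another, can be sketched standalone with OpenCV. This is an illustration of the general idea, not the method:

```python
# Estimate dense optical flow between two frames and warp the earlier frame
# toward the later one. Generic flow-and-warp sketch, not FLATTEN's mechanism.
import cv2
import numpy as np

def warp_with_flow(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Backward flow: for each pixel in next_frame, where was it in prev_frame?
    flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Sample prev_frame at those locations to approximate next_frame.
    return cv2.remap(prev_frame, map_x, map_y, cv2.INTER_LINEAR)
```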
Ground-A-Video can edit multiple attributes of a video using pre-trained text-to-image models without any training. It maintains consistency across frames and accurately preserves non-target areas, making it more effective than other editing methods.
LLM-grounded Video Diffusion Models can generate realistic videos from complex text prompts. They first create dynamic scene layouts with a large language model, which helps guide the video creation process, resulting in better accuracy for object movements and actions.
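As a purely hypothetical illustration of what such a dynamic scene layout could look like (the paper's actual schema may differ), think of per-frame bounding boxes for each object mentioned in the prompt:

```python
# Hypothetical example of a "dynamic scene layout": per-frame bounding boxes
# (x, y, width, height, normalized to [0, 1]) produced by an LLM to guide the
# video model. The structure and values here are invented for illustration.
layout = {
    "prompt": "a dog runs from left to right across a lawn",
    "frames": [
        {"dog": (0.05, 0.60, 0.20, 0.25)},
        {"dog": (0.30, 0.60, 0.20, 0.25)},
        {"dog": (0.55, 0.60, 0.20, 0.25)},
        {"dog": (0.75, 0.60, 0.20, 0.25)},
    ],
}
```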
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation can generate diverse and realistic videos that match natural audio samples. It uses a lightweight adaptor network to improve alignment and visual quality compared to other methods.
Show-1 can generate high-quality videos with accurate text-video alignment. It uses only 15 GB of GPU memory during inference, which is much less than the 72 GB needed by traditional models.
ProPainter is a new video inpainting method that can remove objects, complete masked videos, remove watermarks, and even expand the view of a video.
Another video synthesis model that caught my eye this week is Reuse and Diffuse. This text-to-video framework can generate additional frames from an initial video clip by reusing and iterating over the original latent features. Can’t wait to give this one a try.
Hierarchical Masked 3D Diffusion Model for Video Outpainting can fill in missing parts at the edges of video frames while keeping the motion smooth. It reduces errors and improves results by using multiple frames as guidance.