Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
MVOC is a training-free method for multiple video object composition with diffusion models. It can composite several video objects into a single video while maintaining motion and identity consistency.
Conditional Image Leakage can be used to generate videos with more dynamic and natural motion from image prompts.
Image Conductor can generate video assets from a single image with precise control over camera transitions and object movements.
Mora can enable generalist video generation through a multi-agent framework. It supports text-to-video generation, video editing, and digital world simulation, achieving performance similar to the Sora model.
EvTexture is a video super-resolution method that utilizes event signals for texture enhancement, recovering more accurate textures and finer high-resolution details.
MM-Diffusion can generate high-quality audio-video pairs using a multi-modal diffusion model with two coupled denoising autoencoders.
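The coupling of the two denoisers can be pictured as two sub-networks that denoise video and audio latents in lockstep while exchanging cross-modal context at every step. The sketch below illustrates that idea only; the layer sizes and fusion scheme are placeholders, not MM-Diffusion's actual architecture.

```python
# Illustrative sketch of coupled denoisers for audio-video generation.
# Dimensions and the fusion scheme are made up for demonstration.
import torch
import torch.nn as nn

class CoupledDenoiser(nn.Module):
    def __init__(self, video_dim=64, audio_dim=32, ctx_dim=16):
        super().__init__()
        self.video_net = nn.Sequential(nn.Linear(video_dim + ctx_dim, 128), nn.SiLU(), nn.Linear(128, video_dim))
        self.audio_net = nn.Sequential(nn.Linear(audio_dim + ctx_dim, 128), nn.SiLU(), nn.Linear(128, audio_dim))
        self.video_to_ctx = nn.Linear(video_dim, ctx_dim)  # summary of video fed to the audio branch
        self.audio_to_ctx = nn.Linear(audio_dim, ctx_dim)  # summary of audio fed to the video branch

    def forward(self, noisy_video, noisy_audio):
        # Each modality predicts its own noise, conditioned on a summary of the other.
        audio_ctx = self.audio_to_ctx(noisy_audio)
        video_ctx = self.video_to_ctx(noisy_video)
        video_eps = self.video_net(torch.cat([noisy_video, audio_ctx], dim=-1))
        audio_eps = self.audio_net(torch.cat([noisy_audio, video_ctx], dim=-1))
        return video_eps, audio_eps

denoiser = CoupledDenoiser()
v, a = torch.randn(2, 64), torch.randn(2, 32)
video_eps, audio_eps = denoiser(v, a)
print(video_eps.shape, audio_eps.shape)  # torch.Size([2, 64]) torch.Size([2, 32])
```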
ReVideo can change video content in specific areas while keeping the motion intact. It allows users to customize motion paths and uses a three-stage training method for precise video editing.
Slicedit can edit videos with a simple text prompt that retains the structure and motion of the original video while adhering to the target text.
ViViD can transfer a clothing item onto the video of a target person. The method is able to capture garment details and human posture, resulting in more coherent and lifelike videos.
FIFO-Diffusion can generate infinitely long videos from text without extra training. It performs diagonal denoising over a first-in-first-out queue of frames at increasing noise levels, which keeps memory use constant regardless of video length and parallelizes well across multiple GPUs.
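A minimal sketch of that queue idea, assuming a stand-in `denoise_one_step` function in place of a real video diffusion model (frame shapes and step counts are illustrative):

```python
# FIFO-style diagonal denoising: a fixed-length queue holds frames at
# increasing noise levels; each step denoises the whole queue by one level,
# emits the (nearly) clean head frame, and enqueues fresh noise at the tail.
from collections import deque
import torch

QUEUE_LEN = 8              # number of frames kept in memory (constant)
FRAME_SHAPE = (4, 32, 32)  # latent frame shape (placeholder)

def denoise_one_step(frames, noise_levels):
    """Placeholder for one reverse-diffusion step on a stack of frames."""
    return frames - 0.1 * frames * noise_levels.view(-1, 1, 1, 1)

def generate(num_output_frames):
    # Queue position i holds a frame at noise level (i + 1) / QUEUE_LEN.
    queue = deque(torch.randn(FRAME_SHAPE) for _ in range(QUEUE_LEN))
    levels = torch.arange(1, QUEUE_LEN + 1, dtype=torch.float32) / QUEUE_LEN
    outputs = []
    while len(outputs) < num_output_frames:
        frames = torch.stack(list(queue))
        frames = denoise_one_step(frames, levels)
        outputs.append(frames[0])               # head frame leaves the queue
        queue = deque(frames[1:])               # remaining frames shift forward
        queue.append(torch.randn(FRAME_SHAPE))  # fresh noise enters at the tail
    return outputs

frames = generate(24)  # memory stays bounded by QUEUE_LEN regardless of length
print(len(frames), frames[0].shape)
```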
SignLLM is the first multilingual Sign Language Production (SLP) model. It can generate sign language gestures from input text or prompts and achieve state-of-the-art performance on SLP tasks across eight sign languages.
SwapTalk can transfer the facial features of a user's avatar onto a video while lip-syncing it to chosen audio. It improves video quality and lip-sync accuracy, producing results that are more consistent than other methods.
StoryDiffusion can generate long sequences of images and videos that maintain consistent content across frames. The method can convert a text-based story into a video with smooth transitions and consistent subjects.
VimTS can extract text from both images and videos within a single model, improving generalization across different types of media.
FlowSAM can discover and segment moving objects in videos by combining the Segment Anything Model (SAM) with optical flow. It outperforms previous methods, achieving better object identity and sequence-level segmentation for both single and multi-object scenarios.
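A simplified illustration of the flow-plus-SAM pairing: estimate optical flow between two frames, render it as an image, and let SAM segment it so that moving objects stand out as distinct regions. FlowSAM itself adapts SAM to take flow as input or prompt; this sketch just chains off-the-shelf RAFT and SAM, and the file paths and checkpoint name are placeholders.

```python
# Simplified flow-driven moving-object segmentation (not FlowSAM's exact pipeline).
import torch
from torchvision.io import read_image
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# 1. Estimate optical flow between two consecutive frames with RAFT.
#    (Frame height and width should be divisible by 8 for RAFT.)
weights = Raft_Large_Weights.DEFAULT
raft = raft_large(weights=weights).eval()
frame1 = read_image("frame_000.png")  # placeholder paths
frame2 = read_image("frame_001.png")
img1, img2 = weights.transforms()(frame1.unsqueeze(0), frame2.unsqueeze(0))
with torch.no_grad():
    flow = raft(img1, img2)[-1]  # (1, 2, H, W), final refinement iteration

# 2. Convert the flow field into an RGB visualization where moving objects pop out.
flow_rgb = flow_to_image(flow[0]).permute(1, 2, 0).numpy()  # HxWx3, uint8

# 3. Run SAM's automatic mask generator on the flow image to get motion segments.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder checkpoint
masks = SamAutomaticMaskGenerator(sam).generate(flow_rgb)
moving_objects = sorted(masks, key=lambda m: m["area"], reverse=True)
print(f"found {len(moving_objects)} motion segments")
```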
AniClipart can turn static clipart images into high-quality animations. It uses Bézier curves for smooth motion and aligns movements with text prompts, improving how well the animation matches the text and maintains visual style.
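The Bézier-trajectory idea can be shown in a few lines: each keypoint of the clipart follows a cubic Bézier curve, which gives smooth, easily optimizable motion. The control points below are made up for illustration; in AniClipart they are optimized against the text prompt.

```python
# Sampling a smooth per-frame keypoint trajectory from a cubic Bézier curve.
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def keypoint_trajectory(control_points, num_frames=24):
    """Sample a per-frame position for one keypoint from its Bézier curve."""
    ts = np.linspace(0.0, 1.0, num_frames)
    return np.stack([cubic_bezier(*control_points, t) for t in ts])

# Example: one keypoint sweeping up and to the right over 24 frames.
traj = keypoint_trajectory([(0, 0), (10, 40), (40, 60), (80, 20)], num_frames=24)
print(traj.shape)         # (24, 2) -> x, y position per frame
print(traj[0], traj[-1])  # starts at (0, 0), ends at (80, 20)
```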
Motion control is another active area of video research. Peekaboo can precisely control the position, size, and trajectory of an object in generated videos through bounding boxes.
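A minimal sketch of that bounding-box interface: linearly interpolate an object's box between a start and end position, producing one binary mask per frame. Peekaboo uses such masks to modulate attention inside the video diffusion model; that part is omitted here, and all sizes are illustrative.

```python
# Turning a box trajectory into per-frame binary masks.
import numpy as np

def box_trajectory_masks(start_box, end_box, num_frames, height, width):
    """start_box/end_box are (x0, y0, x1, y1); returns (num_frames, H, W) masks."""
    start = np.asarray(start_box, dtype=float)
    end = np.asarray(end_box, dtype=float)
    masks = np.zeros((num_frames, height, width), dtype=np.float32)
    for f in range(num_frames):
        t = f / max(num_frames - 1, 1)
        x0, y0, x1, y1 = np.round((1 - t) * start + t * end).astype(int)
        masks[f, y0:y1, x0:x1] = 1.0  # region the object should occupy in frame f
    return masks

# Object moves from the top-left to the bottom-right while growing slightly.
masks = box_trajectory_masks((10, 10, 60, 60), (180, 120, 250, 200),
                             num_frames=16, height=256, width=256)
print(masks.shape, masks[0].sum(), masks[-1].sum())  # per-frame occupied area changes
```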
Ctrl-Adapter is a new framework that can be used to add diverse controls to any image or video diffusion model, enabling things like video control with sparse frames, multi-condition control, and video editing.
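The adapter idea can be sketched as a small trainable block that projects features from a pretrained control branch and adds them to a frozen diffusion backbone's features, with averaging to support several conditions at once. The dimensions and fusion rule below are placeholders, not Ctrl-Adapter's exact design.

```python
# Illustrative adapter that fuses control features into frozen backbone features.
import torch
import torch.nn as nn

class ControlAdapter(nn.Module):
    def __init__(self, control_dim=320, backbone_dim=640):
        super().__init__()
        # Only this small block would be trained; control branch and backbone stay frozen.
        self.proj = nn.Sequential(
            nn.Conv2d(control_dim, backbone_dim, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(backbone_dim, backbone_dim, kernel_size=3, padding=1),
        )

    def forward(self, backbone_feat, control_feats):
        """backbone_feat: (B, C, H, W); control_feats: list of (B, C_ctrl, H, W)."""
        adapted = torch.stack([self.proj(c) for c in control_feats]).mean(dim=0)
        return backbone_feat + adapted  # inject the (averaged) control signal

adapter = ControlAdapter()
backbone_feat = torch.randn(1, 640, 32, 32)
depth_feat, pose_feat = torch.randn(1, 320, 32, 32), torch.randn(1, 320, 32, 32)
fused = adapter(backbone_feat, [depth_feat, pose_feat])  # two conditions combined
print(fused.shape)  # torch.Size([1, 640, 32, 32])
```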
Sparse Global Matching for Video Frame Interpolation with Large Motion can handle large motion in frame interpolation by complementing local flow estimation with a sparse set of global matches between the input frames.
CameraCtrl can control camera angles and movements in text-to-video generation. It improves video storytelling by adding a camera module to existing video diffusion models, making it easier to create dynamic scenes from text and camera inputs.
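One common way to hand a camera trajectory to such a camera module is as dense per-pixel Plücker ray embeddings computed from each frame's intrinsics and pose. The sketch below shows that encoding under made-up intrinsics, poses, and resolution; it is not CameraCtrl's code.

```python
# Encoding a per-frame camera trajectory as Plücker ray embeddings.
import numpy as np

def plucker_embedding(K, R, t, height, width):
    """Per-pixel Plücker coordinates (6 channels) for one camera.
    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, t: translation."""
    ys, xs = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)           # (H, W, 3) homogeneous pixels
    cam_rays = pix @ np.linalg.inv(K).T                           # ray directions in camera space
    dirs = cam_rays @ R                                           # equals R^T @ ray: world-space directions
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origin = -R.T @ t                                             # camera center in world space
    moment = np.cross(np.broadcast_to(origin, dirs.shape), dirs)  # o x d
    return np.concatenate([dirs, moment], axis=-1)                # (H, W, 6)

# Toy trajectory: camera translating along x over 4 frames.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
frames = [plucker_embedding(K, np.eye(3), np.array([0.1 * f, 0, 0]), 64, 64) for f in range(4)]
print(np.stack(frames).shape)  # (4, 64, 64, 6) camera conditioning per frame
```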