Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
ViViD can transfer a clothing item onto a video of a target person. It captures garment details and human posture, producing more coherent and lifelike videos.
FIFO-Diffusion can generate infinitely long videos from text without extra training. It uses a unique method that keeps memory use constant, no matter the video length, and works well on multiple GPUs.
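The constant-memory behavior can be pictured as a fixed-length queue of latent frames held at staggered noise levels. The sketch below illustrates that idea only; `denoise_step`, the queue length, and the noise schedule are placeholders, not the actual FIFO-Diffusion implementation.

```python
import torch

# Illustrative sketch of queue-based ("first-in, first-out") video generation:
# a fixed-length queue of latent frames sits at increasing noise levels.
# Each iteration runs one denoising pass over the whole queue, pops the
# (nearly) clean frame at the head, and pushes fresh noise at the tail,
# so memory stays constant no matter how long the video gets.
# `denoise_step` stands in for one call to a pretrained video diffusion model.

def generate_long_video(denoise_step, num_frames, queue_len=16, latent_shape=(4, 64, 64)):
    queue = [torch.randn(latent_shape) for _ in range(queue_len)]
    # Slot 0 (head) is the least noisy, the last slot (tail) is pure noise.
    noise_levels = torch.linspace(0.0, 1.0, queue_len)
    output_frames = []

    while len(output_frames) < num_frames:
        latents = torch.stack(queue)                   # (queue_len, C, H, W)
        latents = denoise_step(latents, noise_levels)  # one step over the whole queue
        queue = list(latents)

        output_frames.append(queue.pop(0))         # dequeue the finished frame
        queue.append(torch.randn(latent_shape))    # enqueue fresh noise at the tail

    return torch.stack(output_frames)
```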
SignLLM is the first multilingual Sign Language Production (SLP) model. It generates sign language gestures from input text or prompts and achieves state-of-the-art performance on SLP tasks across eight sign languages.
SwapTalk can transfer a user’s avatar’s facial features onto a video while lip-syncing to chosen audio. It improves video quality and lip-sync accuracy, producing more consistent results than other methods.
StoryDiffusion can generate long sequences of images and videos that maintain consistent content across the generated frames. It can convert a text-based story into a video with smooth transitions and consistent subjects.
VimTS can extract text from images and videos, generalizing better across different types of media.
FlowSAM can discover and segment moving objects in videos by combining the Segment Anything Model (SAM) with optical flow. It outperforms previous methods, achieving better object identity and sequence-level segmentation for both single and multi-object scenarios.
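As a rough illustration of combining flow with SAM (not the actual FlowSAM pipeline, which adapts SAM itself to flow input and adds sequence-level identity handling), one can estimate optical flow with an off-the-shelf model, render it as an RGB image, and run a standard SAM mask generator on it so that the masks follow moving regions:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Illustrative only: segment the flow visualization rather than the RGB frame,
# so generated masks correspond to regions that are actually moving.

def moving_object_masks(frame_t, frame_t1, sam_checkpoint="sam_vit_h.pth"):
    # frame_t, frame_t1: float tensors of shape (1, 3, H, W), normalized to [-1, 1]
    raft = raft_large(weights=Raft_Large_Weights.DEFAULT).eval()
    with torch.no_grad():
        flow = raft(frame_t, frame_t1)[-1]           # (1, 2, H, W) final flow field

    flow_rgb = flow_to_image(flow)[0]                # (3, H, W) uint8 flow visualization

    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    mask_generator = SamAutomaticMaskGenerator(sam)
    # SAM's automatic mask generator expects an HxWx3 uint8 numpy array
    masks = mask_generator.generate(flow_rgb.permute(1, 2, 0).numpy())
    return masks  # list of dicts with 'segmentation', 'area', ...
```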
AniClipart can turn static clipart images into high-quality animations. It uses Bézier curves for smooth motion and aligns movements with text prompts, improving how well the animation matches the text and maintains visual style.
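The Bézier part is easy to picture: each animated keypoint of the clipart follows a cubic Bézier trajectory. The snippet below only evaluates such a trajectory with hand-picked control points; in AniClipart the control points are what gets optimized so the motion matches the text prompt.

```python
import numpy as np

# Minimal illustration of the Bézier idea: one clipart keypoint moves along a
# cubic Bézier curve defined by four control points, giving smooth motion.

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Trajectory for one keypoint over 24 frames (control points are illustrative).
p0, p1, p2, p3 = map(np.array, ([0, 0], [10, 30], [40, 30], [50, 0]))
trajectory = cubic_bezier(p0, p1, p2, p3, np.linspace(0, 1, 24))  # (24, 2) positions
```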
More research is also being conducted on motion control for video. Peekaboo allows precise control over the position, size, and trajectory of an object in a generated video through bounding boxes.
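A minimal sketch of the bounding-box idea, assuming the boxes are rasterized into binary masks that bias attention so an object's text tokens only act inside its box; how Peekaboo actually injects such masks into the diffusion model's attention layers is not reproduced here.

```python
import torch

# Per-frame boxes become binary masks over the latent grid; a large negative
# bias outside the box can then suppress cross-attention there, confining the
# object to its specified position, size, and trajectory.

def boxes_to_masks(boxes, height, width):
    # boxes: list of (x0, y0, x1, y1) in latent-pixel coordinates, one per frame
    masks = torch.zeros(len(boxes), height, width)
    for f, (x0, y0, x1, y1) in enumerate(boxes):
        masks[f, y0:y1, x0:x1] = 1.0
    return masks  # (num_frames, H, W), 1 inside the box, 0 outside

# Example: an object moving left to right across 8 frames of a 64x64 latent grid.
boxes = [(4 + 6 * f, 20, 20 + 6 * f, 44) for f in range(8)]
masks = boxes_to_masks(boxes, 64, 64)
attn_bias = (1.0 - masks.flatten(1)) * -1e9  # large negative bias outside the box
```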
Ctrl-Adapter is a new framework that can be used to add diverse controls to any image or video diffusion model, enabling things like video control with sparse frames, multi-condition control, and video editing.
Sparse Global Matching for Video Frame Interpolation with Large Motion handles large motion between input frames by complementing local flow estimation with a sparse set of globally matched correspondences.
CameraCtrl can control camera angles and movements in text-to-video generation. It improves video storytelling by adding a camera module to existing video diffusion models, making it easier to create dynamic scenes from text and camera inputs.
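A common way to turn per-frame camera poses into a dense conditioning signal for such a camera module is a per-pixel ray (Plücker) embedding. The sketch below shows only that encoding; the camera encoder architecture and the points where CameraCtrl injects it into the diffusion model are not reproduced here.

```python
import torch

# Map every pixel to its viewing ray and encode it as a 6-D Plücker embedding
# (ray direction and moment). Stacking these per frame yields a
# (frames, 6, H, W) tensor that a camera-conditioning module can consume.

def plucker_embedding(K, c2w, height, width):
    # K: (3, 3) camera intrinsics; c2w: (4, 4) camera-to-world extrinsics
    ys, xs = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                            torch.arange(width, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)  # (H, W, 3)
    dirs_cam = pix @ torch.inverse(K).T                # rays in camera space
    dirs_world = dirs_cam @ c2w[:3, :3].T              # rotate into world space
    dirs_world = dirs_world / dirs_world.norm(dim=-1, keepdim=True)
    origin = c2w[:3, 3].expand_as(dirs_world)          # camera center per pixel
    moment = torch.cross(origin, dirs_world, dim=-1)
    return torch.cat([dirs_world, moment], dim=-1).permute(2, 0, 1)  # (6, H, W)
```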
EDTalk can create talking face videos with control over mouth shapes, head poses, and emotions. Its Efficient Disentanglement framework separates facial movements into three distinct latent spaces (mouth shape, head pose, and emotion), improving realism and controllability.
Motion Inversion can customize the motion of a generated video to match the motion of a reference video.
DSTA is a video-based human pose estimation method that directly maps the input to output joint coordinates.
TRAM can reconstruct human motion and camera movement from videos in dynamic settings. It reduces global motion errors by 60% and uses a video transformer model to accurately track body motion.
TRIP is a new approach to image-to-video generation that improves temporal coherence between the input image and the generated frames.
Spectral Motion Alignment is a framework that can capture complex and long-range motion patterns within videos and transfer them to video-to-video frameworks like MotionDirector, VMC, Tune-A-Video, and ControlVideo.
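As a loose illustration of aligning motion in the spectral domain (the actual Spectral Motion Alignment regularizers are more involved than this), one could compare the temporal frequency spectra of frame-to-frame differences between a reference and a generated video:

```python
import torch

# Illustrative frequency-domain motion loss: frame differences approximate
# motion, and comparing their spectra along the time axis captures long-range
# motion patterns rather than only adjacent-frame changes.

def spectral_motion_loss(ref_frames, gen_frames):
    # ref_frames, gen_frames: (T, C, H, W) tensors of video frames
    ref_motion = ref_frames[1:] - ref_frames[:-1]
    gen_motion = gen_frames[1:] - gen_frames[:-1]

    ref_spec = torch.fft.rfft(ref_motion, dim=0).abs()  # temporal spectrum
    gen_spec = torch.fft.rfft(gen_motion, dim=0).abs()
    return torch.mean((ref_spec - gen_spec) ** 2)
```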
StreamingT2V enables long text-to-video generations featuring rich motion dynamics without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Videos can be up to 1200 frames, spanning 2 minutes, and can be extended for even longer durations.
AnyV2V can edit videos using prompt-based editing and style transfer without fine-tuning. It modifies the first frame of a video and generates the edited video while keeping high visual quality.
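The two-stage idea can be sketched as follows; `edit_image` (any off-the-shelf image editor, prompt-based or style transfer) and `image_to_video` (an image-to-video model conditioned on the source clip) are placeholder callables, not the actual AnyV2V API.

```python
# High-level sketch of first-frame editing followed by video regeneration.

def edit_video(source_frames, edit_prompt, edit_image, image_to_video):
    # Stage 1: apply the desired edit to the first frame only.
    edited_first_frame = edit_image(source_frames[0], edit_prompt)

    # Stage 2: regenerate the full clip from the edited first frame while
    # reusing structure and motion from the source video, so the edit
    # propagates consistently across all frames without fine-tuning.
    return image_to_video(edited_first_frame, reference_video=source_frames)
```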