Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
Make-Your-Video can generate customized videos from text and depth information for better control over content. It uses a Latent Diffusion Model to improve video quality and reduce the need for computing power.
Control-A-Video can generate controllable text-to-video content using diffusion models. It allows for fine-tuned customization with edge and depth maps, ensuring high quality and consistency in the videos.
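To make the idea of depth-map conditioning concrete, here is a minimal per-frame sketch using the public Hugging Face diffusers ControlNet pipeline. It is not Control-A-Video's (or Make-Your-Video's) own code; the model IDs and the depth-map path are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Depth-conditioned generation with a public ControlNet checkpoint (illustration only).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

depth_map = load_image("frame_000_depth.png")  # hypothetical path: depth map of one source frame
frame = pipe(
    "a castle courtyard at sunset, cinematic lighting",
    image=depth_map,
    num_inference_steps=25,
).images[0]
frame.save("frame_000_generated.png")
```

Run per frame, this gives spatial control but no temporal consistency; video methods like Control-A-Video add cross-frame components on top of this kind of conditioning.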
Make-A-Protagonist can edit videos by changing the protagonist, background, and style using text and images. It allows for detailed control over video content, helping users create unique and personalized videos.
HumanRF can capture high-quality full-body human motion from multiple video angles. It allows playback from new viewpoints at 12 megapixels and uses a 4D dynamic neural scene representation for smooth and realistic motion, making it great for film and gaming.
Sketching the Future can generate high-quality videos from sketched frames using zero-shot text-to-video generation and ControlNet. It smoothly fills in frames between sketches to create consistent video content that matches the user’s intended motion.
Total-Recon can render scenes captured in monocular RGBD video from new camera angles, such as first-person and third-person views. It creates realistic 3D videos of moving objects and allows for 3D filters that add virtual items to people in the scene.
DreamPose can generate animated fashion videos from a single image and a sequence of human body poses. The method is able to capture both human and fabric motion and supports a variety of clothing styles and poses.
Follow Your Pose can generate character videos that match specific poses from text descriptions. It uses a two-stage training process with pre-trained text-to-image models, allowing for continuous pose control and editing.
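As a rough illustration of pose conditioning (not Follow Your Pose's two-stage training), the sketch below extracts an OpenPose skeleton from a reference image with controlnet_aux and uses it to steer a ControlNet pipeline; model IDs and file paths are assumptions.

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Extract a pose skeleton from a reference frame, then condition generation on it.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
pose_image = openpose(load_image("reference_pose.png"))  # hypothetical input image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut dancing on the moon", image=pose_image).images[0]
image.save("posed_character.png")
```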
vid2vid-zero can edit videos without needing extra training on video data. It uses image diffusion models for text-to-video alignment and keeps the original video’s look and feel, allowing for effective changes to scenes and subjects.
Text2Video-Zero can generate high-quality videos from text prompts using existing text-to-image diffusion models. It adds motion dynamics and cross-frame attention, making it useful for conditional video generation and instruction-guided video editing.
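Text2Video-Zero also ships as a pipeline in Hugging Face diffusers; assuming a recent diffusers version with TextToVideoZeroPipeline, a minimal usage sketch looks like this (the prompt and output path are placeholders):

```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline

# Zero-shot text-to-video built on a pretrained text-to-image model.
pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

result = pipe(prompt="a panda playing guitar in times square").images  # frames as float arrays in [0, 1]
frames = [(frame * 255).astype("uint8") for frame in result]
imageio.mimsave("panda.mp4", frames, fps=4)
```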
Blind Video Deflickering by Neural Filtering with a Flawed Atlas can remove flicker from videos without needing extra guidance. It works well on different types of videos and uses a neural atlas for better consistency, outperforming other methods.
3D Cinemagraphy can turn a single still image into a video by adding motion and depth. It uses 3D space to create realistic animations and fix common issues like artifacts and inconsistent movements.
Video-P2P can edit videos using advanced techniques like word swap and prompt refinement. It adapts image generation models for video, allowing for the creation of new characters while keeping original poses and scenes.
Projected Latent Video Diffusion Models (PVDM) can generate high-resolution, temporally smooth videos by running the diffusion process in a low-dimensional latent space. It achieves a state-of-the-art FVD score of 639.7 on the UCF-101 benchmark, greatly surpassing previous methods.
Dreamix can edit videos based on a text prompt while keeping colors, sizes, and camera angles consistent. It mixes low-resolution spatio-temporal information from the original video with newly synthesized high-resolution content, allowing for advanced editing of motion and appearance.
SceneScape can generate long videos of different scenes from text prompts and camera angles. It ensures 3D consistency by building a unified mesh of the scene, allowing for realistic walkthroughs in places like spaceships and caves.
Shape-aware Text-driven Layered Video Editing can edit the shape of objects in videos while keeping them consistent across frames. It uses a text-conditioned diffusion model to achieve this, making video editing more effective than other methods.
Tune-A-Video can generate videos from a single text-video pair by fine-tuning text-to-image diffusion models. It lets users change subjects, backgrounds, and styles while keeping the video content consistent.
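The mechanism shared by Tune-A-Video (sparse-causal attention) and Text2Video-Zero (cross-frame attention) is to let every frame's queries attend to keys and values from an anchor frame, which keeps appearance consistent across the clip. Below is a minimal, self-contained sketch of the first-frame variant; shapes and names are illustrative, not the papers' actual code.

```python
import torch

def cross_frame_attention(q, k, v):
    """Every frame attends to keys/values from frame 0 (conceptual sketch).

    q, k, v: tensors of shape (frames, heads, tokens, dim).
    """
    first_k = k[:1].expand_as(k)  # reuse the first frame's keys for all frames
    first_v = v[:1].expand_as(v)  # ...and its values
    scores = q @ first_k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ first_v

frames, heads, tokens, dim = 8, 4, 64, 32
q, k, v = (torch.randn(frames, heads, tokens, dim) for _ in range(3))
out = cross_frame_attention(q, k, v)  # (8, 4, 64, 32), appearance anchored to frame 0
```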
MAGVIT can perform video synthesis tasks like inpainting, outpainting, and generating animations from single images. It is much faster than other models, working 100 times quicker than diffusion models and 60 times faster than autoregressive models, while also achieving the best results on multiple benchmarks.
MotionBERT can recover 3D human motion from noisy 2D observations. It excels in 3D pose estimation, action recognition, and motion prediction, achieving the lowest pose estimation error when trained from scratch.
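To give a sense of what 2D-to-3D lifting involves, here is a toy PyTorch module that maps a sequence of noisy 2D keypoints to 3D joint positions. This is a conceptual sketch only: the real MotionBERT uses a dual-stream spatio-temporal transformer (DSTformer), large-scale pretraining, and more careful losses, and every size below is made up.

```python
import torch
import torch.nn as nn

class Lifter2Dto3D(nn.Module):
    """Toy transformer that lifts 2D keypoint sequences to 3D (not MotionBERT itself)."""

    def __init__(self, joints=17, dim=128, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Linear(joints * 2, dim)               # flatten (x, y) per frame
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)   # positional encoding omitted for brevity
        self.head = nn.Linear(dim, joints * 3)                # predict (x, y, z) per joint

    def forward(self, kpts2d):                                # kpts2d: (batch, frames, joints, 2)
        b, t, j, _ = kpts2d.shape
        x = self.encoder(self.embed(kpts2d.reshape(b, t, j * 2)))
        return self.head(x).reshape(b, t, j, 3)

model = Lifter2Dto3D()
poses3d = model(torch.randn(2, 81, 17, 2))                    # -> (2, 81, 17, 3)
```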