Video AI Tools
Free video AI tools for editing, generating animations, and analyzing footage, perfect for filmmakers and content creators seeking efficiency.
While ZeroScope, Gen-2, PikaLabs, and others have brought us high-resolution text- and image-to-video, they all suffer from unsmooth transitions, crude motion, and disordered action sequences. The new Dysen-VDM tries to tackle those issues and, while nowhere near perfect, delivers some promising results.
StableVideo is yet another vid2vid method. This one is not just style transfer, though: the method is able to differentiate between foreground and background when editing a video, making it possible to reimagine the subject within an entirely different landscape.
CoDeF can process videos with temporal consistency by using a canonical content field to aggregate static content and a temporal deformation field to track changes over time. This allows it to perform tasks like video-to-video translation and to track non-rigid objects, such as water and smog, without needing extra training.
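Conceptually, every output frame is the same learned canonical image, resampled through a per-timestep deformation. A minimal PyTorch sketch of that decomposition (names, network sizes, and shapes here are illustrative assumptions, not CoDeF's actual API):

```python
# Toy sketch: a learnable canonical image plus a deformation MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationField(nn.Module):
    """Maps an (x, y, t) query to a 2D offset into the canonical image."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, coords_xyt):           # (N, 3), values in [-1, 1]
        return self.mlp(coords_xyt)          # (N, 2) offsets

canonical = nn.Parameter(torch.rand(1, 3, 128, 128))  # aggregated static content
deform = DeformationField()

def render_frame(t, H=128, W=128):
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys, torch.full_like(xs, t)], -1).reshape(-1, 3)
    offsets = deform(coords).reshape(1, H, W, 2)
    grid = torch.stack([xs, ys], -1).unsqueeze(0) + offsets
    # Every frame samples the same canonical image at deformed positions,
    # which is what makes edits on the canonical image temporally consistent.
    return F.grid_sample(canonical, grid, align_corners=True)

frame = render_frame(t=0.5)   # (1, 3, 128, 128)
```

Because any edit applied to the canonical image is carried into every frame by the deformation field, consistency comes for free.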
TokenFlow is a new video-to-video method for temporally coherent, text-driven video editing. We’ve seen a lot of them, but this one looks extremely good with almost no flickering and requires no fine-tuning whatsoever.
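Roughly, the trick is to edit a keyframe with an image diffusion model and then propagate the edited diffusion features to the remaining frames along nearest-neighbor correspondences computed on the original video's features. A toy sketch of that propagation step (function names and token counts are assumptions; the real method works inside the denoiser's feature space):

```python
# Toy sketch of feature propagation via nearest-neighbor correspondences.
import torch
import torch.nn.functional as F

def nearest_neighbor_field(src_feats, key_feats):
    """For each source token, the index of the most similar keyframe token."""
    src = F.normalize(src_feats, dim=-1)     # (N, C)
    key = F.normalize(key_feats, dim=-1)     # (M, C)
    return (src @ key.T).argmax(dim=-1)      # (N,)

orig_frame = torch.randn(1024, 320)   # tokens of one original frame
orig_key   = torch.randn(1024, 320)   # tokens of the original keyframe
edited_key = torch.randn(1024, 320)   # the same keyframe's tokens after editing

idx = nearest_neighbor_field(orig_frame, orig_key)
# Correspondences come from the ORIGINAL video, so the edited features
# inherit its motion -- which is where the temporal coherence comes from.
edited_frame = edited_key[idx]        # (1024, 320)
```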
VideoComposer can generate videos with control over how they look and move using text, sketches, and motion vectors. It improves video quality by enforcing consistency between frames, allowing for flexible video creation and editing.
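In spirit, this comes down to encoding each condition separately and fusing them into a single control signal for the video diffusion backbone. A hedged sketch of such a fusion module (module names, dimensions, and fusion by summation are illustrative assumptions, not VideoComposer's exact architecture):

```python
# Illustrative fusion of text, sketch, and motion-vector conditions.
import torch
import torch.nn as nn

class ConditionFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.text_proj   = nn.Linear(512, dim)             # pooled text embedding
        self.sketch_conv = nn.Conv2d(1, dim, 3, padding=1) # per-frame sketch map
        self.motion_conv = nn.Conv2d(2, dim, 3, padding=1) # (dx, dy) motion vectors

    def forward(self, text_emb, sketch, motion):
        # Spatial conditions are summed; the text embedding is broadcast.
        spatial = self.sketch_conv(sketch) + self.motion_conv(motion)
        return spatial + self.text_proj(text_emb)[:, :, None, None]

fuse = ConditionFusion()
ctrl = fuse(torch.randn(1, 512),        # text embedding
            torch.randn(1, 1, 32, 32),  # sketch for one frame
            torch.randn(1, 2, 32, 32))  # motion vectors for one frame
print(ctrl.shape)  # torch.Size([1, 256, 32, 32]) -> injected into the UNet
```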
Make-Your-Video can generate customized videos from text and depth information for better control over content. It uses a Latent Diffusion Model to improve video quality and reduce computational cost.
Control-A-Video can generate controllable text-to-video content using diffusion models. It allows for fine-grained customization with edge and depth maps, ensuring high quality and consistency in the videos.
Make-A-Protagonist can edit videos by changing the protagonist, background, and style using text and images. It allows for detailed control over video content, helping users create unique and personalized videos.
HumanRF can capture high-quality full-body human motion from multiple video angles. It allows playback from new viewpoints at 12 megapixels and uses a 4D dynamic neural scene representation for smooth and realistic motion, making it great for film and gaming.
Sketching the Future can generate high-quality videos from sketched frames using zero-shot text-to-video generation and ControlNet. It smoothly fills in frames between sketches to create consistent video content that matches the user’s intended motion.
Total-Recon can reconstruct scenes from monocular RGBD videos and render them from different camera angles, like first-person and third-person views. It creates realistic 3D videos of moving objects and allows for 3D filters that add virtual items to people in the scene.
DreamPose can generate animated fashion videos from a single image and a sequence of human body poses. The method is able to capture both human and fabric motion and supports a variety of clothing styles and poses.
Follow Your Pose can generate character videos that match specific poses from text descriptions. It uses a two-stage training process with pre-trained text-to-image models, allowing for continuous pose control and editing.
vid2vid-zero can edit videos without needing extra training on video data. It uses image diffusion models for text-to-video alignment and keeps the original video’s look and feel, allowing for effective changes to scenes and subjects.
Text2Video-Zero can generate high-quality videos from text prompts using existing text-to-image diffusion models. It adds motion dynamics and cross-frame attention, making it useful for conditional video generation and instruction-guided video editing.
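The cross-frame attention trick is easy to state: rather than each frame attending to itself, every frame's queries attend to the keys and values of the first frame, which anchors appearance across the clip. A simplified single-head sketch (shapes and sizes are illustrative):

```python
# Simplified cross-frame attention: all frames attend to frame 0.
import torch

def cross_frame_attention(q, k, v):
    """q, k, v: (T, N, C) per-frame queries, keys, values."""
    k0 = k[:1].expand_as(k)   # reuse the first frame's keys everywhere
    v0 = v[:1].expand_as(v)   # reuse the first frame's values everywhere
    attn = torch.softmax(q @ k0.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v0

T, N, C = 8, 1024, 320   # frames, tokens per frame, channels
out = cross_frame_attention(torch.randn(T, N, C),
                            torch.randn(T, N, C),
                            torch.randn(T, N, C))   # (8, 1024, 320)
```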
Blind Video Deflickering by Neural Filtering with a Flawed Atlas can remove flicker from videos without needing extra guidance. It works well on different types of videos and uses a neural atlas for better consistency, outperforming other methods.
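At a high level the recipe is: render each frame from a temporally shared atlas, which is flicker-free but imperfect (the "flawed" part), then let a small network fuse that rendering with the original frame. A toy sketch of the filtering step (network size and inputs are assumptions):

```python
# Toy neural filter fusing a flickery frame with its atlas rendering.
import torch
import torch.nn as nn

class NeuralFilter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, frame, atlas_frame):
        # The atlas rendering supplies temporal consistency; the original
        # frame supplies the detail the flawed atlas misses.
        return self.net(torch.cat([frame, atlas_frame], dim=1))

filt = NeuralFilter()
clean = filt(torch.rand(1, 3, 64, 64),   # original (flickering) frame
             torch.rand(1, 3, 64, 64))   # same frame rendered from the atlas
```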
3D Cinemagraphy can turn a single still image into a video by adding motion and depth. It uses 3D space to create realistic animations and fix common issues like artifacts and inconsistent movements.
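A rough mental model of the pipeline: lift each pixel into 3D using estimated depth, push the resulting points along a motion field, and project them back to the image plane. A toy version with a pinhole camera and made-up intrinsics (purely illustrative, not the paper's code):

```python
# Toy depth-based animation: unproject, displace, reproject.
import torch

H = W = 64
f = 50.0                              # assumed focal length in pixels
depth = torch.full((H, W), 2.0)       # toy depth map
ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                        torch.arange(W, dtype=torch.float32), indexing="ij")

# Unproject pixels to camera-space 3D points.
X = (xs - W / 2) * depth / f
Y = (ys - H / 2) * depth / f
points = torch.stack([X, Y, depth], dim=-1)   # (H, W, 3)

# Animate: a trivial motion field drifting the scene along x over time t.
t = 0.5
points[..., 0] += 0.1 * t

# Reproject; (u, v) are where each pixel moved, ready for splatting.
u = points[..., 0] * f / points[..., 2] + W / 2
v = points[..., 1] * f / points[..., 2] + H / 2
```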
Video-P2P can edit videos using advanced techniques like word swap and prompt refinement. It adapts image generation models for video, allowing for the creation of new characters while keeping original poses and scenes.
Projected Latent Video Diffusion Models (PVDM) can generate high-resolution and smooth videos in a low-dimensional latent space. It achieves a state-of-the-art FVD of 639.7 on the UCF-101 benchmark, greatly surpassing previous methods.
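The "projected latent" idea is to factorize the 3D video volume into image-like 2D planes so an ordinary 2D diffusion backbone can do the heavy lifting. A toy illustration of the shape bookkeeping using plain averaging (PVDM learns these projections with an autoencoder; this sketch only conveys the factorization):

```python
# Toy factorization of a video latent volume into three 2D planes.
import torch

video = torch.randn(1, 4, 16, 32, 32)  # (batch, channels, T, H, W)

plane_hw = video.mean(dim=2)   # (1, 4, 32, 32): H x W, time averaged out
plane_tw = video.mean(dim=3)   # (1, 4, 16, 32): T x W
plane_th = video.mean(dim=4)   # (1, 4, 16, 32): T x H

# A 2D diffusion model can denoise these image-like planes; a decoder
# later recombines them into a full video volume.
for p in (plane_hw, plane_tw, plane_th):
    print(tuple(p.shape))
```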
Dreamix can edit videos based on a text prompt while keeping colors, sizes, and camera angles consistent. It combines low-resolution video data with high-quality content, allowing for advanced editing of motion and appearance.