AI Toolbox
A curated collection of 610 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
SV4D can generate dynamic 3D content from a single video. It ensures that the new views are consistent across multiple frames and achieves high-quality results in video synthesis.
Artist stylizes images based on text prompts, preserving the original content while producing results of high aesthetic quality. No finetuning, no ControlNets: it just works with your pretrained Stable Diffusion model.
DreamCar can reconstruct 3D car models from just a few images or even a single image. It uses Score Distillation Sampling and pose optimization to improve texture alignment and overall model quality, and is reported to significantly outperform existing methods.
Cinemo can generate consistent and controllable image animations from static images. It improves temporal consistency and smoothness by learning motion residuals and refining noise, and it gives users precise control over motion intensity.
MasterWeaver can generate photo-realistic images from a single reference image while keeping the person’s identity and allowing for easy edits. It uses an encoder to capture identity features and a unique editing direction loss to improve text control, enabling changes to clothing, accessories, and facial features.
UniTalker can create 3D face animations from speech input! It makes fewer lip-movement errors than other tools and performs well even on data it hasn’t seen before.
Shape of Motion can reconstruct 3D scenes from a single video. It captures the full 3D motion of a scene and can handle occlusions and disocclusions.
MusiConGen can generate music tracks with precise control over rhythm and chords. It allows users to define musical features through symbolic chord sequences, BPM, and text prompts.
IMAGDressing-v1 can generate human try-on images from input garments. It can control the scene through text prompts and can be combined with IP-Adapter and ControlNet pose to make the generated images more diverse and controllable.
SparseCtrl is an image-to-video method with some cool new capabilities. With its RGB, depth, and sketch encoders and one or a few input images, it can animate images, interpolate between keyframes, extend videos, and guide video generation with only depth maps or a few sketches. I'm especially in love with how the scene transitions look.
3DWire can generate 3D house wireframes from text! The wireframes can be easily segmented into distinct components, such as walls, roofs, and rooms, reflecting the semantic essence of the shape.
An Object is Worth 64x64 Pixels can generate 3D models from 64x64 pixel images! It creates realistic objects with good shapes and colors, working as well as more complex methods.
AccDiffusion can generate high-resolution images with less object repetition, a problem that has plagued Stable Diffusion since its infancy!
Noise Calibration can improve video quality while keeping the original content structure. It uses a noise optimization strategy with pre-trained diffusion models to enhance visuals and ensure consistency between original and enhanced videos.
ST-AVSR can upscale video to any size while keeping details clear and smooth. It uses a pre-trained VGG network to improve quality and speed, which is reported to put it ahead of other methods.
Live2Diff can translate live video streams using uni-directional attention in video diffusion models. It maintains smooth motion by linking each frame to previous ones and can reach 16 frames per second on an RTX 4090 GPU, making it well suited for real-time use.
WildGaussians is a new 3D Gaussian Splatting method that can handle occlusions and appearance changes. It achieves real-time rendering speeds and handles in-the-wild data better than other methods.
Stable Audio Open can generate up to 47 seconds of stereo audio at 44.1 kHz from text prompts. It uses a transformer-based diffusion model for high-quality sound, making it useful for artists and researchers.
ColorPeel can generate objects in images with specific colors and shapes.
HumanRefiner can improve human hand and limb quality in images! It can detect and correct issues related to abnormal human poses.