AI Toolbox
A curated collection of 577 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Ever tried to inpaint smaller objects and details into an image? It can be a bit hit or miss. SOEDiff has been trained specifically for these cases and does a pretty good job at them.
Material Anything can generate realistic materials for 3D objects, including those without textures. It adapts to different lighting and uses confidence masks to improve material quality, ensuring outputs are ready for UV mapping.
Inverse Painting can generate time-lapse videos of the painting process from a target artwork. It uses a diffusion-based renderer to learn from real artists’ techniques, producing realistic results across different artistic styles.
MegaFusion can extend existing diffusion models for high-resolution image generation. It produces images up to 2048×2048 at only about 40% of the original computational cost by relaying the denoising process across resolutions in a coarse-to-fine manner.
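The sketch below illustrates the general coarse-to-fine relay idea using diffusers-style model and scheduler interfaces; it is a simplified assumption of how such a loop can be wired, not MegaFusion's actual code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def truncate_and_relay(unet, scheduler, prompt_emb,
                       stages=((64, 64, 30), (128, 128, 10), (256, 256, 10))):
    """Illustrative coarse-to-fine ("truncate and relay") denoising sketch.

    Interfaces follow diffusers conventions (UNet2DConditionModel + scheduler);
    the loop itself is a simplification. Most steps run at the cheap base
    resolution; the partially denoised latents are upsampled between stages
    and the remaining steps continue at higher resolutions.
    """
    h0, w0, _ = stages[0]
    latents = torch.randn(1, unet.config.in_channels, h0, w0)
    scheduler.set_timesteps(sum(n for _, _, n in stages))
    timesteps = iter(scheduler.timesteps)

    for i, (h, w, n_steps) in enumerate(stages):
        if i > 0:
            # Relay: hand the intermediate latents to the next resolution.
            # (The real method also re-injects noise matched to the timestep.)
            latents = F.interpolate(latents, size=(h, w), mode="bilinear")
        for _ in range(n_steps):
            t = next(timesteps)
            noise_pred = unet(latents, t, encoder_hidden_states=prompt_emb).sample
            latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```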
CAT4D can create dynamic 4D scenes from single videos. It uses a multi-view video diffusion model to generate videos from different angles, allowing for strong 4D reconstruction and high-quality images.
SuperMat can quickly break down images of materials into three important maps: albedo, metallic, and roughness. It does this in about 3 seconds while keeping high quality, making it efficient for 3D object material estimation.
SelfSplat can create 3D models from multiple images without requiring known camera poses. It uses self-supervised depth and pose estimation, producing high-quality appearance and geometry from real-world data.
DreamMix is an inpainting method based on the Fooocus model that can insert objects from reference images and change their attributes using text.
Omegance can control detail levels in diffusion-based synthesis using a single parameter, ω. It allows for precise granularity control in generated outputs and enables region-specific and step-specific adjustments through spatial masks and denoising schedules.
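A rough sketch of the single-knob idea, assuming a standard diffusers-style sampling loop; the hook name, placement, and exact scaling here are illustrative rather than Omegance's actual implementation.

```python
import torch

def apply_omega(noise_pred: torch.Tensor,
                omega: float = 1.0,
                omega_mask: torch.Tensor | None = None) -> torch.Tensor:
    """Rescale the predicted noise before the scheduler step (illustrative).

    omega != 1 changes how much noise is removed per step, which shifts the
    level of detail in the output; an optional spatial mask lets different
    regions use different omega values. Exact scaling in the real Omegance
    code may differ.
    """
    scale = omega if omega_mask is None else 1.0 + (omega - 1.0) * omega_mask
    return noise_pred * scale

# Inside a standard sampling loop this would sit just before the update, e.g.:
#   noise_pred = apply_omega(noise_pred, omega=1.1)  # shift detail granularity
#   latents = scheduler.step(noise_pred, t, latents).prev_sample
```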
GarVerseLOD can generate high-quality 3D garment meshes from a single image. It handles complex cloth movements and poses well, using a large dataset of 6,000 garment models to improve accuracy.
UniHair can create 3D hair models from single-view portraits, handling both braided and unbraided styles. It uses a large dataset and advanced techniques to accurately capture complex hairstyles and generalizes well to real images.
GaussianAnything can generate high-quality 3D objects from single images or text prompts. It uses a Variational Autoencoder and a cascaded latent diffusion model for effective 3D editing.
FlipSketch can generate sketch animations from static drawings by allowing users to describe the desired motion. It uses motion priors from text-to-video diffusion models to create smooth animations while keeping the original sketch’s look.
SAMURAI combines SAM 2's state-of-the-art visual tracking with motion-aware memory for more robust video object tracking.
Find3D can segment parts of 3D objects based on text queries.
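For intuition, open-vocabulary 3D part segmentation of this kind generally matches per-point features against text embeddings; the sketch below shows that pattern with assumed inputs and is not Find3D's actual API.

```python
import torch
import torch.nn.functional as F

def parts_from_text(point_features: torch.Tensor,
                    text_embeddings: torch.Tensor) -> torch.Tensor:
    """Assign each 3D point to its most similar text query (illustrative).

    point_features: (N, D) per-point embeddings from some 3D backbone.
    text_embeddings: (Q, D) embeddings of part queries such as "handle" or "lid".
    Returns an (N,) tensor of query indices, i.e. a part label per point.
    """
    pts = F.normalize(point_features, dim=-1)
    txt = F.normalize(text_embeddings, dim=-1)
    return (pts @ txt.T).argmax(dim=-1)
```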
StyleCodes can encode the style of an image into a 20-symbol base64 code for easy sharing and use in image generation. It allows users to create style-reference codes (srefs) from their own images, enabling high-quality style control in diffusion models.
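To see why 20 base64 symbols is such a compact style representation: 20 symbols carry only 120 bits (15 bytes). The toy encoder below packs a style embedding into that budget with a sign-bit hash; the real StyleCodes srefs come from a learned encoder, so this is only a sketch of the size arithmetic.

```python
import base64
import numpy as np

def embedding_to_sref(style_embedding: np.ndarray, n_bytes: int = 15) -> str:
    """Quantize a style embedding into a short shareable code (illustrative).

    Expects an embedding with at least n_bytes * 8 dimensions. 15 bytes
    encode to exactly 20 base64 symbols, matching the sref length; the toy
    sign-bit hash here stands in for the learned StyleCodes encoder.
    """
    bits = (style_embedding[: n_bytes * 8] > 0).astype(np.uint8)
    packed = np.packbits(bits)  # 120 bits -> 15 bytes
    return base64.b64encode(packed.tobytes()).decode("ascii")
```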
StableV2V can maintain shape consistency in video-to-video editing by breaking the editing process into steps that align with user prompts. It handles text-based, image-based, and video-inpainting edits.
SGEdit can add, remove, replace, and adjust objects in images while maintaining consistent image quality.
StyleSplat can stylize 3D objects in scenes represented by 3D Gaussians from reference style images. The method is able to localize style transfer to specific objects and supports stylization with multiple styles.
Long-LRM can reconstruct large 3D scenes from up to 32 input images at 960x540 resolution in just 1.3 seconds on a single A100 80G GPU.