AI Toolbox
A curated collection of 610 free cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
GarVerseLOD can generate high-quality 3D garment meshes from a single image. It handles complex cloth deformations and challenging poses well, using a large dataset of 6,000 garment models to improve accuracy.
UniHair can create 3D hair models from single-view portraits, handling both braided and unbraided styles. It uses a large dataset and advanced techniques to accurately capture complex hairstyles and generalize well to real images.
GaussianAnything can generate high-quality 3D objects from single images or text prompts. It uses a Variational Autoencoder and a cascaded latent diffusion model for effective 3D editing.
FlipSketch can generate sketch animations from static drawings by allowing users to describe the desired motion. It uses motion priors from text-to-video diffusion models to create smooth animations while keeping the original sketch’s look.
SAMURAI combines the state-of-the-art visual video tracking of SAM 2 with motion-aware memory selection, making tracking more robust to occlusion and fast motion.
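The motion-aware part can be pictured as a simple constant-velocity Kalman filter over the tracked box, used to check whether a candidate mask agrees with the object's trajectory before it enters memory. A minimal sketch, assuming a constant-velocity model and a distance-based score (illustrative, not SAMURAI's exact formulation):

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over a box (cx, cy, w, h).

    Illustrative only: a motion prior like this can score how well a
    candidate mask agrees with the predicted trajectory before the
    frame is committed to the tracker's memory.
    """

    def __init__(self, box, q=1e-2, r=1e-1):
        self.x = np.concatenate([np.asarray(box, float), np.zeros(4)])
        self.P = np.eye(8)
        self.F = np.eye(8)
        self.F[:4, 4:] = np.eye(4)      # position += velocity each step
        self.H = np.eye(4, 8)           # we only observe the box itself
        self.Q = q * np.eye(8)          # process noise
        self.R = r * np.eye(4)          # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]               # predicted box for this frame

    def update(self, box):
        y = np.asarray(box, float) - self.H @ self.x    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P

def motion_score(pred_box, cand_box):
    """Higher is better: candidate boxes near the prediction win."""
    return -np.linalg.norm(np.asarray(pred_box) - np.asarray(cand_box))
```

In a tracking loop you would call `predict()` once per frame, rank candidate masks by `motion_score`, keep the best, and feed its box back through `update()`.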
Find3D can segment parts of 3D objects based on text queries.
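Open-vocabulary part segmentation like this typically reduces to a similarity lookup: embed every point into the same space as a text encoder, then threshold cosine similarity against the query embedding. A minimal sketch, where the per-point features and the threshold are assumptions rather than Find3D's actual pipeline:

```python
import numpy as np

def query_parts(point_feats, text_emb, threshold=0.3):
    """Return a boolean mask over points matching a text query.

    point_feats: (N, D) per-point features assumed to share an
    embedding space with the query vector text_emb of shape (D,),
    e.g. from a CLIP-style text encoder.
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return (p @ t) > threshold      # cosine similarity per point
```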
StyleCodes can encode the style of an image into a 20-symbol base64 code for easy sharing and use in image generation. It allows users to create style-reference codes (srefs) from their own images, enabling high-quality style control in diffusion models.
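The 20-symbol length follows from base64 arithmetic: 20 symbols carry 120 bits, exactly 15 bytes, so any style representation quantized to 15 bytes round-trips to a padding-free 20-character code. A minimal sketch, with a crude random-projection quantizer standing in for StyleCodes' learned encoder:

```python
import base64
import numpy as np

def to_sref(style_embedding: np.ndarray) -> str:
    """Pack a style embedding into a 20-symbol base64 code.

    15 bytes * 8 bits / 6 bits-per-symbol = 20 symbols, no padding.
    The random projection and per-byte quantization are stand-ins;
    the actual system learns its encoder.
    """
    rng = np.random.default_rng(0)                      # fixed projection
    z = style_embedding @ rng.standard_normal((style_embedding.size, 15))
    z = (z - z.min()) / (z.max() - z.min() + 1e-8)      # normalize to [0, 1]
    payload = (z * 255).astype(np.uint8).tobytes()      # exactly 15 bytes
    return base64.b64encode(payload).decode()           # 20 chars, no '='

code = to_sref(np.random.rand(768))
assert len(code) == 20 and "=" not in code
```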
StableV2V can stabilize shape consistency in video-to-video editing by breaking down the editing process into steps that match user prompts. It handles text-based, image-based, and video inpainting.
SGEdit can add, remove, replace, and adjust objects in images while keeping the quality of the image consistent.
StyleSplat can stylize 3D objects in scenes represented by 3D Gaussians from reference style images. The method is able to localize style transfer to specific objects and supports stylization with multiple styles.
Long-LRM can reconstruct large 3D scenes from up to 32 input images at 960x540 resolution in just 1.3 seconds on a single A100 80GB GPU.
CamI2V can generate videos from images with precise control over camera movement, guided by text prompts.
MagicQuill enables efficient image editing with a simple interface that lets users easily insert elements and change colors. It uses a large language model to understand editing intentions in real time, improving the quality of the results.
GarmentDreamer can generate wearable, simulation-ready 3D garment meshes from text prompts. The method is able to generate diverse geometric and texture details, making it possible to create a wide range of clothing items.
JoyVASA can generate high-quality lip-sync videos of human and animal faces from a single image and speech clip.
SPARK can create high-quality 3D face avatars from regular videos and track expressions and poses in real time. It improves the accuracy of 3D face reconstructions for tasks like aging, face swapping, and digital makeup.
CHANGER can integrate an actor’s head onto a target body in digital content. It uses chroma keying for clear backgrounds and enhances blending quality with Head shape and long Hair augmentation (H2 augmentation) and a Foreground Predictive Attention Transformer (FPAT).
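The chroma-keying step itself is easy to sketch: classify pixels by how strongly the key color (typically green) dominates, then composite the foreground over the background through the resulting matte. A minimal numpy version with an arbitrary threshold; CHANGER layers its learned H2 and FPAT components on top of this basic idea:

```python
import numpy as np

def chroma_key_composite(fg, bg, dominance=30):
    """Composite a green-screen foreground over a background.

    fg, bg: uint8 RGB arrays of shape (H, W, 3). A pixel counts as
    screen when green exceeds both red and blue by `dominance`,
    a crude but standard heuristic; real matting is soft-edged.
    """
    fg = fg.astype(np.float32)
    bg = bg.astype(np.float32)
    r, g, b = fg[..., 0], fg[..., 1], fg[..., 2]
    is_screen = (g > r + dominance) & (g > b + dominance)
    alpha = np.where(is_screen, 0.0, 1.0)[..., None]    # hard matte
    return (alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)
```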
Scaling Mesh Generation via Compressive Tokenization can generate high-quality meshes with over 8,000 faces.
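The usual starting point for mesh tokenizers is quantizing vertex coordinates to a discrete grid and serializing faces into a token stream an autoregressive model can learn; the compressive part is about shrinking that stream so much larger meshes fit in context. A sketch of the plain, uncompressed baseline (bin count and layout are assumptions, not the paper's scheme):

```python
import numpy as np

def tokenize_mesh(vertices, faces, bins=128):
    """Serialize a triangle mesh into discrete tokens.

    Plain per-coordinate quantization: each face becomes 9 tokens
    (3 vertices x 3 quantized coordinates). Compressive schemes
    shrink this stream so 8k+-face meshes fit in one context.
    """
    v = np.asarray(vertices, dtype=np.float32)
    v = (v - v.min(0)) / (v.max(0) - v.min(0) + 1e-8)   # unit cube
    q = np.clip((v * bins).astype(np.int64), 0, bins - 1)
    return q[np.asarray(faces)].reshape(-1)             # (num_faces * 9,)
```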
DAWN can generate talking head videos from a single portrait and audio clip. It produces lip movements and head poses quickly, making it effective for creating long video sequences.
DimensionX can generate photorealistic 3D and 4D scenes from a single image using controllable video diffusion.