AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Gaussian-Informed Continuum for Physical Property Identification and Simulation can recover 3D objects from Gaussian point sets and simulate their physical properties.
StructLDM can generate animatable, compositional humans, supporting body-part blending, identity swapping, local clothing editing, 3D virtual try-on, and more. AI girlfriends/boyfriends are definitely gonna be a thing.
RodinHD can generate high-fidelity 3D avatars from a portrait image. The method captures intricate details such as hairstyles and generalizes to in-the-wild portrait inputs.
TeFF is a similar method to SphereHead, but it supports more than just human faces and can reconstruct a 3D object with a full 360° view from a single image.
ProCreate boosts the diversity and creativity of diffusion-based image generation while avoiding the replication of training data. By pushing generated image embeddings away from reference images, it improves the quality of samples and lowers the risk of copying copyrighted content.
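The embedding-repulsion idea is easy to sketch: during sampling, nudge the current prediction down the gradient of its similarity to the reference embeddings. A minimal sketch, assuming a differentiable `image_encoder` (e.g., a CLIP image tower) and a hypothetical guidance `scale`; this is the general idea, not ProCreate's exact formulation:

```python
import torch
import torch.nn.functional as F

def repulsion_guidance(x0_pred, ref_embeds, image_encoder, scale=1.0):
    """Gradient step that pushes the predicted clean image's embedding
    away from reference-image embeddings (names are assumptions)."""
    x = x0_pred.detach().requires_grad_(True)
    emb = image_encoder(x)                                 # (B, D)
    sim = F.cosine_similarity(emb.unsqueeze(1),            # (B, N)
                              ref_embeds.unsqueeze(0), dim=-1)
    loss = sim.max(dim=1).values.mean()                    # closest reference
    grad = torch.autograd.grad(loss, x)[0]
    return -scale * grad                                   # step away from refs

# In a sampling loop (sketch): x0_pred = x0_pred + repulsion_guidance(...)
```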
AudioEditor can edit audio by adding, deleting, and replacing segments while keeping unedited parts intact. It uses a pretrained diffusion model with methods like Null-text Inversion and EOT-suppression to ensure high-quality results.
MaskedMimic can generate diverse motions for interactive characters using a physics-based controller. It supports various inputs like keyframes and text, allowing for smooth transitions and adaptation to complex environments.
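The masked-conditioning idea behind this kind of controller can be sketched in a few lines: randomly drop goal tokens during training so the policy learns to produce plausible motion from partial constraints. A minimal sketch with an assumed token layout; not MaskedMimic's actual training code:

```python
import torch

def mask_goals(goal_tokens, keep_prob=0.5):
    """Randomly mask conditioning tokens (keyframes, text, objects) so the
    controller learns to act under partial goals (sketch; the (B, T, D)
    token layout is an assumption)."""
    mask = torch.rand(goal_tokens.shape[:2],
                      device=goal_tokens.device) < keep_prob
    return goal_tokens * mask.unsqueeze(-1), mask
```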
Prompt Sliders can control and edit concepts in diffusion models. It allows users to adjust the strength of concepts with just 3KB of storage per embedding, making it much faster than traditional LoRA methods.
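The storage claim checks out on paper: a single 768-dimensional fp32 embedding is 3,072 bytes. A minimal sketch of a slider as one learned token embedding scaled at inference time; the class, dimensions, and file name are illustrative assumptions, not the paper's API:

```python
import torch

class PromptSlider(torch.nn.Module):
    """One learned concept embedding whose strength is scaled at
    inference time (sketch; dims and usage are assumptions)."""
    def __init__(self, dim=768):
        super().__init__()
        self.concept = torch.nn.Parameter(torch.randn(dim) * 0.01)

    def apply(self, token_embeds, slot, strength=1.0):
        # Add the scaled concept onto a placeholder token's embedding.
        out = token_embeds.clone()
        out[:, slot] = out[:, slot] + strength * self.concept
        return out

slider = PromptSlider()
# A 768-dim fp32 vector is ~3 KB on disk, matching the storage claim:
torch.save(slider.state_dict(), "age_slider.pt")
```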
3D-Fauna is able to turn a single image of a quadruped animal into an articulated, textured 3D mesh in a feed-forward manner, ready for animation and rendering.
PhysGen can generate realistic videos from a single image and user-defined conditions, like forces and torques. It combines physical simulation with video generation, allowing for precise control over dynamics.
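The simulation half of such a pipeline is ordinary rigid-body integration: user forces and torques produce per-frame poses, which then condition the video generator. A minimal 2D sketch under assumed units, with a placeholder hand-off to the renderer; PhysGen's actual simulator and conditioning are more involved:

```python
import numpy as np

def simulate_rigid_body(mass, inertia, force, torque, steps=60, dt=1/30):
    """Integrate 2D rigid-body dynamics into per-frame poses that could
    condition a video generator (sketch; renderer is a placeholder)."""
    pos, vel = np.zeros(2), np.zeros(2)
    angle, omega = 0.0, 0.0
    poses = []
    for _ in range(steps):
        vel += (force / mass) * dt          # linear: F = m * a
        omega += (torque / inertia) * dt    # angular: tau = I * alpha
        pos = pos + vel * dt
        angle += omega * dt
        poses.append((pos.copy(), angle))
    return poses  # hand off to an image-space renderer / video model

poses = simulate_rigid_body(mass=1.0, inertia=0.1,
                            force=np.array([2.0, 0.0]), torque=0.5)
```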
StoryMaker can generate a series of images with consistent characters. It keeps the same facial features, clothing, hairstyles, and body types across images, allowing for cohesive storytelling.
PortraitGen can edit portrait videos using multimodal prompts while keeping the video smooth and consistent. It renders over 100 frames per second and supports edits such as text-driven stylization and relighting, ensuring high quality and temporal consistency.
WiLoR can localize and reconstruct multiple hands in real time from single images. It achieves smooth 3D hand tracking with high accuracy, using a large dataset of over 2 million hand images.
PhysAvatar can turn multi-view videos into high-quality 3D avatars with loose-fitting clothes. The whole thing can be animated and generalizes well to unseen motions and lighting conditions.
InstantDrag can edit images quickly using drag instructions without needing masks or text prompts. It learns motion dynamics with a two-network system, allowing for real-time, photo-realistic editing.
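The two-network split can be sketched as a flow generator followed by a flow-conditioned image generator. The module names and interfaces below are assumptions that mirror the described pipeline, not InstantDrag's code:

```python
import torch

class DragPipeline(torch.nn.Module):
    """Hypothetical two-network drag editor: one net predicts dense
    optical flow from drag instructions, the other renders the edit."""
    def __init__(self, flow_net, image_net):
        super().__init__()
        self.flow_net = flow_net    # (image, drag points) -> flow
        self.image_net = image_net  # (image, flow) -> edited image

    @torch.no_grad()
    def edit(self, image, drag_start, drag_end):
        flow = self.flow_net(image, drag_start, drag_end)  # (B, 2, H, W)
        return self.image_net(image, flow)   # no masks or text needed
```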
3DTopia-XL can generate high-quality 3D PBR assets from text or image inputs in just 5 seconds.
Exploiting Diffusion Prior for Real-World Image Super-Resolution can restore high-quality images from low-resolution inputs using pre-trained text-to-image diffusion models. It allows users to balance image quality and fidelity through a controllable feature wrapping module and adapts to different image resolutions with a progressive aggregation sampling strategy.
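The quality-fidelity trade-off reads like a residual blend: low-resolution encoder features are adapted and mixed into the diffusion features with a user-set coefficient. A minimal sketch under that reading; the module and parameter names are assumptions, not the paper's implementation:

```python
import torch

class FeatureWrap(torch.nn.Module):
    """Controllable feature blending (sketch): w -> 0 favors generated
    detail (quality), w -> 1 favors the low-res input (fidelity)."""
    def __init__(self, channels):
        super().__init__()
        self.adapt = torch.nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, diff_feat, lr_feat, w=0.5):
        # Mix adapted low-resolution features into the diffusion features.
        return diff_feat + w * self.adapt(lr_feat)
```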
Upscale-A-Video can upscale low-resolution videos using text prompts while keeping the video stable. It allows users to adjust noise levels for better quality and performs well both on benchmarks and in real-world scenarios.
ExAvatar can animate expressive whole-body 3D human avatars from a short monocular video. It captures facial expressions, hand motions, and body poses in the process.
Disentangled Clothed Avatar Generation from Text Descriptions can create high-quality 3D avatars by separately modeling human bodies and clothing. This method improves texture and geometry quality and aligns well with text prompts, enhancing virtual try-on and character animation.