AI Toolbox
A curated collection of 494 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
ProCreate boosts the diversity and creativity of diffusion-based image generation while avoiding the replication of training data. By pushing generated image embeddings away from reference images, it improves the quality of samples and lowers the risk of copying copyrighted content.
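The core mechanism can be pictured as a repulsion term added during sampling. A minimal sketch, assuming a differentiable image embedder and a PyTorch sampler; `embed_fn`, the guidance scale, and the energy definition are illustrative, not ProCreate's actual API:

```python
import torch
import torch.nn.functional as F

def repulsion_guidance(x0_pred, ref_embeds, embed_fn, scale=0.1):
    """Gradient that pushes the predicted clean image's embedding
    away from a set of reference embeddings (illustrative)."""
    x0_pred = x0_pred.detach().requires_grad_(True)
    emb = embed_fn(x0_pred)                        # (B, D) image embeddings
    sims = F.cosine_similarity(emb.unsqueeze(1),   # (B, R) similarity to refs
                               ref_embeds.unsqueeze(0), dim=-1)
    energy = sims.max(dim=1).values.sum()          # penalize the nearest reference
    grad = torch.autograd.grad(energy, x0_pred)[0]
    return -scale * grad                           # step away from the references
```

Adding the returned gradient to the denoiser's update at each step lets samples drift away from the reference set without retraining the model.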
AudioEditor can edit audio by adding, deleting, and replacing segments while keeping unedited parts intact. It uses a pretrained diffusion model with methods like Null-text Inversion and EOT-suppression to ensure high-quality results.
MaskedMimic can generate diverse motions for interactive characters using a physics-based controller. It supports various inputs like keyframes and text, allowing for smooth transitions and adaptation to complex environments.
Prompt Sliders can control and edit concepts in diffusion models. It allows users to adjust the strength of concepts with just 3KB of storage per embedding, making it much faster than traditional LoRA methods.
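The storage claim is plausible because a slider is just one token-sized embedding. A rough sketch of applying such a slider at inference; the blending rule and all names here are assumptions, not the paper's exact formulation:

```python
import torch

def apply_prompt_slider(text_embeds, concept_embed, token_idx, strength=1.0):
    """Blend a learned concept embedding into the prompt embedding
    at a placeholder token position, scaled by a user strength."""
    out = text_embeds.clone()
    out[:, token_idx] = out[:, token_idx] + strength * concept_embed
    return out

# A single 768-dim float32 embedding is 3072 bytes, i.e. about 3KB on disk.
concept = torch.randn(768)
prompt_embeds = torch.randn(1, 77, 768)  # e.g. a CLIP text encoder output
edited = apply_prompt_slider(prompt_embeds, concept, token_idx=5, strength=0.8)
```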
3D-Fauna can turn a single image of a quadruped animal into an articulated, textured 3D mesh in a feed-forward manner, ready for animation and rendering.
PhysGen can generate realistic videos from a single image and user-defined conditions, like forces and torques. It combines physical simulation with video generation, allowing for precise control over dynamics.
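The simulation half of that pipeline is ordinary rigid-body integration. Below is a toy sketch of turning a user force and torque into per-frame poses that could then condition a video generator; the integrator and all names are illustrative, not PhysGen's simulator:

```python
import numpy as np

def simulate_trajectory(mass, inertia, force, torque, steps=60, dt=1/30):
    """Integrate 2D rigid-body motion under a constant user force and
    torque, yielding one (position, rotation) pose per video frame."""
    pos, vel = np.zeros(2), np.zeros(2)
    angle, omega = 0.0, 0.0
    poses = []
    for _ in range(steps):
        vel += (force / mass) * dt        # linear acceleration
        omega += (torque / inertia) * dt  # angular acceleration
        pos = pos + vel * dt
        angle += omega * dt
        poses.append((pos.copy(), angle))
    return poses

poses = simulate_trajectory(mass=1.0, inertia=0.1,
                            force=np.array([0.5, 0.0]), torque=0.2)
```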
StoryMaker can generate a series of images with consistent characters throughout. It keeps the same facial features, clothing, hairstyles, and body types, allowing for cohesive storytelling.
PortraitGen can edit portrait videos using multimodal prompts while keeping the video smooth and consistent. It renders at over 100 frames per second and supports edits such as text-driven styling and relighting, ensuring high quality and temporal consistency.
WiLoR can localize and reconstruct multiple hands in real-time from single images. It achieves smooth 3D hand tracking with high accuracy, using a large dataset of over 2 million hand images.
PhysAvatar can turn multi-view videos into high-quality 3D avatars with loose-fitting clothes. The whole thing can be animated and generalizes well to unseen motions and lighting conditions.
InstantDrag can edit images quickly using drag instructions without needing masks or text prompts. It learns motion dynamics with a two-network system, allowing for real-time, photo-realistic editing.
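That two-network split can be sketched as a sparse-to-dense flow stage followed by a flow-conditioned generator. The placeholder convolutions below just stand in for the two trained networks; shapes and names are assumptions:

```python
import torch
import torch.nn as nn

# Stand-ins for the two learned models: one maps image + rasterized
# drags to dense flow, the other maps image + flow to the edited frame.
flow_net = nn.Conv2d(5, 2, kernel_size=3, padding=1)
gen_net = nn.Conv2d(5, 3, kernel_size=3, padding=1)

def drag_edit(image, drag_map):
    """Two-stage edit: sparse drags -> dense motion -> edited image."""
    dense_flow = flow_net(torch.cat([image, drag_map], dim=1))
    return gen_net(torch.cat([image, dense_flow], dim=1))

img = torch.randn(1, 3, 64, 64)                   # input image
drags = torch.zeros(1, 2, 64, 64)                 # rasterized drag vectors
drags[:, :, 32, 32] = torch.tensor([5.0, -3.0])   # one drag at pixel (32, 32)
edited = drag_edit(img, drags)
```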
3DTopia-XL can generate high-quality 3D PBR assets from text or image inputs in just 5 seconds.
Exploiting Diffusion Prior for Real-World Image Super-Resolution can restore high-quality images from low-resolution inputs using pre-trained text-to-image diffusion models. It allows users to balance image quality and fidelity through a controllable feature wrapping module and adapts to different image resolutions with a progressive aggregation sampling strategy.
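The quality-fidelity knob can be pictured as a learned correction blended into the diffusion features. This is one illustrative reading of a controllable fusion module, with made-up shapes and names:

```python
import torch
import torch.nn as nn

class ControllableFeatureBlend(nn.Module):
    """Blend decoder features with features from the low-resolution
    input: w=0 keeps the generative output (quality), w=1 leans on
    the input (fidelity). Illustrative, not the paper's module."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, diff_feat, lr_feat, w=0.5):
        correction = self.fuse(torch.cat([diff_feat, lr_feat], dim=1))
        return diff_feat + w * correction

blend = ControllableFeatureBlend(64)
out = blend(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32), w=0.7)
```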
Upscale-A-Video can upscale low-resolution videos using text prompts while keeping the video stable. It allows users to adjust noise levels for better quality and performs well in both test and real-world situations.
ExAvatar can animate expressive whole-body 3D human avatars from a short monocular video. It captures facial expressions, hand motions, and body poses in the process.
Disentangled Clothed Avatar Generation from Text Descriptions can create high-quality 3D avatars by separately modeling human bodies and clothing. This method improves texture and geometry quality and aligns well with text prompts, enhancing virtual try-on and character animation.
MagicMan can generate high-quality multi-view images and normal maps of humans from a single photo.
TurboEdit enables fast text-based image editing in just 3-4 diffusion steps. It improves edit quality and preserves the original image by using a shifted noise schedule and a pseudo-guidance approach, tackling issues like visual artifacts and weak edits.
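A hedged sketch of the pseudo-guidance idea: predict once with the source prompt and once with the edit prompt, then extrapolate past the edit direction to strengthen weak edits. The update rule and names are assumptions, not TurboEdit's exact formulation:

```python
import torch

def pseudo_guided_step(denoise, z_t, src_prompt, edit_prompt, w=1.5):
    """One few-step edit update: amplify the edit direction relative
    to reconstructing the source image (illustrative sketch)."""
    pred_src = denoise(z_t, src_prompt)    # reconstruction branch
    pred_edit = denoise(z_t, edit_prompt)  # edit branch
    # w > 1 extrapolates beyond the plain edit, countering weak edits
    return pred_src + w * (pred_edit - pred_src)
```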
DepthCrafter can generate long, high-quality depth-map sequences for videos. It uses a three-stage training method with a pre-trained image-to-video diffusion model, achieving top performance in depth estimation for visual effects and video generation.
DreamBeast can generate unique 3D animal assets with different parts. It uses a method from Stable Diffusion 3 to quickly create detailed Part-Affinity maps from various camera views, improving quality while saving computing power.