AI Toolbox
A curated collection of 556 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
SGEdit can add, remove, replace, and adjust objects in images while preserving overall image quality.
StyleSplat can stylize 3D objects in scenes represented by 3D Gaussians from reference style images. The method is able to localize style transfer to specific objects and supports stylization with multiple styles.
Long-LRM can reconstruct large 3D scenes from up to 32 input images at 960x540 resolution in just 1.3 seconds on a single A100 80G GPU.
CamI2V is a method which can generate videos from images with precise control over camera movements and text prompts.
MagicQuill enables efficient image editing with a simple interface that lets users easily insert elements and change colors. It uses a large language model to understand editing intentions in real time, improving the quality of the results.
GarmentDreamer can generate wearable, simulation-ready 3D garment meshes from text prompts. The method is able to generate diverse geometric and texture details, making it possible to create a wide range of different clothing items.
SPARK can create high-quality 3D face avatars from regular videos and track expressions and poses in real time. It improves the accuracy of 3D face reconstructions for tasks like aging, face swapping, and digital makeup.
CHANGER can integrate an actor’s head onto a target body in digital content. It uses chroma keying for clear backgrounds and enhances blending quality with Head shape and long Hair augmentation (H2 augmentation) and a Foreground Predictive Attention Transformer (FPAT).
Scaling Mesh Generation via Compressive Tokenization can generate high-quality meshes with over 8,000 faces.
DAWN can generate talking head videos from a single portrait and audio clip. It produces lip movements and head poses quickly, making it effective for creating long video sequences.
DimensionX can generate photorealistic 3D and 4D scenes from a single image using controllable video diffusion.
SG-I2V can control object and camera motion in image-to-video generation using bounding boxes and trajectories.
RayGauss can render realistic novel views of 3D scenes using Gaussian-based ray casting. It produces high-quality images at 25 frames per second and avoids rendering artifacts common in older methods.
CLoSD can control characters in physics-based simulations using text prompts. It can navigate to goals, strike objects, and switch between sitting and standing, all guided by simple instructions.
GIMM is a video frame interpolation method that models the motion between frames to synthesize intermediate frames.
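To make the motion-modelling idea concrete, here is a toy sketch of motion-based frame interpolation: backward-warp one frame along a scaled flow field, then blend with the other frame. GIMM predicts intermediate motion with a learned model; the nearest-neighbour warp and blend below are generic illustrations, not the paper's method, and all names are hypothetical.

```python
import numpy as np

def interpolate_frame(frame0, frame1, flow01, t=0.5):
    """Toy motion-based interpolation between two frames.

    flow01 is a per-pixel (dx, dy) displacement field from frame0 to
    frame1. We sample frame0 at positions shifted back by t * flow
    (backward warping), then blend with frame1 by time t.
    """
    h, w = frame0.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # nearest-neighbour backward warp of frame0 toward time t
    src_x = np.clip(np.round(xs - t * flow01[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - t * flow01[..., 1]).astype(int), 0, h - 1)
    warped0 = frame0[src_y, src_x]
    # simple temporal blend with the second frame
    return (1 - t) * warped0 + t * frame1

# Midpoint between a black and a white frame with zero motion
f0, f1 = np.zeros((4, 4)), np.ones((4, 4))
mid = interpolate_frame(f0, f1, np.zeros((4, 4, 2)), t=0.5)
```

Real interpolators replace the nearest-neighbour warp with bilinear sampling and handle occlusions; this sketch only shows the warp-and-blend skeleton.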
Regional-Prompting-FLUX adds regional prompting capabilities to diffusion transformers like FLUX. It effectively manages complex prompts and works well with tools like LoRA and ControlNet.
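Regional prompting is typically implemented by masking cross-attention so that image patches in a region only attend to that region's prompt tokens. The sketch below builds such a patch-to-prompt mask; the function name and region format are illustrative assumptions, not the actual Regional-Prompting-FLUX API.

```python
import numpy as np

def regional_attention_mask(h, w, regions):
    """Build a cross-attention mask mapping image patches to prompts.

    regions: list of (x0, y0, x1, y1, prompt_id) boxes in patch
    coordinates. Returns an (h*w, num_prompts) boolean mask: patch i
    may attend to prompt j only where mask[i, j] is True.
    """
    num_prompts = max(r[4] for r in regions) + 1
    mask = np.zeros((h * w, num_prompts), dtype=bool)
    for x0, y0, x1, y1, pid in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y * w + x, pid] = True
    return mask

# Left half of a 4x4 patch grid follows prompt 0, right half prompt 1
m = regional_attention_mask(4, 4, [(0, 0, 2, 4, 0), (2, 0, 4, 4, 1)])
```

In a diffusion transformer, this mask would be added (as -inf on False entries) to the cross-attention logits before the softmax, so each region is steered by its own prompt.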
AutoVFX can automatically create realistic visual effects in videos from a single image and text instructions.
Adaptive Caching can speed up video generation with Diffusion Transformers by caching important calculations. It can achieve up to 4.7 times faster video creation at 720p without losing quality.
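The caching idea can be sketched as follows: if a transformer block's input has barely changed since the last denoising step, reuse the cached output instead of recomputing the block. The relative-change threshold and class names below are assumptions for illustration, not the paper's actual schedule.

```python
import numpy as np

class CachedBlock:
    """Step-wise output caching for an expensive block function.

    Recomputes only when the input differs from the last computed
    input by more than a relative tolerance; otherwise returns the
    cached output.
    """
    def __init__(self, block_fn, tol=0.05):
        self.block_fn = block_fn
        self.tol = tol
        self.last_input = None
        self.last_output = None
        self.recomputes = 0  # track how often we actually compute

    def __call__(self, x):
        if self.last_input is not None:
            delta = np.linalg.norm(x - self.last_input) / (
                np.linalg.norm(self.last_input) + 1e-8)
            if delta < self.tol:
                return self.last_output  # reuse cached result
        self.last_input = x.copy()
        self.last_output = self.block_fn(x)
        self.recomputes += 1
        return self.last_output

block = CachedBlock(lambda x: x * 2.0)
x = np.ones(8)
block(x)          # first call: computed
block(x * 1.001)  # nearly identical input: served from cache
```

Across dozens of denoising steps and many blocks, skipping near-duplicate computations like this is what yields the reported speedups.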
ZIM can generate precise matte masks from segmentation labels, enabling zero-shot image matting.
Face Anon can anonymize faces in images while keeping original facial expressions and head positions. It uses diffusion models to achieve high-quality image results and can also perform face swapping tasks.