AI Toolbox
A curated collection of 742 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

GHOST 2.0 is a deepfake method that can transfer heads from one image to another while keeping the skin color and structure intact.
KV-Edit can edit images while keeping the background consistent. It allows users to add, remove, or change objects without needing extra training, ensuring high image quality.
Any2AnyTryon can generate high-quality virtual try-on results by transferring clothes onto images as well as reconstructing garments from real-world images.
NotaGen can generate high-quality classical sheet music.
UniCon can handle different image generation tasks using a single framework. It adapts a pretrained image diffusion model with only about 15% extra parameters and supports most base ControlNet transformations.
MatAnyone can generate stable and high-quality human video matting masks.
SongGen can generate both vocals and accompaniment from text prompts using a single-stage auto-regressive transformer. It allows users to control lyrics, genre, mood, and instrumentation, and offers mixed mode for combined tracks or dual-track mode for separate tracks.
MagicArticulate can rig static 3D models and make them ready for animation. It works on both humanoid and non-humanoid objects.
MEGASAM can estimate camera parameters and depth maps from casual monocular videos.
Step-Video-T2V can generate high-quality videos up to 204 frames long using a 30B parameter text-to-video model.
MIGE can generate images from text prompts and reference images and edit existing images based on instructions.
Cycle3D can generate high-quality and consistent 3D content from a single unposed image. This approach enhances texture consistency and multi-view coherence, significantly improving the quality of the final 3D reconstruction.
LIFe-GoM can create animatable 3D human avatars from sparse multi-view images in under 1 second. It renders high-quality images at 95.1 frames per second.
DressRecon can create 3D human body models from single videos. It handles loose clothing and objects well, achieving high-quality results by combining general human shapes with specific video movements.
Dora can generate 3D assets from images that are ready for diffusion-based character control in modern 3D engines, such as Unity 3D, in real time.
Magic 1-For-1 can generate one-minute video clips in just one minute.
VD3D enables camera control for video diffusion models and can transfer the camera trajectory from a reference video.
InstantSwap can swap concepts in images from a reference image while keeping the foreground and background consistent. It uses automated bounding box extraction and cross-attention to make the process more efficient by reducing unnecessary calculations.
Diffusion as Shader can generate high-quality videos from 3D tracking inputs.
MaterialFusion can transfer materials onto objects in images while letting users control how much material is applied.