AI Toolbox
A curated collection of 959 free cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
FreeTimeGS can reconstruct dynamic 3D scenes in real time using Gaussian primitives that appear at different times and locations.
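A minimal sketch of the time-aware Gaussian idea, in Python. The class layout, parameter names, and the Gaussian temporal window below are assumptions for illustration, not FreeTimeGS's actual implementation: each primitive carries a position, a velocity, a canonical time, and a lifespan, and its opacity fades in and out around that time.

```python
import numpy as np

# Hypothetical illustration of time-aware Gaussian primitives
# (names and the temporal window are assumptions, not FreeTimeGS's code).

class TimedGaussian:
    def __init__(self, mu, velocity, t_center, t_scale, opacity):
        self.mu = np.asarray(mu)              # 3D position at the canonical time
        self.velocity = np.asarray(velocity)  # linear motion per unit time
        self.t_center = t_center              # time of peak visibility
        self.t_scale = t_scale                # how long the primitive stays active
        self.opacity = opacity                # base spatial opacity

    def position_at(self, t):
        # Move the Gaussian linearly away from its canonical time.
        return self.mu + self.velocity * (t - self.t_center)

    def opacity_at(self, t):
        # Gaussian temporal window: the primitive only contributes
        # to frames near its own lifespan.
        w = np.exp(-0.5 * ((t - self.t_center) / self.t_scale) ** 2)
        return self.opacity * w

g = TimedGaussian(mu=[0.0, 1.0, 2.0], velocity=[0.1, 0.0, 0.0],
                  t_center=0.5, t_scale=0.1, opacity=0.9)
print(g.position_at(0.7), g.opacity_at(0.7))
```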
KV-Edit can edit images while keeping the background consistent. It allows users to add, remove, or change objects without needing extra training, ensuring high image quality.
Any2AnyTryon can generate high-quality virtual try-on results by transferring clothes onto images as well as reconstructing garments from real-world images.
NotaGen can generate high-quality classical sheet music.
UniCon can handle different image generation tasks within a single framework. It adapts a pretrained image diffusion model with only about 15% extra parameters and supports most standard ControlNet conditions.
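A rough sketch of the adapter pattern such a framework implies; module names, sizes, and the zero-init detail are assumptions, not UniCon's actual architecture. The pretrained backbone stays frozen, and a small trainable bottleneck branch injects condition features into its hidden states, which keeps the added parameter count low.

```python
import torch
import torch.nn as nn

class ConditionAdapter(nn.Module):
    """Small trainable branch next to a frozen backbone block.

    Hypothetical illustration of adapting a pretrained diffusion model
    with a modest parameter overhead; not UniCon's actual code.
    """
    def __init__(self, hidden_dim, cond_dim, bottleneck=64):
        super().__init__()
        # The bottleneck keeps the added parameters small
        # relative to the frozen backbone.
        self.proj = nn.Sequential(
            nn.Linear(cond_dim, bottleneck),
            nn.SiLU(),
            nn.Linear(bottleneck, hidden_dim),
        )
        # Zero-init the last layer so training starts from the
        # unmodified backbone behavior (a common ControlNet-style trick).
        nn.init.zeros_(self.proj[-1].weight)
        nn.init.zeros_(self.proj[-1].bias)

    def forward(self, hidden, cond):
        return hidden + self.proj(cond)

backbone_block = nn.Linear(320, 320)     # stand-in for one frozen block
backbone_block.requires_grad_(False)     # backbone stays frozen
adapter = ConditionAdapter(hidden_dim=320, cond_dim=128)

h = torch.randn(2, 320)
c = torch.randn(2, 128)
out = adapter(backbone_block(h), c)      # only the adapter is trained
```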
MatAnyone can generate stable and high-quality human video matting masks.
SongGen can generate both vocals and accompaniment from text prompts using a single-stage auto-regressive transformer. It allows users to control lyrics, genre, mood, and instrumentation, and offers mixed mode for combined tracks or dual-track mode for separate tracks.
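A toy sketch of what a single-stage autoregressive music transformer with a mode switch might look like. The token layout, vocabulary, and model below are invented for illustration and are not SongGen's real interface:

```python
import torch
import torch.nn as nn

# Invented token ids and a stand-in model, for illustration only
# (not SongGen's real vocabulary or interface).
MODE_MIXED, MODE_DUAL = 0, 1
BOS, VOCAB = 2, 1024

class ToyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, seq):
        return self.head(self.emb(seq))

def generate(model, lyric_tokens, mode, max_steps=32):
    # One autoregressive pass: the mode token up front tells the model
    # whether to emit a single mixed stream or interleaved vocal /
    # accompaniment tokens (dual-track) that get split apart afterwards.
    seq = torch.tensor([[mode, BOS, *lyric_tokens]])
    for _ in range(max_steps):
        logits = model(seq)                # (1, len, VOCAB)
        next_tok = logits[0, -1].argmax()  # greedy for simplicity
        seq = torch.cat([seq, next_tok.view(1, 1)], dim=1)
    return seq

song = generate(ToyDecoder(), lyric_tokens=[10, 11, 12], mode=MODE_DUAL)
```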
MagicArticulate can rig static 3D models and make them ready for animation. It works on both humanoid and non-humanoid objects.
MEGASAM can estimate camera parameters and depth maps from casual monocular videos.
Step-Video-T2V can generate high-quality videos up to 204 frames long using a 30B parameter text-to-video model.
MIGE can generate images from text prompts and reference images, and edit existing images based on instructions.
Cycle3D can generate high-quality and consistent 3D content from a single unposed image. This approach enhances texture consistency and multi-view coherence, significantly improving the quality of the final 3D reconstruction.
LIFe-GoM can create animatable 3D human avatars from sparse multi-view images in under 1 second. It renders high-quality images at 95.1 frames per second.
DressRecon can create 3D human body models from single videos. It handles loose clothing and objects well, achieving high-quality results by combining general human shapes with specific video movements.
Google DeepMind has been researching 4DiM, a cascaded diffusion model for 4D novel view synthesis. It can generate 3D scenes with temporal dynamics from a single image and a set of camera poses and timestamps.
Dora can generate 3D assets from images that are ready for diffusion-based character control in modern 3D engines, such as Unity 3D, in real time.
Magic 1-For-1 can generate one-minute video clips in just one minute.
VD3D enables camera control for video diffusion models and can transfer the camera trajectory from a reference video.
InstantSwap can swap concepts in images from a reference image while keeping the foreground and background consistent. It uses automated bounding box extraction and localized cross-attention to skip unnecessary computation.
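A simplified sketch of the region-restricted update idea; the function, mask handling, and names are assumptions, not InstantSwap's actual pipeline. Edits are only blended in inside the detected box, so everything outside the box passes through unchanged and its computation can be reused.

```python
import torch

def apply_edit_in_bbox(latents, edited_latents, bbox):
    """Blend an edited latent into the original only inside the target
    bounding box, leaving the background untouched.

    Hypothetical illustration of restricting updates to a detected
    region; not InstantSwap's actual code.
    """
    x0, y0, x1, y1 = bbox
    mask = torch.zeros_like(latents)
    mask[..., y0:y1, x0:x1] = 1.0
    # Outside the box the original latent is kept as-is, which is also
    # where redundant attention work could be skipped entirely.
    return mask * edited_latents + (1.0 - mask) * latents

latents = torch.randn(1, 4, 64, 64)
edited = torch.randn(1, 4, 64, 64)
out = apply_edit_in_bbox(latents, edited, bbox=(16, 16, 48, 48))
```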
Diffusion as Shader can generate high-quality videos from 3D tracking inputs.