AI Toolbox
A curated collection of 970 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
VideoMaker can generate personalized videos from a single subject reference image.
Generative Photography can generate consistent images from text with an understanding of camera physics. The method can control camera settings such as bokeh and color temperature to create consistent images with different effects.
Dream Engine can generate images by combining different concepts from reference images.
ImageRAG can retrieve relevant images based on a text prompt to improve image generation. It helps generate rare and fine-grained concepts without additional training, so it can be used with different image models.
InsTaG can generate realistic 3D talking heads from just a few seconds of video.
Phidias can generate high-quality 3D assets from text, images, and 3D references. It uses a method called reference-augmented diffusion to improve quality and speed, achieving results in just a few seconds.
EventEgo3D++ can capture 3D human motion using a monocular event camera with a fisheye lens. It works well in low-light and high-speed conditions and delivers real-time 3D pose updates at 140 Hz, with high accuracy compared to RGB-based methods.
Cyberpunk brain dances are becoming a thing! D-NPC can turn videos into dynamic neural point clouds, aka 4D scenes, which make it possible to watch a scene from another perspective.
Distill Any Depth can generate depth maps from images.
GHOST 2.0 is a deepfake method that can transfer heads from one image to another while keeping the skin color and structure intact.
FreeTimeGS can reconstruct dynamic 3D scenes in real-time using Gaussian primitives that can appear at different times and places.
KV-Edit can edit images while keeping the background consistent. It allows users to add, remove, or change objects without extra training while maintaining high image quality.
Any2AnyTryon can generate high-quality virtual try-on results by transferring clothes onto images. It can also reconstruct garments from real-world images.
NotaGen can generate high-quality classical sheet music.
UniCon can handle different image generation tasks using a single framework. It adapts a pretrained image diffusion model with only about 15% extra parameters and supports most base ControlNet transformations.
MatAnyone can generate stable and high-quality human video matting masks.
SongGen can generate both vocals and accompaniment from text prompts using a single-stage autoregressive transformer. It allows users to control lyrics, genre, mood, and instrumentation, and offers a mixed mode for combined tracks and a dual-track mode for separate tracks.
MagicArticulate can rig static 3D models and make them ready for animation. It works on both humanoid and non-humanoid objects.
MEGASAM can estimate camera parameters and depth maps from casual monocular videos.
Step-Video-T2V can generate high-quality videos up to 204 frames long using a 30B-parameter text-to-video model.