AI Toolbox
A curated collection of 949 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
        MotionLab can generate and edit human motion and supports text-based and trajectory-based motion creation.
SMF can transfer 2D or 3D keypoint animations to full-body mesh animations without needing template meshes or corrective keyframes.
ControlFace can edit face images with precise control over pose, expression, and lighting. It uses a dual-branch U-Net architecture and is trained on facial videos to ensure high-quality results while keeping the person’s identity intact.
OmniPhysGS can generate realistic 3D dynamic scenes by modeling objects with Constitutive 3D Gaussians.
GestureLSM can generate real-time co-speech gestures by modeling how different body parts interact.
Imagine360 can generate high-quality 360° videos from monocular single-view videos.
Wonderland can generate high-quality 3D scenes from a single image using a camera-guided video diffusion model. It allows for easy navigation and exploration of 3D spaces, performing better than other methods, especially with images it hasn’t seen before.
DiffSplat can generate 3D Gaussian splats from text prompts and single-view images in 1-2 seconds.
Stable Flow can edit images by adding, removing, or changing objects.
DELTA can track dense 3D motion from single-camera videos with high accuracy. It uses advanced techniques to speed up the process, making it over 8 times faster than older methods while maintaining pixel-level precision.
MoRAG can generate and retrieve human motion from text by improving motion diffusion models.
FramePainter can edit images using simple sketches and video diffusion methods. It allows for realistic changes, like altering reflections or transforming objects, while needing less training data and performing well in different situations.
Yin-Yang can generate music with a clear structure and control over melodies.
One-Prompt-One-Story can generate identity-consistent image sequences by concatenating all scene prompts into a single input for text-to-image models.
Video Depth Anything can estimate depth in long videos while running at 30 frames per second.
Hunyuan3D 2.0 can generate high-resolution textured 3D assets. It allows users to create and animate detailed 3D models efficiently, with improved geometry detail and texture quality compared to previous models.
X-Dyna can animate a single human image by transferring facial expressions and body movements from a video.
ReF-LDM can restore low-quality face images by using multiple high-quality reference images.
RepVideo can improve video generation by making visuals look better and ensuring smooth transitions.
VISION-XL can deblur and upscale videos using SDXL. It supports different aspect ratios and can produce HD videos in under 2.5 minutes on a single NVIDIA 4090 GPU, using only 13GB of VRAM for 25-frame videos.