AI Toolbox
A curated collection of 910 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

CanonSwap can transfer identities from images to videos while keeping natural movements like head poses and facial expressions.
Hunyuan-GameCraft can generate interactive game videos by mapping keyboard and mouse inputs into a shared camera representation.
MyTimeMachine can make faces look older or younger using a global aging model. It needs just 50 selfies to preserve the person’s identity, which makes it well suited to visual effects and realistic age transformations.
HOIDiNi can generate realistic human-object interactions with accurate hand contact and natural body movement from text prompts.
SewingLDM can generate complex sewing patterns from text prompts, body shapes, and garment sketches. It allows detailed customization so that garment designs fit different body types.
AnimateAnyMesh can animate 3D meshes based on text prompts.
PERSONA can create personalized 3D avatars from a single image, allowing for realistic animations that reflect the subject’s identity.
Matrix-3D can generate 3D worlds from a single image or text prompt. It allows users to explore these environments in any direction and supports both quick and detailed scene creation.
FantasyPortrait can generate high-quality animations from static images for both single and multi-character scenes.
MonetGPT can critique photos and suggest retouching edits. It explains each adjustment clearly, helps keep the subject’s identity, and allows for personalized editing plans.
WIR3D can abstract 3D shapes into a sparse set of curves, which makes the shapes easy to edit and deform.
Sketch2Anim can turn 2D storyboard sketches into high-quality 3D animations. It uses a motion generator for precise control and a neural mapper to align 2D sketches with 3D motion, allowing for easy editing and animation control.
Qwen-Image can generate and edit high-quality images. It can transfer styles, manipulate objects, and edit text in images, and it handles complex text rendering in multiple languages.
ShoulderShot can generate over-the-shoulder dialogue videos that keep characters visually consistent and maintain continuity between shots. It supports longer conversations and offers more flexibility in how shots are arranged.
SDMatte can extract objects from images using visual prompts like points, boxes, and masks.
Event-Driven Storytelling can generate realistic movements for multiple characters in a 3D scene. It uses a large language model to understand complex interactions, allowing for diverse and scalable behavior planning based on character relationships and their positions.
DPoser-X can generate and complete 3D whole-body human poses using a diffusion-based model.
VideoColorGrading can generate a look-up table (LUT) for matching colors between reference scenes and input videos.
SyncTalk++ can generate high-quality talking head videos with synchronized lip movements and facial expressions. It uses Gaussian Splatting for consistent subject identity and can render up to 101 frames per second.
MVPaint can generate high-resolution, seamless textures for 3D models. It uses a three-stage process for better texture quality, including multi-view generation and UV refinement to reduce visible seams.