AI Toolbox
A curated collection of 915 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

Diffuman4D can generate high-quality, 4D-consistent videos of human performances from just a few input videos. It uses a spatio-temporal diffusion model to improve video quality, producing results that are more realistic and consistent than those of other methods.
InstantRestore can restore badly damaged face images in near real-time. It uses a single-step image diffusion model and a small set of reference images to keep the person’s identity.
ByteDance published a new low-step method called PeRFlow, which accelerates diffusion models like Stable Diffusion so they generate images faster. PeRFlow is compatible with various fine-tuned stylized SD models as well as SD-based generation and editing pipelines such as ControlNet, Wonder3D, and more.
3DHM can animate people with 3D camera control from a single image and a given target video motion sequence.
3DV-TON can generate high-quality videos for trying on clothes using 3D models. It handles complex clothing patterns and different body poses well, and it has a strong masking method to reduce errors.
CanonSwap can transfer identities from images to videos while keeping natural movements like head poses and facial expressions.
Hunyuan-GameCraft can generate interactive game videos by combining keyboard and mouse inputs into a shared camera view.
MyTimeMachine can change faces to look older or younger using a global aging model. It needs just 50 selfies to keep the person’s identity, making it great for visual effects and realistic age transformations.
HOIDiNi can generate realistic human-object interactions with accurate hand contact and natural body movement from text prompts.
SewingLDM can generate complex sewing patterns using text prompts, body shapes, and garment sketches. It allows for detailed customization and significantly improves the design of garments to fit different body types.
AnimateAnyMesh can animate 3D meshes based on text prompts.
PERSONA can create personalized 3D avatars from a single image, allowing for realistic animations that reflect the subject’s identity.
Matrix-3D can generate 3D worlds from a single image or text prompt. It allows users to explore these environments in any direction and supports both quick and detailed scene creation.
FantasyPortrait can generate high-quality animations from static images for both single and multi-character scenes.
MonetGPT can critique photos and suggest retouching edits. It explains each adjustment clearly, helps keep the subject’s identity, and allows for personalized editing plans.
WIR3D can abstract 3D shapes to enable easy shape changes.
Sketch2Anim can turn 2D storyboard sketches into high-quality 3D animations. It uses a motion generator for precise control and a neural mapper to align 2D sketches with 3D motion, allowing for easy editing and animation control.
Qwen-Image can generate high-quality images and edit them in advanced ways. It can transfer styles, manipulate objects, and edit text in images, while also handling complex text rendering in multiple languages.
ShoulderShot can generate over-the-shoulder dialogue videos that keep characters visually consistent and maintain a smooth flow between shots. It allows for longer conversations and offers more flexibility in how shots are arranged.
SDMatte can extract objects from images using visual prompts like points, boxes, and masks.