AI Toolbox
A curated collection of 915 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.





HunyuanPortrait can animate characters from a single portrait image by using facial expressions and head poses from video clips. It achieves lifelike animations with high consistency and control, effectively separating appearance and motion.
Custom SVG can generate high-quality SVGs from text prompts with customizable styles.
ObjectCarver can segment, reconstruct, and separate 3D objects from a single view using just user-input clicks, eliminating the need for segmentation masks.
Marigold can estimate depth, predict surface normals, and decompose images into intrinsic components by adapting a pretrained diffusion model with minimal changes.
MTVCrafter can generate high-quality human image animations from 3D motion sequences.
PA-VDM can generate high-quality videos up to 1 minute long at 24 frames per second.
Skyeyes can generate photorealistic sequences of ground view images from aerial view inputs. It ensures that the images are consistent and realistic, even when there are large gaps in views.
LegoGPT can generate stable and buildable LEGO designs from text prompts. It uses physics-aware techniques to ensure designs are safe for manual assembly and robotic construction, and it can create colored and textured models.
SVAD can generate high-quality 3D avatars from a single image. It keeps the person’s identity and details consistent across different poses and angles while allowing for real-time rendering.
PrimitiveAnything can generate high-quality 3D shapes from 3D models, text, and images by breaking down complex forms into simple geometric parts. It uses a shape-conditioned primitive transformer to ensure that the shapes remain accurate and diverse.
PreciseCam can generate images with exact control over camera angles and lens distortions using four simple camera settings.
HunyuanCustom can generate customized videos with specific subjects while keeping their identity consistent across frames. It supports various inputs like images, audio, video, and text, and it excels in realism and matching text to video.
FlexiAct can transfer actions from a video to a target image, preserving the person’s identity and adapting to different layouts and viewpoints.
SOAP can generate rigged 3D avatars from a single portrait image.
AnyStory can generate consistent single- and multi-subject images from text.
KeySync can achieve strong lip synchronization for videos. It addresses issues like timing, facial expressions, and occluded faces, using a unique masking strategy and a new metric called LipLeak to improve visual quality.
SwiftSketch can generate high-quality vector sketches from images in under a second. It uses a diffusion model to create editable sketches that work well for different object types and are not limited by resolution.
DiffLocks can generate detailed 3D hair geometry from a single image in 3 seconds.
FantasyTalking can generate talking portraits from a single image, making them look realistic with accurate lip movements and facial expressions. It uses a two-step process to align audio and video, allowing users to control how expressions and body motions appear.
Textoon can generate diverse 2D cartoon characters in the Live2D format from text descriptions. It allows for real-time editing and controllable appearance generation, making it easy for users to create interactive characters.