AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Matrix-3D can generate 3D worlds from a single image or text prompt. It allows users to explore these environments in any direction and supports both quick and detailed scene creation.
FantasyPortrait can generate high-quality animations from static images for both single and multi-character scenes.
MonetGPT can critique photos and suggest retouching edits. It explains each adjustment clearly, helps preserve the subject’s identity, and allows for personalized editing plans.
WIR3D can abstract 3D shapes to enable easy shape changes.
Sketch2Anim can turn 2D storyboard sketches into high-quality 3D animations. It uses a motion generator for precise control and a neural mapper to align 2D sketches with 3D motion, allowing for easy editing and animation control.
Qwen-Image can generate high-quality images and edit them in advanced ways. It can transfer styles, manipulate objects, and edit text in images, while also handling complex text rendering in multiple languages.
ShoulderShot can generate over-the-shoulder dialogue videos that keep characters visually consistent and maintain a smooth flow between shots. It allows for longer conversations and offers more flexibility in how shots are arranged.
SDMatte can extract objects from images using visual prompts like points, boxes, and masks.
Event-Driven Storytelling can generate realistic movements for multiple characters in a 3D scene. It uses a large language model to understand complex interactions, allowing for diverse and scalable behavior planning based on character relationships and their positions.
DPoser-X can generate and complete 3D whole-body human poses using a diffusion-based model.
VideoColorGrading can generate a look-up table (LUT) for matching colors between reference scenes and input videos.
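For readers unfamiliar with LUTs, here is a minimal sketch of how a 3D look-up table remaps colors once it exists. It is not VideoColorGrading's code: the function names, the tiny grid size, the nearest-neighbor lookup, and the "warm" grade are all assumptions chosen for illustration.

```python
def make_identity_lut(size):
    """Build a size^3 identity LUT where lut[r][g][b] returns the same color."""
    step = 1.0 / (size - 1)
    return [[[(r * step, g * step, b * step) for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def make_warm_lut(size):
    """Hypothetical grade: boost red by 10% (clamped) and cut blue by 10%."""
    identity = make_identity_lut(size)
    return [[[(min(1.0, r * 1.1), g, b * 0.9) for (r, g, b) in row]
             for row in plane]
            for plane in identity]


def apply_lut(pixel, lut):
    """Map an RGB pixel (floats in [0, 1]) through the LUT.

    Uses nearest-neighbor lookup for brevity; this is a simplification.
    """
    size = len(lut)
    r, g, b = (min(size - 1, max(0, round(c * (size - 1)))) for c in pixel)
    return lut[r][g][b]


if __name__ == "__main__":
    lut = make_warm_lut(3)
    print(apply_lut((1.0, 1.0, 1.0), lut))
```

Production LUTs typically use much larger grids and interpolate between grid points (for example trilinearly) instead of snapping to the nearest entry.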
SyncTalk++ can generate high-quality talking head videos with synchronized lip movements and facial expressions. It uses Gaussian Splatting for consistent subject identity and can render up to 101 frames per second.
MVPaint can generate high-resolution, seamless textures for 3D models. It uses a three-stage process for better texture quality, including multi-view generation and UV refinement to reduce visible seams.
Subsurface Scattering for Gaussian Splatting can render and relight translucent objects in real time. It allows for detailed material editing and achieves high visual quality at around 150 FPS.
Pusa V1.0 can generate high-quality videos from images and text prompts. It achieves a VBench-I2V score of 87.32% with only $500 in training costs and supports features like video transitions and extensions.
Reflect3D can detect 3D reflection symmetry from a single RGB image and improve 3D generation.
GlobalPose can capture human motion in 3D space using six inertial measurement units (IMUs). It accurately reconstructs global motions and local poses while estimating 3D contacts and forces.
PhysX can generate 3D assets with detailed physical properties. It labels assets in five key areas: scale, material, affordance, kinematics, and function.
ACTalker can generate talking head videos by combining audio and facial motion to control specific facial areas.
SpatialTrackerV2 can track 3D points in videos using a single system for point tracking, depth, and camera position.