AI Toolbox
A curated collection of 811 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

Ev-DeblurVSR can turn blurry, low-resolution videos into sharp, high-resolution ones.
PosterMaker can generate high-quality product posters by rendering text accurately and keeping the main subject clear.
FramePack aims to make video generation feel like image generation. It can generate individual video frames in about 1.5 seconds with a 13B model on an RTX 4090, and it can also produce full 30-fps video with the same 13B model on a 6GB laptop GPU, though noticeably slower.
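
For intuition, here is a hypothetical sketch of the frame-packing idea (not the authors' code, and the kernel schedule is made up): older context frames are pooled more aggressively, so they contribute far fewer tokens than recent ones and the context stays short even for long videos.

```python
# Hypothetical sketch of frame packing: pool older context frames harder so they
# contribute far fewer tokens than recent ones (kernel schedule is made up).
import torch
import torch.nn.functional as F

def pack_context(frames: torch.Tensor, kernels=(1, 2, 4, 8)) -> torch.Tensor:
    """frames: (T, C, H, W), most recent frame last. Returns (1, N_tokens, C)."""
    tokens = []
    # walk backwards in time; the older the frame, the larger the pooling kernel
    for age, frame in enumerate(reversed(frames.unbind(0))):
        k = kernels[min(age, len(kernels) - 1)]
        pooled = F.avg_pool2d(frame.unsqueeze(0), kernel_size=k)  # (1, C, H/k, W/k)
        tokens.append(pooled.flatten(2).transpose(1, 2))          # (1, H*W/k^2, C)
    return torch.cat(tokens, dim=1)

ctx = pack_context(torch.randn(16, 8, 32, 32))
print(ctx.shape)  # recent frames dominate the token budget
```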
IMAGGarment-1 can generate high-quality garments with control over shape, color, and logo placement.
Cobra can efficiently colorize line art by utilizing over 200 reference images.
UniAnimate-DiT can generate high-quality, pose-driven animations from human images. It builds on the Wan2.1 model with a lightweight pose encoder to create smooth and visually appealing results, and it can also upscale animations from 480p to 720p.
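
The pose-conditioning idea can be sketched generically: a small convolutional encoder maps rendered pose maps down to the backbone's latent resolution and injects them additively. The module below is illustrative only, not UniAnimate-DiT's actual architecture.

```python
# Generic, illustrative pose conditioning: a small conv encoder brings pose maps
# to the latent resolution of a video diffusion backbone (not UniAnimate-DiT's code).
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Encodes rendered pose maps (B, T, 3, H, W) into latent-resolution features."""
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_dim, 3, stride=2, padding=1),
        )

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        b, t, _, _, _ = pose.shape
        feats = self.net(pose.flatten(0, 1))   # (B*T, latent_dim, H/8, W/8)
        return feats.unflatten(0, (b, t))      # (B, T, latent_dim, H/8, W/8)

pose_feats = PoseEncoder()(torch.randn(1, 8, 3, 256, 256))
video_latents = torch.randn(1, 8, 16, 32, 32)
conditioned = video_latents + pose_feats       # injected additively into the backbone
print(conditioned.shape)
```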
CoMotion can detect and track 3D poses of multiple people using just one camera. It works well in crowded places and can keep track of movements over time with high accuracy.
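
The association step such a tracker needs can be illustrated with a generic sketch (not CoMotion's actual method): match each frame's 3D pose detections to existing tracks by minimizing the mean joint distance.

```python
# Generic track-detection association via Hungarian matching on mean joint distance
# (illustrative only, not CoMotion's method).
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_poses(tracks: np.ndarray, detections: np.ndarray, max_dist: float = 0.5):
    """tracks: (M, J, 3), detections: (N, J, 3), coordinates in meters.
    Returns a list of (track_idx, detection_idx) pairs."""
    cost = np.linalg.norm(tracks[:, None] - detections[None, :], axis=-1).mean(-1)  # (M, N)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]

tracks = np.random.rand(3, 17, 3)       # 3 tracked people, 17 joints each
detections = tracks[[1, 0]] + 0.01      # two of them re-detected, slightly moved
print(match_poses(tracks, detections))  # -> [(0, 1), (1, 0)]
```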
PartField can segment 3D shapes into parts without relying on templates or text labels.
IP-Composer can generate compositional images by using multiple input images and natural language prompts.
PhysFlow can simulate dynamic interactions in complex scenes. It identifies material types through image queries and enhances realism using video diffusion and a Material Point Method (MPM) simulation for detailed 4D representations.
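
MPM alternates between Lagrangian particles and a background grid. The toy PIC-style transfer below shows only the scaffolding MPM builds on; there is no stress or constitutive model here, and nothing specific to PhysFlow.

```python
# Toy PIC-style particle/grid transfer (the scaffolding MPM builds on).
import numpy as np

n_particles, n_grid, dx, dt = 256, 32, 1.0 / 32, 1e-3
x = np.random.rand(n_particles, 2) * 0.4 + 0.3   # particle positions in [0.3, 0.7]^2
v = np.zeros((n_particles, 2))                    # particle velocities
gravity = np.array([0.0, -9.8])

for step in range(100):
    grid_m = np.zeros((n_grid, n_grid))           # grid mass
    grid_mv = np.zeros((n_grid, n_grid, 2))       # grid momentum
    base = (x / dx).astype(int)                   # nearest-node transfer for simplicity
    for p in range(n_particles):                  # particle-to-grid scatter
        i, j = base[p]
        grid_m[i, j] += 1.0
        grid_mv[i, j] += v[p]
    vel = np.where(grid_m[..., None] > 0,
                   grid_mv / np.maximum(grid_m, 1e-9)[..., None], 0.0)
    vel += dt * gravity                           # grid velocity update (gravity only)
    for p in range(n_particles):                  # grid-to-particle gather + advection
        i, j = base[p]
        v[p] = vel[i, j]
        x[p] = np.clip(x[p] + dt * v[p], 0.0, 1.0 - 1e-3)

print(x.mean(axis=0))  # the particle blob falls under gravity
```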
Hi3DGen can generate high-quality 3D shapes from 2D images. It uses normal maps as an intermediate bridge in a three-stage pipeline to capture fine details, outperforming other methods in realism.
HoloPart can break down 3D shapes into complete and meaningful parts, even if they are hidden. It also supports numerous downstream applications such as Geometry Editing, Geometry Processing, Material Editing and Animation.
AniSDF can reconstruct high-quality 3D shapes with improved surface geometry. It can handle complex objects, including luminous, reflective, and fuzzy ones.
Pixel3DMM can reconstruct 3D human faces from a single RGB image.
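
Under the hood, a 3D morphable model (3DMM) is just a linear model over a mean face. Below is a minimal sketch with made-up dimensions and random stand-in bases; a real pipeline would load learned bases such as FLAME or BFM.

```python
# Minimal 3DMM sketch: vertices = mean + identity basis @ alpha + expression basis @ beta.
# Dimensions and bases are random stand-ins for illustration only.
import numpy as np

n_vertices, n_id, n_expr = 5000, 80, 64
mean_shape = np.random.randn(n_vertices * 3)
id_basis = np.random.randn(n_vertices * 3, n_id)
expr_basis = np.random.randn(n_vertices * 3, n_expr)

def reconstruct(alpha: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Identity coefficients alpha and expression coefficients beta -> mesh vertices."""
    verts = mean_shape + id_basis @ alpha + expr_basis @ beta
    return verts.reshape(n_vertices, 3)

face = reconstruct(np.random.randn(n_id) * 0.1, np.random.randn(n_expr) * 0.1)
print(face.shape)  # (5000, 3)
```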
OmniCaptioner can generate detailed text descriptions for many types of content, including images, math formulas, charts, user interfaces, PDFs, videos, and more.
ReCamMaster can re-capture videos from new camera angles.
GARF can reassemble 3D objects from real-world fractured parts.
NormalCrafter can generate consistent surface normals from video sequences. It uses video diffusion models and Semantic Feature Regularization to ensure accurate normal estimation while keeping details clear across frames.
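
One plausible form of such a feature-alignment regularizer, guessing at the general shape rather than the paper's exact formulation: project intermediate diffusion features into the space of a frozen semantic encoder (e.g. DINO) and penalize low cosine similarity.

```python
# Guessed general form of a semantic feature-alignment loss (not the paper's exact loss).
import torch
import torch.nn.functional as F

def semantic_feature_loss(diff_feats: torch.Tensor, sem_feats: torch.Tensor,
                          proj: torch.nn.Module) -> torch.Tensor:
    """diff_feats: (B, N, C_d) intermediate diffusion features,
    sem_feats:  (B, N, C_s) features from a frozen semantic image encoder."""
    projected = proj(diff_feats)  # map diffusion features into the semantic space
    return 1.0 - F.cosine_similarity(projected, sem_feats, dim=-1).mean()

proj = torch.nn.Linear(320, 768)
loss = semantic_feature_loss(torch.randn(2, 256, 320), torch.randn(2, 256, 768), proj)
print(loss.item())
```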
TTT-Video can create coherent one-minute videos from text storyboards. As the title says, it uses test-time training in place of self-attention layers to keep multi-scene, long-context videos consistent, which is quite the achievement. The paper is worth a read.
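
The core trick of a test-time-training (TTT) layer, in a deliberately simplified sketch: the layer's hidden state is the weight matrix of a tiny inner model, updated by one gradient step on a self-supervised reconstruction loss for every token it processes. The actual TTT layers use learned projections and a richer self-supervised task; this is just the skeleton.

```python
# Deliberately simplified TTT-style layer: the hidden state is a weight matrix W
# updated by gradient descent on a reconstruction loss as tokens stream through.
import torch

def ttt_layer(tokens: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    """tokens: (T, D). Returns (T, D). Inner model is a single linear map W."""
    T, D = tokens.shape
    W = torch.zeros(D, D)                   # inner fast weights (the hidden state)
    outputs = []
    for x in tokens:                        # sequential scan over the sequence
        pred = W @ x                        # inner model's reconstruction of x
        grad = torch.outer(pred - x, x)     # gradient of 0.5 * ||W x - x||^2 wrt W
        W = W - lr * grad                   # one inner-loop gradient step
        outputs.append(W @ x)               # output after the update
    return torch.stack(outputs)

out = ttt_layer(torch.randn(16, 32))
print(out.shape)  # (16, 32)
```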
Piece it Together can combine different visual components into complete characters or objects. It uses a lightweight flow-matching model called IP-Prior to improve prompt adherence and enable diverse, context-aware generations.
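
Flow matching itself is easy to write down; the generic training objective below (not specific to IP-Prior or this paper) regresses a velocity field along straight paths from noise to data.

```python
# Generic flow-matching training loss: regress the velocity of a straight
# noise-to-data path. The toy model here is a placeholder for illustration.
import torch

def flow_matching_loss(model, x1: torch.Tensor) -> torch.Tensor:
    """x1: (B, D) data samples. model(x_t, t) predicts the velocity field."""
    x0 = torch.randn_like(x1)              # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1)         # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1             # linear interpolation path
    target_v = x1 - x0                     # velocity of that path
    return ((model(xt, t) - target_v) ** 2).mean()

model = lambda xt, t: torch.zeros_like(xt)  # placeholder velocity model
print(flow_matching_loss(model, torch.randn(8, 4)).item())
```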