AI Toolbox

A curated collection of 946 free cutting edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.

AI Tools

3D Audio Image Image </assigned_modality> <assigned_tasks> Image Upscaling </assigned_tasks> <assigned_modality> Video Text Video

PreciseCam

PreciseCam can generate images with exact control over camera angles and lens distortions using four simple camera settings.

07.05.25 · Project Page · Code · Text-to-Image · Controllable Image Generation · Image Editing

HunyuanCustom

HunyuanCustom can generate customized videos with specific subjects while keeping their identity consistent across frames. It supports various inputs like images, audio, video, and text, and it excels in realism and matching text to video.

07.05.25 · Project Page · Code · Text-to-Video · Audio-to-Video · Video-to-Video · Personalized Video Generation

FlexiAct

FlexiAct can transfer actions from a video to a target image while keeping the person’s identity while adapting to different layouts and viewpoints.

07.05.25 · Project Page · Code · Video-to-Video

SOAP

SOAP can generate rigged 3D avatars from a single portrait image.

05.05.25 · Project Page · Code · 3D Avatar Generation

AnyStory

AnyStory can generate consistent single- and multi-subject images from text.

30.04.25 · Project Page · Code · Text-to-Image · Personalized Image Generation

KeySync

KeySync can achieve strong lip synchronization for videos. It addresses issues like timing, facial expressions, and blocked faces, using a unique masking strategy and a new metric called LipLeak to improve visual quality.

30.04.25 · Project Page · Code · Demo · Lip Syncing

SwiftSketch

SwiftSketch can generate high-quality vector sketches from images in under a second. It uses a diffusion model to create editable sketches that work well for different object types and are not limited by resolution.

29.04.25 · Project Page · Code · Image-to-Sketch

DiffLocks

DiffLocks can generate detailed 3D hair geometry from a single image in 3 seconds.

29.04.25 · Project Page · Code · 3D Object Generation

FantasyTalking

FantasyTalking can generate talking portraits from a single image, making them look realistic with accurate lip movements and facial expressions. It uses a two-step process to align audio and video, allowing users to control how expressions and body motions appear.

28.04.25 · Project Page · Code · Talking Head Generation · Lip Syncing

Textoon

Textoon can generate diverse 2D cartoon characters in the Live2D format from text descriptions. It allows for real-time editing and controllable appearance generation, making it easy for users to create interactive characters.

27.04.25 · Project Page · Code · 3D Avatar Generation

S3D

S3D can generate 3D models from simple hand-drawn sketches.

26.04.25 · Code · Sketch-to-3D

GPS-Gaussian+

GPS-Gaussian+ can render high-resolution 3D scenes from 2 or more input images in real-time.

25.04.25 · Project Page · Code · 3D Scene Generation

Step1X-Edit

Step1X-Edit can perform advanced image editing tasks by processing reference images and user instructions.

25.04.25 · Code · Image Editing

Describe Anything

[Describe Anything] can generate detailed descriptions for specific areas in images and videos using points, boxes, scribbles, or masks. It produces context-aware captions that highlight subtle details and changes over time, achieving top performance on seven benchmarks for localized captioning.

24.04.25 · Project Page · Code · Image Captioning · Video Captioning

SwiftBrush v2

SwiftBrush v2 can improve the quality of images generated by one-step text-to-image diffusion models. Results look great, and apparently it ranks better than all GAN-based and multi-step Stable Diffusion models in benchmarks. No code though 🤷‍♂️

24.04.25 · Project Page · Code · Text-to-Image

InstantCharacter

InstantCharacter can generate high-quality images of personalized characters from a single reference image with FLUX. It supports different styles and poses, ensuring identity consistency and allowing for text-based edits.

22.04.25 · Project Page · Code · Demo · Personalized Image Generation

TAPIP3D

TAPIP3D can track 3D points in videos.

22.04.25 · Project Page · Code · 3D Point Tracking

ID-Patch

ID-Patch can generate personalized group photos by matching faces with specific positions. It reduces problems like identity leakage and visual errors, achieving high accuracy and speed—seven times faster than other methods.

22.04.25 · Project Page · Code · Text-to-Image · Image-to-Image · Personalized Image Generation · Controllable Image Generation

Phantom

Phantom can generate videos that keep the subject’s identity from images while matching them with text prompts.

21.04.25 · Project Page · Code · Text-to-Video · Image-to-Video · Personalized Video Generation

SkyReels-V2

SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.

20.04.25 · Code · Text-to-Video · Image-to-Video