AI Toolbox

A curated collection of 971 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

AI Tools

3D · Audio · Brain · Image · Text · Video

ID-Patch

ID-Patch can generate personalized group photos by matching faces with specific positions. It reduces problems like identity leakage and visual errors, achieving high accuracy while running seven times faster than other methods.

22.04.25 · Project Page · Code · Text-to-Image · Image-to-Image · Personalized Image Generation · Controllable Image Generation

Phantom

Phantom can generate videos that preserve a subject’s identity from reference images while following text prompts.

21.04.25 · Project Page · Code · Text-to-Video · Image-to-Video · Personalized Video Generation

SkyReels-V2

SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.

20.04.25 · Code · Text-to-Video · Image-to-Video

Shape-Guided Clothing Warping for Virtual Try-On

SCW-VTON can fit in-shop clothing to a person’s image while keeping their pose consistent. It improves the shape of the clothing and reduces distortions in visible limb areas, making virtual try-on results look more realistic.

20.04.25 · Code · Virtual Image Try-On

Ev-DeblurVSR

Ev-DeblurVSR can turn low-resolution and blurry videos into high-resolution ones.

19.04.25 · Project Page · Code · Video Restoration · Video Upscaling

PosterMaker

PosterMaker can generate high-quality product posters by rendering text accurately and keeping the main subject clear.

18.04.25 · Project Page · Code · Image Editing · Image Inpainting · Image-to-Image

FramePack

FramePack aims to make video generation feel like image generation. It can generate single video frames in 1.5 seconds with 13B models on an RTX 4090. It also supports full 30 fps generation with 13B models on a 6GB laptop GPU, though at lower speed.

18.04.25 · Project Page · Code · Text-to-Video · Image-to-Video
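
As a rough back-of-the-envelope sketch, the 1.5 seconds per frame figure quoted for FramePack on an RTX 4090 can be turned into an estimated wall-clock time for a whole clip. The function name and the 5-second clip length are illustrative assumptions, not part of FramePack, and the estimate assumes a constant per-frame cost, which real hardware and settings may not deliver.

```python
def estimated_generation_seconds(
    clip_seconds: float,
    fps: int = 30,
    seconds_per_frame: float = 1.5,
) -> float:
    """Estimate wall-clock generation time, assuming a constant cost per frame."""
    total_frames = clip_seconds * fps
    return total_frames * seconds_per_frame


# Hypothetical 5-second clip at 30 fps: 150 frames at 1.5 s each.
print(estimated_generation_seconds(5))  # 225.0
```

Under these assumptions, a 5-second clip at 30 fps would take just under four minutes to generate.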

IMAGGarment-1

IMAGGarment-1 can generate high-quality garments with control over shape, color, and logo placement.

18.04.25 · Project Page · Code · Controllable Image Generation

Cobra

Cobra can efficiently colorize line art by utilizing over 200 reference images.

17.04.25 · Project Page · Code · Image Colorization

UniAnimate-DiT

UniAnimate-DiT can generate high-quality animations from human images. It uses the Wan2.1 model and a lightweight pose encoder to create smooth and visually appealing results, while also upscaling animations from 480p to 720p.

16.04.25 · Code · Image-to-Video

CoMotion

CoMotion can detect and track 3D poses of multiple people using just one camera. It works well in crowded places and can keep track of movements over time with high accuracy.

16.04.25 · Code · 3D Object Detection · Motion Capture

PartField

PartField can segment 3D shapes into parts without using templates or text names.

15.04.25 · Project Page · Code · 3D Segmentation

IP-Composer

IP-Composer can generate compositional images by using multiple input images and natural language prompts.

15.04.25 · Project Page · Code · Image-to-Image · Controllable Image Generation · Image Style Transfer

PhysFlow

PhysFlow can simulate dynamic interactions in complex scenes. It identifies material types through image queries and enhances realism using video diffusion and a Material Point Method for detailed 4D representations.

13.04.25 · Project Page · Code · 3D Scene Generation

Hi3DGen

Hi3DGen can generate high-quality 3D shapes from 2D images. It uses a three-step process to accurately capture fine details, outperforming other methods in realism.

13.04.25 · Project Page · Code · 3D Mesh Generation

HoloPart

HoloPart can break down 3D shapes into complete and meaningful parts, even when those parts are hidden. It also supports downstream applications such as geometry editing, geometry processing, material editing, and animation.

11.04.25 · Project Page · Code · 3D Segmentation · 3D Object Generation

AniSDF

AniSDF can reconstruct high-quality 3D shapes with improved surface geometry. It can handle complex, luminous, reflective, and fuzzy objects.

10.04.25 · Project Page · Code · 3D Object Generation

Pixel3DMM

Pixel3DMM can reconstruct 3D human faces from a single RGB image.

10.04.25 · Project Page · Code · 3D Object Generation · 3D Mesh Generation

OmniCaptioner

OmniCaptioner can generate detailed text descriptions for various types of content, including images, math formulas, charts, user interfaces, PDFs, and videos.

10.04.25 · Project Page · Code · Image-to-Text · Image Captioning · Video Captioning

ReCamMaster

ReCamMaster can re-capture videos from new camera angles.

09.04.25 · Project Page · Code · Video Outpainting · Video Editing · Controllable Video Generation · Video-to-Video