AI Toolbox
A curated collection of 811 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

HORT can create detailed 3D point clouds of hand-held objects from just one photo.
AnyTop can generate motions for different characters using only their skeletal structure.
LoRA-MDM can generate stylized human motions, like “Chicken,” from just a few reference samples by learning a LoRA on a motion diffusion model. It allows for style blending and motion editing while keeping a good balance between text fidelity and style consistency.
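To make the style-blending idea concrete, here is a minimal, hypothetical sketch of mixing two LoRA low-rank updates before applying them to a base layer. The function name, shapes, and the `alpha` knob are illustrative assumptions, not LoRA-MDM's actual API.

```python
import torch

def blend_lora_styles(base_weight: torch.Tensor,
                      a_down: torch.Tensor, a_up: torch.Tensor,
                      b_down: torch.Tensor, b_up: torch.Tensor,
                      alpha: float = 0.5, scale: float = 1.0) -> torch.Tensor:
    """Linearly interpolate two LoRA deltas and apply them to a base layer.

    Each LoRA contributes a low-rank update up @ down; mixing the two
    updates with weight alpha blends the styles they encode.
    """
    delta_a = a_up @ a_down  # low-rank update for style A
    delta_b = b_up @ b_down  # low-rank update for style B
    return base_weight + scale * (alpha * delta_a + (1.0 - alpha) * delta_b)

# Toy shapes: a 64x64 layer with two rank-4 style LoRAs, blended 70/30.
W = torch.randn(64, 64)
a_down, a_up = torch.randn(4, 64), torch.randn(64, 4)
b_down, b_up = torch.randn(4, 64), torch.randn(64, 4)
W_styled = blend_lora_styles(W, a_down, a_up, b_down, b_up, alpha=0.7)
```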
UNO brings subject transfer and preservation from a reference image to FLUX with a single model.
TokenHSI can enable physics-based characters to interact with their environment using a unified transformer-based policy. It adapts to new situations with variable length inputs and improves knowledge sharing across tasks, making interactions more versatile.
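As a rough illustration of how one transformer policy can consume variable-length observation tokens, here is a toy PyTorch sketch using padding plus a key-padding mask. It shows only the standard masking mechanism such unified policies build on; the dimensions, pooling, and action head are assumptions, not TokenHSI's architecture.

```python
import torch
import torch.nn as nn

class TokenPolicy(nn.Module):
    """Toy transformer policy over a variable-length set of task tokens."""
    def __init__(self, d_model: int = 128, n_actions: int = 32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, tokens: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # pad_mask: True marks padding positions that attention should ignore.
        h = self.encoder(tokens, src_key_padding_mask=pad_mask)
        # Mean-pool only over real (non-padded) tokens, then predict an action.
        valid = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)

policy = TokenPolicy()
tokens = torch.randn(2, 10, 128)        # batch of 2, up to 10 tokens each
pad_mask = torch.zeros(2, 10, dtype=torch.bool)
pad_mask[1, 6:] = True                  # second sample only has 6 real tokens
actions = policy(tokens, pad_mask)      # -> (2, 32)
```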
LVSM can generate high-quality 3D views of objects and scenes from a few input images.
VideoScene can generate 3D scenes from sparse video views in one step.
AudioX can generate high-quality audio and music from text, video, images, and existing audio.
AnimeGamer can generate dynamic anime life simulations where players interact with characters using open-ended language instructions. It uses multimodal LLMs to create consistent game states and high-quality animations.
GeometryCrafter can recover detailed 3D point maps from open-world videos.
DiffPortrait360 can create high-quality 360-degree views of human heads from single images.
VACE basically adds ControlNet support to video models like Wan and LTX. It handles various video tasks like generating videos from references, video inpainting, pose control, sketch-to-video, and more.
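For intuition, here is a minimal, hypothetical sketch of ControlNet-style injection in a single denoiser block: a control branch encodes the conditioning signal (pose maps, sketches, reference frames) and its features are added to the main branch, scaled by a strength knob. All names and shapes here are illustrative assumptions, not VACE's actual design.

```python
import torch
import torch.nn as nn

class ControlledDenoiserBlock(nn.Module):
    """Illustrative ControlNet-style injection for one video-denoiser block."""
    def __init__(self, channels: int):
        super().__init__()
        self.main = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.control = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, hidden: torch.Tensor, control_signal: torch.Tensor,
                strength: float = 1.0) -> torch.Tensor:
        # hidden / control_signal: (batch, channels, frames, height, width)
        return self.main(hidden) + strength * self.control(control_signal)

block = ControlledDenoiserBlock(channels=8)
hidden = torch.randn(1, 8, 16, 32, 32)   # latent video features
pose = torch.randn(1, 8, 16, 32, 32)     # encoded pose-control signal
out = block(hidden, pose, strength=0.8)
```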
Perception-as-Control can achieve fine-grained motion control for image animation by creating a 3D motion representation from a reference image.
MVGenMaster can generate up to 100 new views from a single image using a multi-view diffusion model.
SegAnyMo can segment moving objects in videos without needing human labels.
DiffuseKronA is another personalization method, one that avoids LoRAs and works directly from input images. It generates high-quality images with accurate text-image correspondence and improved color distribution from diverse and complex input images and prompts.
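The “KronA” in the name refers to Kronecker-product adapters: instead of LoRA's low-rank factors, the weight update is the Kronecker product of two small matrices. A minimal sketch of that idea (illustrative shapes and names, not the paper's code):

```python
import torch

def krona_update(base_weight: torch.Tensor,
                 a: torch.Tensor, b: torch.Tensor,
                 scale: float = 1.0) -> torch.Tensor:
    """Apply a Kronecker-product adapter update: delta_W = kron(a, b).

    Shapes must satisfy base_weight.shape == (a_rows * b_rows, a_cols * b_cols),
    so two small factors can parameterize a full-size update cheaply.
    """
    delta = torch.kron(a, b)
    assert delta.shape == base_weight.shape
    return base_weight + scale * delta

# A 64x64 layer adapted with two 8x8 factors: 128 parameters instead of 4096.
W = torch.randn(64, 64)
a_factor = torch.randn(8, 8) * 0.01
b_factor = torch.randn(8, 8) * 0.01
W_personalized = krona_update(W, a_factor, b_factor)
```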
SparseFlex can generate high-resolution 3D meshes with complex shapes and surfaces.
LeX-Art can generate high-quality text-image pairs with better text rendering and design. It uses a prompt enrichment model called LeX-Enhancer and two optimized models, LeX-FLUX and LeX-Lumina, to improve color, position, and font accuracy.
TexGaussian can generate high-quality PBR materials for 3D meshes in one step. It produces albedo, roughness, and metallic maps quickly and with great visual quality, ensuring better consistency with the input geometry.
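Once such per-channel maps exist, they are often packed for game engines. As a small, generic example (not TexGaussian code; the occlusion map here is a placeholder assumption), roughness and metallic maps can be combined with an ambient-occlusion map into a single glTF-style ORM texture:

```python
import numpy as np
from PIL import Image

def pack_orm(occlusion: np.ndarray, roughness: np.ndarray,
             metallic: np.ndarray) -> Image.Image:
    """Pack grayscale maps in [0, 1] into one glTF-style ORM texture.

    glTF stores occlusion in R, roughness in G, and metallic in B.
    """
    orm = np.stack([occlusion, roughness, metallic], axis=-1)
    return Image.fromarray((orm * 255.0).clip(0, 255).astype(np.uint8))

# Toy 256x256 maps; real ones would come from the texture generator.
h = w = 256
ao = np.ones((h, w), dtype=np.float32)            # no occlusion
rough = np.full((h, w), 0.6, dtype=np.float32)    # fairly rough surface
metal = np.zeros((h, w), dtype=np.float32)        # dielectric
pack_orm(ao, rough, metal).save("material_orm.png")
```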
AccVideo can speed up video diffusion models by reducing the number of steps needed for video creation. It achieves an 8.5x faster generation speed compared to HunyuanVideo, producing high-quality videos at 720x1280 resolution and 24fps, which makes text-to-video generation way more efficient.
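The speedup comes from running fewer denoising steps, each of which costs one network evaluation. This generic, illustrative sampling loop (not AccVideo's code; the Euler update and stand-in denoiser are assumptions) shows why wall-clock time scales roughly linearly with step count, so cutting the number of steps cuts generation time almost proportionally:

```python
import torch

@torch.no_grad()
def sample_video(denoiser, shape, num_steps: int) -> torch.Tensor:
    """Generic diffusion-style sampling loop; cost grows linearly with num_steps."""
    x = torch.randn(shape)                          # start from pure noise
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = denoiser(x, t)                          # one network call per step
        x = x + (t_next - t) * v                    # simple Euler update
    return x

# Stand-in velocity field that just transports samples toward zero; a real
# distilled video model would replace it.
denoiser = lambda x, t: x / t.clamp(min=1e-3)
latent = sample_video(denoiser, shape=(1, 4, 16, 32, 32), num_steps=6)
```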