AI Toolbox

A curated collection of 936 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

AI Tools

3D · Audio · Image · Text · Video

Chrono

Chrono can track points in videos with an understanding of time.

11.03.25 · Project Page · Code · Video Object Tracking · Video Analysis

VIRES

VIRES can repaint, replace, generate, and remove objects in videos using sketches and text.

10.03.25 · Project Page · Code · Video Editing

Diffusion VAS

Diffusion VAS can generate masks for hidden parts of objects in videos.

07.03.25 · Project Page · Code · Video Object Tracking

6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry

TRG can estimate 6DoF head translations and rotations by leveraging the synergy between facial geometry and head pose.

06.03.25 · Code · Image Object Detection · Image Depth Estimation

StdGEN

StdGEN can generate high-quality 3D characters from a single image in just three minutes. It breaks down characters into parts like body, clothes, and hair, using a transformer-based model for great results in 3D anime character generation.

05.03.25 · Project Page · Code · Image-to-3D · 3D Object Generation · 3D Avatar Generation

Spark-TTS

Spark-TTS can generate customizable voices with control over gender, speaking style, pitch, and rate. It also supports zero-shot voice cloning, allowing smooth language transitions without extra training for each voice.

05.03.25 · Code · Text-to-Speech · Personalized Audio Generation

3D-GPT

So far, it has been tough to imagine the benefits of AI agents. Most of what we’ve seen from that domain has been focused on NPC simulations or solving text-based goals. 3D-GPT is a new framework that uses LLMs for instruction-driven 3D modeling: it breaks 3D modeling tasks down into manageable segments to procedurally generate 3D scenes. I recently started to dig into Blender and I pray this gets open-sourced at some point.

05.03.25 · Project Page · Code · 3D Object Generation · 3D Scene Generation

VideoMaker

VideoMaker can generate personalized videos from a single subject reference image.

04.03.25 · Project Page · Code · Personalized Video Generation · Controllable Video Generation · Text-to-Video

Generative Photography

Generative Photography can generate consistent images from text with an understanding of camera physics. The method can control camera settings like bokeh and color temperatures to create consistent images with different effects.

04.03.25 · Project Page · Code · Text-to-Image

Dream Engine

Dream Engine can generate images by combining different concepts from reference images.

04.03.25 · Code · Text-to-Image · Personalized Image Generation

ImageRAG

ImageRAG can find relevant images based on a text prompt to improve image generation. It helps create rare and detailed concepts without needing special training, making it useful for different image models.

03.03.25 · Project Page · Code · Text-to-Image · Image Editing · Personalized Image Generation

InsTaG

InsTaG can generate realistic 3D talking heads from just a few seconds of video.

03.03.25 · Project Page · Code · Talking Head Generation

Phidias

Phidias can generate high-quality 3D assets from text, images, and 3D references. It uses a method called reference-augmented diffusion to improve quality and speed, achieving results in just a few seconds.

02.03.25 · Project Page · Code · Text-to-3D · Image-to-3D

EventEgo3D++

EventEgo3D++ can capture 3D human motion using a monocular event camera with a fisheye lens. It works well in low-light and high-speed conditions, providing real-time 3D pose updates at 140 Hz with high accuracy compared to RGB-based methods.

01.03.25 · Project Page · Code · 3D Motion Capture

D-NPC

Cyberpunk brain dances are becoming a thing! D-NPC can turn videos into dynamic neural point clouds, aka 4D scenes, which makes it possible to watch a scene from another perspective.

28.02.25 · Project Page · Code · 3D Scene Generation · 3D Object Generation

Distill Any Depth

Distill Any Depth can generate depth maps from images.

27.02.25 · Project Page · Code · Image Depth Estimation · Image-to-Depth

GHOST 2.0

GHOST 2.0 is a deepfake method that can transfer heads from one image to another while keeping the skin color and structure intact.

26.02.25 · Code · Image Inpainting · Image-to-Image

FreeTimeGS

FreeTimeGS can reconstruct dynamic 3D scenes in real time using Gaussian primitives that can appear at different times and places.

26.02.25 · Project Page · Code · 3D Object Generation · 3D Scene Generation

KV-Edit

KV-Edit can edit images while keeping the background consistent. It allows users to add, remove, or change objects without needing extra training, ensuring high image quality.

25.02.25 · Project Page · Code · Image Editing · Image Inpainting

Any2AnyTryon

Any2AnyTryon can generate high-quality virtual try-on results by transferring clothes onto images as well as reconstructing garments from real-world images.

25.02.25 · Project Page · Code · Virtual Image Try-On · Image Editing · Image Segmentation