AI Toolbox

A curated collection of 971 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

AI Tools


Thera

Thera can upscale images to super-resolution using neural heat fields that model a precise point spread function, which allows correct anti-aliasing at any output size.

19.03.25 · Project Page · Code · Image Upscaling
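To illustrate why a heat-field formulation makes anti-aliasing cheap at any output size, here is a minimal 1D sketch. It assumes the continuous image is a sum of sinusoids and that the point spread function is a Gaussian; under those assumptions, blurring is analytic: each sinusoid of frequency w is simply damped by exp(-sigma^2 w^2 / 2), the same as running the heat equation for time t = sigma^2 / 2. This is not Thera's implementation, and the function names (`field`, `blurred_field`, `numeric_blur`) are hypothetical.

```python
import math

# Hypothetical 1D "field": a list of (amplitude, angular frequency, phase) sinusoids.
def field(x, comps):
    """Evaluate the unblurred continuous signal at position x."""
    return sum(a * math.sin(w * x + p) for a, w, p in comps)

def blurred_field(x, comps, sigma):
    """Apply a Gaussian PSF with std `sigma` analytically.

    Convolving sin(w*x + p) with a zero-mean Gaussian of std sigma yields
    exp(-sigma^2 * w^2 / 2) * sin(w*x + p), which equals the heat-equation
    solution after time t = sigma^2 / 2. No sampling is needed, so the blur
    can be matched to any output pixel size.
    """
    return sum(
        a * math.exp(-0.5 * (sigma * w) ** 2) * math.sin(w * x + p)
        for a, w, p in comps
    )

def numeric_blur(x, comps, sigma, half_width=6.0, n=4001):
    """Reference check: brute-force Gaussian convolution via a Riemann sum over +/- half_width*sigma."""
    lo = -half_width * sigma
    step = 2 * half_width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        s = lo + i * step
        kernel = math.exp(-0.5 * (s / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += field(x - s, comps) * kernel * step
    return total

if __name__ == "__main__":
    # A low-frequency component plus a high-frequency one that would alias
    # when rendered at a coarse output resolution.
    comps = [(1.0, 2.0, 0.0), (0.5, 40.0, 0.3)]
    sigma = 0.1  # assumed PSF width, e.g. tied to the target pixel spacing
    print(blurred_field(0.37, comps, sigma), numeric_blur(0.37, comps, sigma))
```

The design point the sketch shows: because the damping factor is a closed-form function of frequency and PSF width, the low-frequency component (w = 2) passes almost unchanged, while the high-frequency one (w = 40) is suppressed by roughly exp(-8), so rendering at a smaller output size just means choosing a larger sigma.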

DreamRenderer

DreamRenderer extends FLUX with image content control using bounding boxes or masks.

19.03.25 · Project Page · Code · Text-to-Image · Controllable Image Generation

InterMask

InterMask can generate high-quality 3D human interactions from text descriptions. It captures complex movements between two people and can also generate reactions without modifying the model.

19.03.25 · Project Page · Code · Motion Generation

Photometric Inverse Rendering

Photometric Inverse Rendering can estimate light positions and surface reflectance from images, including hard-to-handle shadows. Its reflectance decomposition outperforms other methods and works on both synthetic and real images.

18.03.25 · Project Page · Code · Image Restoration

Unlock Pose Diversity

KDTalker can generate high-quality talking portraits from a single image and audio input. It captures fine facial details and achieves excellent lip synchronization using a 3D keypoint-based approach and a spatiotemporal diffusion model.

18.03.25 · Code · Demo · Lip Syncing · Talking Head Generation

Diffusion Self-Distillation

Diffusion Self-Distillation can generate high-quality images of specific subjects in new settings while preserving their identity, and it can also relight them.

18.03.25 · Project Page · Code · Text-to-Image · Personalized Image Generation · Image Relighting

MagicColor

MagicColor can automatically colorize multi-instance sketches using reference images while keeping colors consistent across objects.

17.03.25 · Project Page · Code · Image Colorization · Image Editing

TreeMeshGPT

TreeMeshGPT can generate detailed 3D meshes from point clouds using Autoregressive Tree Sequencing. This technique produces more detailed meshes and reduces data size by 22% during processing.

17.03.25 · Code · Demo · 3D Mesh Generation

Mobius

Mobius can generate seamlessly looping videos from text descriptions.

16.03.25 · Project Page · Code · Text-to-Video

DART

DART can generate high-quality human motions in real-time, achieving over 300 frames per second on a single RTX 4090 GPU. It combines text inputs with spatial constraints, allowing for tasks like reaching waypoints and interacting with scenes.

15.03.25 · Project Page · Code · Text-to-Motion

MelQCD

MelQCD can create realistic audio tracks that match silent videos. It achieves high quality and synchronization by breaking down mel-spectrograms into different signal types and using a video-to-all (V2X) predictor.

14.03.25 · Project Page · Code · Video-to-Audio

MovieAgent

MovieAgent can generate long-form videos with multiple scenes and shots from a script and character bank. It ensures character consistency and synchronized subtitles while reducing the need for human input in movie production.

13.03.25 · Project Page · Code · Text-to-Video · Video Editing

Make-It-Animatable

Make-It-Animatable can auto-rig any 3D humanoid model for animation in under one second. It generates high-quality blend weights and bones, and works with various 3D formats, ensuring accuracy even for non-standard skeletons.

12.03.25 · Project Page · Code · Demo · 3D Animation

AnCoGen

AnCoGen can analyze and generate speech by estimating key attributes like speaker identity, pitch, and loudness. It can also perform tasks such as speech denoising, pitch shifting, and voice conversion using a unified masked autoencoder model.

11.03.25 · Project Page · Code · Speech Recognition · Text-to-Speech

Chrono

Chrono can track points across video frames using temporally aware features.

11.03.25 · Project Page · Code · Video Object Tracking · Video Analysis

VIRES

VIRES can repaint, replace, generate, and remove objects in videos using sketches and text.

10.03.25 · Project Page · Code · Video Editing

Diffusion VAS

Diffusion VAS can generate masks for hidden parts of objects in videos.

07.03.25 · Project Page · Code · Video Object Tracking

6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry

TRG can estimate 6DoF head translations and rotations by leveraging the synergy between facial geometry and head pose.

06.03.25 · Code · Image Object Detection · Image Depth Estimation

StdGEN

StdGEN can generate high-quality 3D characters from a single image in about three minutes. It uses a transformer-based model to decompose characters into parts such as body, clothes, and hair, and is aimed at 3D anime character generation.

05.03.25 · Project Page · Code · Image-to-3D · 3D Object Generation · 3D Avatar Generation

Spark-TTS

Spark-TTS can generate customizable voices with control over gender, speaking style, pitch, and rate. It also supports zero-shot voice cloning, enabling smooth language transitions without per-voice training.

05.03.25 · Code · Text-to-Speech · Personalized Audio Generation