AI Toolbox

A curated collection of 863 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

AI Tools


GeoSplatting

GeoSplatting can capture detailed 3D shapes and realistic materials and lighting.

11.07.25 · Project Page · Code · 3D Object Generation · 3D Relighting

Add-it

Add-it can insert objects into images from text prompts without any additional training. It uses an extended attention mechanism for natural object placement and scene consistency, achieving state-of-the-art results on image insertion tasks.

11.07.25 · Project Page · Code · Image Editing

Tora

Tora can generate high-quality videos with precise control over motion trajectories by integrating textual, visual, and trajectory conditions. It achieves high motion fidelity and supports a range of video durations, aspect ratios, and resolutions.

09.07.25 · Project Page · Code · Text-to-Video

Tora2

Tora2 can generate videos with customized motion and appearance for multiple entities.

09.07.25 · Project Page · Code · Controllable Video Generation

Hear-Your-Click

Hear-Your-Click can generate sounds for specific objects in a video when users click on them. It strengthens the alignment between sound and visuals, producing audio that precisely matches the user-selected objects.

08.07.25 · Code · Video-to-Audio

ObjectClear

ObjectClear can remove objects from images along with their shadows and reflections. It uses an object-effect attention mechanism to improve foreground removal and background preservation, outperforming existing methods, especially in complex scenes.

07.07.25 · Project Page · Code · Image Editing · Image Inpainting

SketchSeg

SketchSeg can segment raster sketches into layers, making it easy for artists to move, copy, or delete objects.

06.07.25 · Project Page · Code · Image Segmentation

ReFlex

ReFlex can change the high-level features of an image based on a text prompt while keeping its main structure.

03.07.25 · Project Page · Code · Image Editing

LongAnimation

LongAnimation can colorize long animations while keeping colors consistent over time.

02.07.25 · Project Page · Code · Sketch-to-Video · Video Colorization

Depth Anything at Any Condition

Depth Anything at Any Condition can estimate depth from a single image in different lighting and weather conditions.

02.07.25 · Project Page · Code · Image Depth Estimation · Video Depth Estimation

SketchColour

SketchColour can turn 2D animation sketches into fully colored frames.

01.07.25 · Project Page · Code · Image Colorization · Image-to-Image

Calligrapher

Calligrapher can customize artistic typography in text images using a style injection framework.

01.07.25 · Project Page · Code · Image Editing

SMS

SMS is a method for image stylization with diffusion models that aims to balance effective style transfer with content preservation, a long-standing challenge in the field.

30.06.25 · Project Page · Code · Image Style Transfer · Image Editing

METEOR

METEOR can generate orchestral music while allowing control over the texture of the accompaniment. It achieves high-quality music style transfer and lets users adjust melodies and textures at the bar and track levels.

30.06.25 · Project Page · Code · Text-to-Music

ReferDINO

ReferDINO can segment objects in videos from text descriptions. It improves accuracy with a dedicated mask decoder and strengthens its understanding of motion over time.

27.06.25 · Project Page · Code · Video Object Detection

XVerse

XVerse can generate high-quality, editable images containing multiple subjects. It allows precise control over each subject’s pose, style, and lighting while reducing issues such as attribute entanglement and artifacts.

27.06.25 · Project Page · Code · Text-to-Image · Image Editing · Controllable Image Generation

ThinkSound

ThinkSound can generate sound from video, guided either by a caption or by Chain-of-Thought reasoning.

26.06.25 · Project Page · Code · Demo · Video-to-Audio

Matrix-Game

Matrix-Game can generate high-quality interactive game worlds in Minecraft.

24.06.25 · Project Page · Code · Image-to-Video

OmniAvatar

OmniAvatar can generate lifelike full-body avatar videos from audio. It offers accurate lip-syncing and natural movements, and allows for precise control over emotions and backgrounds.

24.06.25 · Project Page · Code · Audio-to-Video · Talking Head Generation · Lip Syncing

GaVS

GaVS can stabilize videos by reconstructing and rendering them in 3D.

24.06.25 · Project Page · Code · Video Restoration