AI Toolbox
A curated collection of 813 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

InterMask can generate high-quality 3D human interactions from text descriptions. It captures complex movements between two people and also supports reaction generation (synthesizing one person's motion in response to the other's) without any changes to the model.
Photometric Inverse Rendering can estimate light positions and surface reflections in images, including tricky cast shadows. It decomposes surface reflectance better than competing methods and performs well on both synthetic and real images.
KDTalker can generate high-quality talking portraits from a single image and audio input. It captures fine facial details and achieves excellent lip synchronization using a 3D keypoint-based approach and a spatiotemporal diffusion model.
MagicColor can automatically colorize multi-instance sketches, using reference images to keep colors consistent across objects.
TreeMeshGPT can generate detailed 3D meshes from point clouds using Autoregressive Tree Sequencing. This technique allows for better mesh detail and achieves about a 22% reduction in the length of the generated token sequence.
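The "tree sequencing" idea can be pictured as serializing mesh faces along a traversal of the face-adjacency graph, so each step builds on faces already emitted. A toy sketch of that ordering (illustrative only, not TreeMeshGPT's actual tokenizer):

```python
# Toy face sequencing via DFS over face adjacency (NOT TreeMeshGPT's
# actual tokenizer): faces sharing an edge are emitted parent-then-child.
from collections import defaultdict

def tree_sequence(faces):
    """faces: list of (v0, v1, v2) vertex-index triples."""
    # Map each undirected edge to the faces that contain it.
    edge_to_faces = defaultdict(list)
    for fi, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces[tuple(sorted(e))].append(fi)

    visited, tokens = set(), []
    stack = [0]  # start DFS at an arbitrary root face
    while stack:
        fi = stack.pop()
        if fi in visited:
            continue
        visited.add(fi)
        tokens.append(faces[fi])  # a real tokenizer would emit a compressed local form
        a, b, c = faces[fi]
        for e in ((a, b), (b, c), (c, a)):
            for nb in edge_to_faces[tuple(sorted(e))]:
                if nb not in visited:
                    stack.append(nb)
    return tokens

# Two triangles sharing an edge serialize as a parent/child pair.
print(tree_sequence([(0, 1, 2), (1, 2, 3)]))
```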
Mobius can generate seamlessly looping videos from text descriptions.
DART can generate high-quality human motions in real-time, achieving over 300 frames per second on a single RTX 4090 GPU. It combines text inputs with spatial constraints, allowing for tasks like reaching waypoints and interacting with scenes.
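One way to picture combining a motion prior with spatial constraints is a rollout loop that nudges each predicted pose toward a waypoint. A toy sketch of that guidance idea (the prior here is a dummy; DART's actual model and guidance are more sophisticated):

```python
# Toy illustration of steering a motion prior toward a spatial goal
# (invented for illustration; this is not DART's model or guidance).
import numpy as np

def rollout(prior_step, waypoint, n_frames=120, guidance=0.05):
    pose = np.zeros(3)  # root position only, for simplicity
    traj = []
    for _ in range(n_frames):
        pose = prior_step(pose)                     # proposal from the motion prior
        pose = pose + guidance * (waypoint - pose)  # nudge toward the waypoint
        traj.append(pose)
    return np.stack(traj)

# Dummy "prior": a small random drift.
rng = np.random.default_rng(0)
traj = rollout(lambda p: p + 0.02 * rng.standard_normal(3),
               waypoint=np.array([1.0, 0.0, 1.0]))
print(traj[-1])  # ends close to the waypoint
```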
MelQCD can create realistic audio tracks that match silent videos. It achieves high quality and synchronization by breaking down mel-spectrograms into different signal types and using a video-to-all (V2X) predictor.
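For context, the mel-spectrogram is the time-frequency representation that MelQCD decomposes; computing one with librosa is a few lines (the decomposition into signal types is the paper's contribution and is not shown here):

```python
# Minimal mel-spectrogram sketch with librosa; MelQCD's factorization of
# this representation into separate signal components is not reproduced.
import librosa
import numpy as np

y, sr = librosa.load(librosa.example("trumpet"))  # demo clip shipped with librosa
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
mel_db = librosa.power_to_db(mel, ref=np.max)     # log scale, as models usually consume it
print(mel_db.shape)  # (n_mels, n_frames)
```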
MovieAgent can generate long-form videos with multiple scenes and shots from a script and character bank. It ensures character consistency and synchronized subtitles while reducing the need for human input in movie production.
Make-It-Animatable can auto-rig any 3D humanoid model for animation in under one second. It generates high-quality blend weights and bones, and works with various 3D formats, ensuring accuracy even for non-standard skeletons.
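The blend weights it predicts are the per-vertex, per-bone weights of standard linear blend skinning, where each vertex is deformed by a weighted sum of bone transforms. A minimal NumPy sketch of that deformation model (the textbook formula, not the paper's code):

```python
# Linear blend skinning: v' = sum_b w[v, b] * T_b @ v. The "blend weights"
# an auto-rigger predicts are the w below (toy example, not the paper's code).
import numpy as np

def lbs(vertices, weights, bone_mats):
    """vertices: (V, 3); weights: (V, B), rows sum to 1; bone_mats: (B, 4, 4)."""
    hom = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (V, 4)
    per_bone = np.einsum("bij,vj->vbi", bone_mats, hom)   # each bone's transform of v
    blended = np.einsum("vb,vbi->vi", weights, per_bone)  # weighted sum over bones
    return blended[:, :3]

V, B = 4, 2
verts = np.random.rand(V, 3)
w = np.random.rand(V, B); w /= w.sum(axis=1, keepdims=True)  # normalize weights
mats = np.stack([np.eye(4)] * B)  # identity bones leave the mesh unchanged
assert np.allclose(lbs(verts, w, mats), verts)
```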
AnCoGen can analyze and generate speech by estimating key attributes like speaker identity, pitch, and loudness. It can also perform tasks such as speech denoising, pitch shifting, and voice conversion using a unified masked autoencoder model.
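The masked-autoencoder idea behind this is simple to sketch: hide a random subset of input frames and train the model to reconstruct them. A minimal PyTorch sketch with a placeholder architecture (not AnCoGen's):

```python
# Minimal masked-autoencoder training step (placeholder architecture,
# not AnCoGen's): mask random spectrogram frames, reconstruct them.
import torch
import torch.nn as nn

D = 80  # e.g. mel bins per frame
model = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, D))

x = torch.randn(16, 100, D)           # (batch, frames, features)
mask = torch.rand(16, 100, 1) < 0.75  # hide 75% of frames
x_masked = x.masked_fill(mask, 0.0)

recon = model(x_masked)
loss = (((recon - x) ** 2) * mask).mean()  # loss counts only the hidden frames
loss.backward()
print(float(loss))
```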
Chrono can track points in videos using a feature backbone with built-in temporal awareness.
VIRES can repaint, replace, generate, and remove objects in videos using sketches and text.
Diffusion VAS can generate amodal segmentation masks for the occluded parts of objects in videos.
TRG can estimate 6DoF head translations and rotations by leveraging the synergy between facial geometry and head pose.
StdGEN can generate high-quality 3D characters from a single image in just three minutes. It decomposes characters into semantic components like body, clothes, and hair, using a transformer-based model to achieve strong results in 3D anime character generation.
Spark-TTS can generate customizable voices with control over gender, speaking style, pitch, and rate. It also supports zero-shot voice cloning, allowing smooth language transitions without extra training for each voice.
So far it has been tough to imagine the benefits of AI agents; most of what we’ve seen from that domain has focused on NPC simulations or solving text-based goals. 3D-GPT is a new framework that uses LLMs for instruction-driven 3D modeling: it breaks the modeling task into manageable segments and procedurally generates 3D scenes from them. I recently started to dig into Blender, and I pray this gets open-sourced at some point.
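Conceptually, the pipeline boils down to the LLM emitting a structured plan that a procedural backend executes. Here is a toy sketch of that idea against Blender's bpy API, with a hard-coded plan standing in for the LLM output (the plan schema is invented; 3D-GPT's agents and interface are more involved):

```python
# Toy instruction-driven procedural modeling: an LLM would emit a plan like
# this; here it is hard-coded. Run inside Blender's Python console.
# (The plan schema is invented for illustration; it is not 3D-GPT's format.)
import bpy

plan = [  # imagine this came back from the LLM as JSON
    {"op": "cube",   "size": 2.0,   "location": (0, 0, 1)},
    {"op": "sphere", "radius": 0.5, "location": (0, 0, 2.5)},
]

for step in plan:
    if step["op"] == "cube":
        bpy.ops.mesh.primitive_cube_add(size=step["size"], location=step["location"])
    elif step["op"] == "sphere":
        bpy.ops.mesh.primitive_uv_sphere_add(radius=step["radius"], location=step["location"])
```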
VideoMaker can generate personalized videos from a single subject reference image.
Generative Photography can generate consistent images from text with an understanding of camera physics. It controls camera settings such as bokeh and color temperature to produce scene-consistent images across different effects.