AI Toolbox
A curated collection of 917 free cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
FantasyTalking can generate talking portraits from a single image, making them look realistic with accurate lip movements and facial expressions. It uses a two-step process to align audio and video, allowing users to control how expressions and body motions appear.
Textoon can generate diverse 2D cartoon characters in the Live2D format from text descriptions. It allows for real-time editing and controllable appearance generation, making it easy for users to create interactive characters.
GPS-Gaussian+ can render high-resolution 3D scenes in real time from two or more input images.
Step1X-Edit can perform advanced image editing tasks by processing reference images and user instructions.
Describe Anything can generate detailed descriptions for specific areas in images and videos using points, boxes, scribbles, or masks. It produces context-aware captions that highlight subtle details and changes over time, achieving top performance on seven benchmarks for localized captioning.
SwiftBrush v2 can improve the quality of images generated by one-step text-to-image diffusion models. Results look great, and it reportedly outranks all GAN-based and multi-step Stable Diffusion models in benchmarks. No code though 🤷‍♂️
InstantCharacter can generate high-quality images of personalized characters from a single reference image with FLUX. It supports different styles and poses, ensuring identity consistency and allowing for text-based edits.
ID-Patch can generate personalized group photos by matching faces with specific positions. It reduces problems like identity leakage and visual errors, achieving high accuracy at seven times the speed of other methods.
Phantom can generate videos that preserve a subject's identity from reference images while following text prompts.
SkyReels-V2 can generate infinite-length videos by combining a Diffusion Forcing framework with Multi-modal Large Language Models and Reinforcement Learning.
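The Diffusion Forcing idea behind infinite-length rollout can be illustrated with a toy scheduler: each frame in a sliding window carries its own noise level, the oldest frame is emitted once fully denoised, and a fresh fully-noisy frame takes its place. All names and window sizes below are illustrative, not SkyReels-V2's actual configuration.

```python
from collections import deque

MAX_NOISE = 4   # denoising steps per frame (illustrative)
WINDOW = 4      # frames denoised jointly (illustrative)

def rollout(num_frames):
    """Simulate the sliding-window schedule; returns emitted frame ids."""
    emitted, next_id = [], 0
    # seed the window with a "staircase" of noise levels: the oldest
    # frame is almost clean, the newest is pure noise
    window = deque()
    for i in range(WINDOW):
        window.append([next_id, MAX_NOISE - (WINDOW - 1 - i)])
        next_id += 1
    while len(emitted) < num_frames:
        # one joint denoising step: every frame in the window gets cleaner
        for frame in window:
            frame[1] = max(0, frame[1] - 1)
        # emit fully denoised head frames, append fresh noisy frames,
        # so the rollout can continue indefinitely
        while window and window[0][1] == 0:
            emitted.append(window.popleft()[0])
            window.append([next_id, MAX_NOISE])
            next_id += 1
    return emitted[:num_frames]
```

Because frames leave the window as soon as they are clean while new noise enters behind them, the same fixed-size window can keep producing frames forever — which is the property that makes "infinite-length" generation possible.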
SCW-VTON can fit in-shop clothing to a person’s image while keeping their pose consistent. It improves the shape of the clothing and reduces distortions in visible limb areas, making virtual try-on results look more realistic.
Ev-DeblurVSR can turn blurry, low-resolution videos into sharp, high-resolution ones.
PosterMaker can generate high-quality product posters by rendering text accurately and keeping the main subject clear.
FramePack aims to make video generation feel like image generation. It can generate a single video frame in 1.5 seconds with 13B models on an RTX 4090, and it supports full 30-fps generation with 13B models on a 6 GB laptop GPU, though noticeably slower.
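The trick that keeps memory use flat on small GPUs is packing the context: older frames are compressed more aggressively, so the total token budget stays bounded no matter how long the video grows. The sketch below shows the general idea with a geometric decay; the constants are illustrative, not FramePack's actual patchify kernel sizes.

```python
BASE_TOKENS = 1536  # tokens for the most recent frame (illustrative)

def context_tokens(num_past_frames, decay=2):
    """Token count per past frame: geometric decay with distance.

    Frame at distance d contributes roughly BASE_TOKENS / decay**d tokens,
    floored at 1 so very old frames are not dropped entirely.
    """
    return [max(1, BASE_TOKENS // decay**d) for d in range(num_past_frames)]

def total_context(num_past_frames, decay=2):
    """Total context length for a given history size."""
    return sum(context_tokens(num_past_frames, decay))
```

With `decay=2` the geometric series keeps the total near `2 * BASE_TOKENS` regardless of history length, instead of growing linearly with the number of past frames — which is why generation cost stays close to single-image cost.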
IMAGGarment-1 can generate high-quality garments with control over shape, color, and logo placement.
Cobra can efficiently colorize line art using more than 200 reference images.
UniAnimate-DiT can generate high-quality animations from human images. It uses the Wan2.1 model and a lightweight pose encoder to create smooth and visually appealing results, while also upscaling animations from 480p to 720p.
CoMotion can detect and track 3D poses of multiple people using just one camera. It works well in crowded places and can keep track of movements over time with high accuracy.