AI Toolbox
A curated collection of 849 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

DepthSplat can reconstruct 3D scenes from only a few images by connecting Gaussian splatting and depth estimation.
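A minimal sketch of that coupling, with hypothetical module names (not DepthSplat's actual code): a monocular depth branch supplies features that a feed-forward head turns into per-pixel 3D Gaussian parameters.

```python
# Illustrative sketch of the DepthSplat idea: fuse image features with
# monocular depth features and predict one Gaussian per pixel.
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Predicts per-pixel Gaussian parameters from image + depth features."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # 3 (mean offset) + 3 (scale) + 4 (rotation quat) + 1 (opacity) + 3 (color) = 14
        self.proj = nn.Conv2d(feat_dim * 2, 14, kernel_size=1)

    def forward(self, img_feats: torch.Tensor, depth_feats: torch.Tensor):
        # Concatenate the two streams and predict Gaussian parameters.
        return self.proj(torch.cat([img_feats, depth_feats], dim=1))

head = GaussianHead()
img_feats = torch.randn(1, 64, 32, 32)    # from a multi-view image encoder
depth_feats = torch.randn(1, 64, 32, 32)  # from a monocular depth network
gaussians = head(img_feats, depth_feats)
print(gaussians.shape)                    # torch.Size([1, 14, 32, 32])
```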
MonST3R can estimate 3D shapes from videos over time, creating a dynamic point cloud and tracking camera positions. This method improves video depth estimation and separates moving from still objects more effectively than previous techniques.
UniPortrait can customize images of one or more people with high quality. It allows for detailed face editing and uses free-form text descriptions to guide changes.
F5-TTS can generate natural-sounding speech using a fast text-to-speech system. It supports multiple languages, can switch between languages smoothly, and is trained on a 100,000-hour speech dataset.
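If you want to try it, inference roughly follows the shape below; the import path and argument names are assumptions, so check the official SWivid/F5-TTS README before relying on them.

```python
# Hypothetical F5-TTS inference call -- the exact API may differ;
# verify against the official SWivid/F5-TTS documentation.
from f5_tts.api import F5TTS  # assumed import path

tts = F5TTS()  # loads the pretrained flow-matching TTS model
wav, sr, _ = tts.infer(
    ref_file="voice_sample.wav",               # short clip of the target voice
    ref_text="Transcript of the voice sample.",
    gen_text="Hello! This sentence is synthesized in the cloned voice.",
    file_wave="output.wav",                    # write the result to disk
)
```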
MimicTalk can generate personalized 3D talking faces in under 15 minutes. It mimics a person’s talking style using a special audio-to-motion model, resulting in high-quality videos.
GS^3 can relight scenes in real-time using a triple Gaussian splatting process. It achieves high-quality lighting and view synthesis from multiple images, running at 90 fps on a single GPU.
DreamWaltz-G can generate high-quality 3D avatars from text and animate them using SMPL-X motion sequences. It improves avatar consistency with Skeleton-guided Score Distillation and is useful for human video reenactment and creating scenes with multiple subjects.
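Skeleton-guided Score Distillation extends standard score distillation sampling (SDS); the generic SDS step it builds on looks roughly like this (illustrative names, skeleton conditioning omitted):

```python
# Generic score-distillation (SDS) gradient step -- the mechanism
# DreamWaltz-G extends with skeleton guidance. Not the paper's code.
import torch

def sds_grad(unet, x, cond, alphas_cumprod):
    """Gradient pushed into a differentiable render x by a frozen diffusion prior."""
    t = torch.randint(20, 980, (x.shape[0],))       # random diffusion timestep
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x)
    x_t = a.sqrt() * x + (1 - a).sqrt() * noise     # forward-noise the render
    eps_pred = unet(x_t, t, cond)                   # frozen denoiser's prediction
    return (1 - a) * (eps_pred - noise)             # weighted score difference

unet = lambda x_t, t, c: torch.randn_like(x_t)      # stand-in for a real prior
alphas = torch.linspace(0.9999, 0.01, 1000)
x = torch.randn(1, 3, 64, 64, requires_grad=True)   # differentiable avatar render
x.backward(gradient=sds_grad(unet, x, None, alphas))  # accumulates into x.grad
```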
Tex4D can generate 4D textures for untextured mesh sequences from a text prompt. It combines 3D geometry with video diffusion models to ensure the textures are consistent across different views and frames.
HART is an autoregressive transformer model that can generate high-quality 1024x1024 images from text 3x faster than SD3-Medium.
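For context, the autoregressive paradigm HART accelerates boils down to sampling one visual token at a time; a bare-bones version of that loop (illustrative, and omitting HART's diffusion-based residual refinement):

```python
# Vanilla autoregressive image-token sampling -- the loop HART speeds up.
import torch

def sample_tokens(model, prompt_emb, num_tokens=1024, temperature=1.0):
    tokens = torch.empty(1, 0, dtype=torch.long)
    for _ in range(num_tokens):                       # one visual token per step
        logits = model(tokens, prompt_emb)[:, -1]     # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1)
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens                                     # decode to pixels with a VQ decoder

model = lambda toks, emb: torch.randn(1, toks.shape[1] + 1, 4096)  # stand-in
print(sample_tokens(model, None, num_tokens=4).shape)              # torch.Size([1, 4])
```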
EfficientViT can speed up high-resolution diffusion models by compressing inputs at a spatial ratio of up to 128 while keeping good image quality. It achieves a 19.1x speed increase for inference and a 17.9x speed increase for training on ImageNet 512x512 compared to other autoencoders.
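To see why that ratio matters, here is what a 128x spatial compression does to the latent grid the diffusion model actually runs on (illustrative arithmetic):

```python
# Latent size under a 128x spatial compression (illustrative numbers).
h = w = 512              # ImageNet 512x512 input
f = 128                  # autoencoder downsampling factor
print(h // f, w // f)    # 4 4 -> diffusion operates on a tiny 4x4 latent grid,
                         # which is where the large training/inference speedups come from
```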
Depth Any Video can generate high-resolution depth maps for videos. It uses a large dataset of 40,000 annotated clips to improve accuracy and includes a method for better depth inference across sequences of up to 150 frames.
CtrLoRA can adapt a base ControlNet for image generation with just 1,000 data pairs in under one hour of training on a single GPU. It reduces learnable parameters by 90%, making it much easier to create new guidance conditions.
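The mechanism behind that parameter reduction is standard low-rank adaptation: the base weights stay frozen and only a small low-rank pair is trained. A minimal, illustrative version (not CtrLoRA's actual code):

```python
# Minimal LoRA adapter of the kind CtrLoRA attaches to a base ControlNet.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze the base layer
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # start as identity
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(320, 320))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # ~4.7% of all parameters
```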
SceneCraft can generate detailed indoor 3D scenes from user layouts and text descriptions. It is able to turn 3D layouts into 2D maps, producing complex spaces with diverse textures and realistic visuals.
TweedieMix can generate images and videos that combine multiple personalized concepts.
RFNet is a training-free approach that brings better prompt understanding to image generation, adding support for prompt reasoning, conceptual and metaphorical thinking, imaginative scenarios, and more.
FreeLong can generate 128-frame videos from short video diffusion models trained on 16-frame videos, without requiring additional training. It’s not SOTA, but has just the right amount of cursedness 👌
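FreeLong's trick is a frequency-domain blend of attention features: low temporal frequencies come from globally attended features, high frequencies from locally attended ones. A simplified, illustrative version:

```python
# Simplified frequency blend in the spirit of FreeLong's SpectralBlend
# attention (illustrative, not the paper's implementation).
import torch

def spectral_blend(global_feat, local_feat, cutoff=0.25):
    # FFT over the temporal axis (dim=0: frames)
    Fg = torch.fft.fft(global_feat, dim=0)
    Fl = torch.fft.fft(local_feat, dim=0)
    freqs = torch.fft.fftfreq(global_feat.shape[0])
    low = (freqs.abs() <= cutoff).view(-1, *[1] * (global_feat.dim() - 1))
    blended = torch.where(low, Fg, Fl)   # low band from global, rest from local
    return torch.fft.ifft(blended, dim=0).real

g = torch.randn(128, 64)   # globally attended features for 128 frames
l = torch.randn(128, 64)   # locally (windowed) attended features
print(spectral_blend(g, l).shape)   # torch.Size([128, 64])
```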
Animate3D can animate any static multi-view 3D model.
VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, without any fine-tuning.
Hallo2 can create long, high-resolution (4K) animations of portrait images driven by audio. It allows users to adjust facial expressions with text labels, improving control and reducing issues like appearance drift and temporal artifacts.
Pyramidal Flow Matching can generate high-quality 5 to 10-second videos at 768p resolution and 24 FPS. It uses a unified pyramidal flow matching algorithm to link flows across different stages, making video creation more efficient.
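For reference, a single (non-pyramidal) flow-matching training step looks like the sketch below; Pyramidal Flow links several such stages across resolutions (names illustrative, not the paper's code):

```python
# Plain flow-matching objective -- the building block Pyramidal Flow
# stacks across resolution stages.
import torch

def flow_matching_loss(model, x1, cond=None):
    """x1: clean latents. Train a velocity field carrying noise x0 toward x1."""
    x0 = torch.randn_like(x1)                  # noise endpoint of the flow
    t = torch.rand(x1.shape[0], 1, 1, 1)       # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # straight-line interpolant
    v_target = x1 - x0                         # constant ground-truth velocity
    v_pred = model(xt, t, cond)
    return ((v_pred - v_target) ** 2).mean()

model = lambda xt, t, c: xt                    # stand-in network
print(flow_matching_loss(model, torch.randn(2, 4, 32, 32)))
```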