AI Toolbox
A curated collection of 849 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.

Face Anon can anonymize faces in images while preserving the original facial expressions and head poses. It uses diffusion models to achieve high-quality results and can also handle face swapping.
CityGaussianV2 can reconstruct large-scale scenes from multi-view RGB images with high accuracy.
Self-Supervised Any-Point Tracking by Contrastive Random Walks can track any point in a video using a self-supervised global matching transformer.
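
The contrastive random walk idea behind this tracker is compact enough to sketch: treat patches as nodes in a space-time graph, walk forward across frames using feature similarity, walk back, and train so that every patch returns to where it started. Below is a minimal, illustrative version of that cycle-consistency loss; the feature extractor and patch sampling are assumed, and this is not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def transition(a, b, tau=0.07):
    """Row-stochastic transition matrix between node features of two frames."""
    return F.softmax(a @ b.T / tau, dim=-1)

def cycle_consistency_loss(feats, tau=0.07):
    """Contrastive random walk loss.

    feats: (T, N, D) per-frame node embeddings, e.g. patch features from
    T video frames with N patches each. A walker steps forward through
    all frames and back again; it is penalized whenever it fails to
    return to its starting node.
    """
    T, N, _ = feats.shape
    feats = F.normalize(feats, dim=-1)
    P = torch.eye(N, device=feats.device)
    for t in range(T - 1):                    # forward walk
        P = P @ transition(feats[t], feats[t + 1], tau)
    for t in reversed(range(T - 1)):          # backward walk
        P = P @ transition(feats[t + 1], feats[t], tau)
    # Each row of P is a distribution over end nodes; the target is the diagonal.
    return -torch.log(P.diagonal() + 1e-8).mean()
```
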
MOFT is a training-free video motion interpreter and controller. It extracts motion information from video diffusion models and uses it to guide the motion of generated videos, no retraining required.
PF3plat can synthesize photorealistic novel views and estimate accurate camera poses from uncalibrated image collections.
ScalingConcept can enhance or suppress concepts that already exist in images and audio without adding new elements. Example uses include pose generation, better object stitching, and reducing fuzziness in anime productions.
NoPoSplat can reconstruct 3D Gaussian scenes from multi-view images. It achieves real-time reconstruction and high-quality renderings, especially with sparse input views.
ControlAR adds spatial controls like edges, depth maps, and segmentation masks to autoregressive models like LlamaGen.
State-of-the-art diffusion models are typically trained on square images. FiT is a transformer architecture designed specifically for generating images at unrestricted resolutions and aspect ratios (similar to what Sora does). By treating an image as a variable-length sequence of patch tokens, it adapts to diverse aspect ratios during both training and inference, improving resolution generalization and eliminating the biases introduced by image cropping; a sketch of the idea follows.
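
The core trick, tokenizing an image as a variable-length patch sequence instead of a fixed square grid, is easy to illustrate. This is a simplified sketch of that tokenization plus the padding mask needed to batch mixed aspect ratios; it is not FiT's actual implementation.

```python
import torch

def patchify(img, p=16):
    """Turn an image of any resolution into a variable-length token sequence.
    img: (C, H, W) with H and W divisible by p.
    Returns (N, C*p*p) tokens with N = (H//p) * (W//p), so no square crop is needed.
    """
    C, H, W = img.shape
    t = img.unfold(1, p, p).unfold(2, p, p)          # (C, H//p, W//p, p, p)
    return t.permute(1, 2, 0, 3, 4).reshape(-1, C * p * p)

def pad_batch(seqs, pad_value=0.0):
    """Pad variable-length token sequences and build an attention mask,
    so one batch can mix aspect ratios instead of cropping to squares."""
    n = max(s.shape[0] for s in seqs)
    batch = torch.full((len(seqs), n, seqs[0].shape[1]), pad_value)
    mask = torch.zeros(len(seqs), n, dtype=torch.bool)
    for i, s in enumerate(seqs):
        batch[i, : s.shape[0]] = s
        mask[i, : s.shape[0]] = True
    return batch, mask

# A 256x512 and a 512x256 image land in one batch without cropping:
tokens, mask = pad_batch([patchify(torch.randn(3, 256, 512)),
                          patchify(torch.randn(3, 512, 256))])
```
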
From Text to Pose to Image can generate high-quality images from text prompts by first creating poses and then using them to guide image generation. This method improves control over human poses and enhances image fidelity in diffusion models.
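
The two-stage structure is straightforward to wire up. In the sketch below, `generate_pose` is a hypothetical stand-in for the paper's text-to-pose model, while the second stage uses an off-the-shelf OpenPose ControlNet from diffusers as the pose-conditioned image generator; treat this as an outline of the pipeline shape, not the authors' code.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def generate_pose(prompt):
    """Hypothetical stage 1: text -> OpenPose-style skeleton image.
    Plug in the paper's text-to-pose model (or any pose generator) here."""
    raise NotImplementedError

prompt = "a chef juggling three oranges in a busy kitchen"
pose_image = generate_pose(prompt)  # stage 1: text -> pose

# Stage 2: pose + text -> image via a pose-conditioned diffusion model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")
image = pipe(prompt, image=pose_image).images[0]
```
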
GANs aren’t dead yet. SphereHead generates stable and high-quality 3D full-head human faces from all angles with significantly fewer artifacts compared to previous methods. Best one I’ve seen so far.
MoGe can turn images and videos into 3D point maps.
FreCaS can generate high-resolution images quickly by splitting sampling into cascaded stages of increasing detail, as sketched below. It is about 2.86× to 6.07× faster than other methods at generating 2048×2048 images while significantly improving image quality.
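
The cascaded idea generalizes beyond FreCaS: spend most denoising steps at low resolution, then upsample and re-noise so each later stage only refines high-frequency detail. The sketch below shows that loop with a placeholder `denoise` callable and an illustrative noise schedule, not FreCaS's actual frequency-aware sampling.

```python
import torch
import torch.nn.functional as F

def cascade_sample(denoise, sizes=(512, 1024, 2048), steps=(30, 15, 8)):
    """Coarse-to-fine sampling: most steps at low resolution, fewer as
    resolution grows. `denoise(x, num_steps)` is any image/latent denoiser."""
    x = torch.randn(1, 3, sizes[0], sizes[0])
    for size, n in zip(sizes, steps):
        x = F.interpolate(x, size=(size, size), mode="bilinear",
                          align_corners=False)
        x = x + 0.3 * torch.randn_like(x)  # re-noise before refining details
        x = denoise(x, num_steps=n)
    return x

# Runs end to end with a dummy identity denoiser:
out = cascade_sample(lambda x, num_steps: x)
```
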
Stable-Hair can robustly transfer a diverse range of real-world hairstyles onto user-provided faces for virtual hair try-on. It employs a two-stage pipeline that includes a Bald Converter for hair removal and specialized modules for high-fidelity hairstyle transfer.
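
The described two-stage pipeline reduces to a simple composition. In this sketch, `bald_converter` and `hair_transfer` are hypothetical stand-ins for the paper's modules.

```python
def try_on_hairstyle(face_img, reference_hair_img, bald_converter, hair_transfer):
    """Stage 1: remove the subject's existing hair (the "Bald Converter").
    Stage 2: paint the reference hairstyle onto the bald proxy."""
    bald_proxy = bald_converter(face_img)
    return hair_transfer(bald_proxy, reference_hair_img)
```
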
TANGO can generate high-quality body-gesture videos that match speech audio from a single video. It improves realism and synchronization by fixing audio-motion misalignment and using a diffusion model for smooth transitions.
MagicTailor can personalize specific visual components from reference images in text-to-image diffusion models. It improves image quality and preserves the subject's identity while reducing semantic pollution.
DisEnvisioner can generate customized images from a single visual prompt and extra text instructions. It filters out irrelevant details and provides better image quality and speed without needing extra tuning.
GenAu is a new scalable transformer-based audio generation architecture that is able to generate high-quality ambient sounds and effects.
HeadStudio is another text-to-3D approach that can generate animatable head avatars. The method produces high-fidelity avatars with smooth expression deformation and real-time rendering.
ReWaS can generate sound effects from text and video. The method is able to estimate the structural information of audio from the video while receiving key content cues from a user prompt.