AI Toolbox
A curated collection of 867 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

Diffusion as Shader can generate high-quality videos from 3D tracking inputs.
MaterialFusion can transfer materials onto objects in images while letting users control how much material is applied.
Lumina-Video can generate high-quality videos with synchronized sound from text prompts.
Light-A-Video can relight videos without flickering.
PeriodWave can generate high-quality speech waveforms by capturing repeating sound patterns. It uses a period-aware flow matching estimator to outperform other models in text-to-speech tasks and Mel-spectrogram reconstruction.
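For intuition, here is a minimal sketch of the flow matching objective that estimators like PeriodWave’s build on: a network regresses the constant velocity of a straight path from noise to data. The `VelocityNet` module, its dimensions, and the linear path are illustrative assumptions, not the paper’s actual period-aware architecture.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Placeholder velocity estimator conditioned on time t."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 512), nn.SiLU(), nn.Linear(512, dim)
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Concatenate the scalar time onto each sample as the conditioning.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def flow_matching_loss(model: VelocityNet, x1: torch.Tensor) -> torch.Tensor:
    """Regress the velocity of a straight noise-to-data path (rectified flow)."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0])                    # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1  # point on the linear path
    target = x1 - x0                               # the path's constant velocity
    return ((model(x_t, t) - target) ** 2).mean()
```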
LayerPano3D can generate immersive 3D scenes from a single text prompt by breaking a 2D panorama into depth layers.
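As a toy illustration of the layering idea, the sketch below masks a panorama into near, mid, and far RGBA layers by depth range; the `split_into_depth_layers` helper, its thresholds, and the precomputed depth map are hypothetical stand-ins, not LayerPano3D’s actual pipeline.

```python
import numpy as np

def split_into_depth_layers(panorama: np.ndarray,
                            depth: np.ndarray,
                            boundaries=(2.0, 10.0)) -> list[np.ndarray]:
    """Split an HxWx3 uint8 panorama into RGBA layers by depth range."""
    edges = (0.0, *boundaries, np.inf)
    layers = []
    for near, far in zip(edges[:-1], edges[1:]):
        mask = (depth >= near) & (depth < far)       # pixels in this depth band
        alpha = (mask * 255).astype(panorama.dtype)
        layers.append(np.dstack([panorama, alpha]))  # hide everything else
    return layers
```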
FlashVideo can generate videos from text prompts and upscale them to 1080p.
Semantic Gesticulator can generate realistic gestures that accompany speech with the strong semantic correspondence vital for effective communication.
VideoGuide can improve the quality of videos made by text-to-video models without needing extra training. It enhances the smoothness of motion and clarity of images, making the videos more coherent and visually appealing.
Video Alchemist can generate personalized videos using text prompts and reference images. It supports multiple subjects and backgrounds without long setup times, achieving high-quality results with better subject fidelity and text alignment.
TeSMo is a method for text-controlled, scene-aware motion generation that can produce realistic and diverse human-object interactions, such as navigating and sitting, across scenes with varied object shapes, orientations, and initial body positions and poses.
MotionLab can generate and edit human motion and supports text-based and trajectory-based motion creation.
SMF can transfer 2D or 3D keypoint animations to full-body mesh animations without needing template meshes or corrective keyframes.
ControlFace can edit face images with precise control over pose, expression, and lighting. It uses a dual-branch U-Net architecture and is trained on facial videos to ensure high-quality results while keeping the person’s identity intact.
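The dual-branch pattern can be sketched schematically: one branch encodes the reference face, and its features are injected into the branch that does the denoising. Everything below (the `DualBranch` module, plain convolutions, additive fusion) is an assumption-level simplification, not ControlFace’s actual U-Net.

```python
import torch
import torch.nn as nn

class DualBranch(nn.Module):
    """Toy two-branch network: reference features steer the main path."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.reference_branch = nn.Conv2d(3, channels, 3, padding=1)
        self.denoise_branch = nn.Conv2d(3, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 1)  # project identity cues
        self.head = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, noisy: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        ref_feat = self.reference_branch(reference)  # encode the source face
        x = self.denoise_branch(noisy)               # main editing path
        x = x + self.fuse(ref_feat)                  # inject identity features
        return self.head(x)
```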
OmniPhysGS can generate realistic 3D dynamic scenes by modeling objects with Constitutive 3D Gaussians.
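One way to picture a “constitutive” Gaussian is a standard splatting primitive extended with material parameters a physics simulator can consume. The field names below are illustrative guesses, not OmniPhysGS’s actual data structure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ConstitutiveGaussian:
    position: np.ndarray    # (3,) center in world space
    covariance: np.ndarray  # (3, 3) anisotropic extent of the splat
    color: np.ndarray       # (3,) RGB
    opacity: float
    youngs_modulus: float   # stiffness under elastic deformation
    poisson_ratio: float    # how much volume is preserved when squeezed
```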
GestureLSM can generate real-time co-speech gestures by modeling how different body parts interact.
Imagine360 can generate high-quality 360° videos from monocular single-view videos.
Wonderland can generate high-quality 3D scenes from a single image using a camera-guided video diffusion model. It allows for easy navigation and exploration of the 3D space and generalizes better than other methods to images it hasn’t seen before.
DiffSplat can generate 3D Gaussian splats from text prompts and single-view images in 1-2 seconds.
Stable Flow can edit images by adding, removing, or changing objects.