AI Toolbox
A curated collection of 849 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

TransPixar can generate RGBA videos, enabling the creation of transparent elements like smoke and reflections that blend seamlessly into scenes.
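An RGBA frame stores an alpha (opacity) value alongside each pixel's color, and that alpha channel is what allows semi-transparent smoke or reflections to be layered over another scene. The sketch below uses the standard Porter-Duff "over" operator and assumes straight (non-premultiplied) alpha with all channels as floats in [0, 1]. The helper names `composite_over` and `composite_frame` are hypothetical and are not part of TransPixar.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def composite_over(fg: RGBA, bg: RGB) -> RGB:
    """Blend a straight-alpha foreground pixel over an opaque background pixel.

    Hypothetical helper; assumes every channel is a float in [0, 1].
    """
    r, g, b, a = fg
    # Porter-Duff "over" with an opaque background: out = a * fg + (1 - a) * bg
    return tuple(a * f + (1.0 - a) * k for f, k in zip((r, g, b), bg))


def composite_frame(fg_frame: List[List[RGBA]], bg_frame: List[List[RGB]]) -> List[List[RGB]]:
    """Apply composite_over pixel by pixel to two frames of equal size."""
    return [
        [composite_over(fp, bp) for fp, bp in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(fg_frame, bg_frame)
    ]


# A half-transparent grey "smoke" pixel over a pure blue background.
smoke = (0.8, 0.8, 0.8, 0.5)
sky = (0.0, 0.0, 1.0)
print(composite_over(smoke, sky))  # (0.4, 0.4, 0.9)
```

An RGB-only generator would have to bake the background into every pixel. With a separate alpha channel, the same generated element can be composited over any background after generation.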
Coin3D can generate and edit 3D assets from a basic input shape. Like ControlNet, it conditions generation on that shape, enabling precise part editing and responsive 3D object previews within a few seconds.
Reflecting Reality can generate realistic mirror reflections using a method called MirrorFusion. It lets users control mirror placement and produces better reflection quality and geometry than prior methods.
SVFR can restore high-quality faces in videos from low-quality inputs. It combines video face restoration, inpainting, and colorization in a single framework to improve the overall quality and coherence of the restored videos.
FabricDiffusion can transfer high-quality fabric textures from a 2D clothing image to 3D garments of any shape.
TangoFlux can generate 30 seconds of 44.1 kHz audio in just 3.7 seconds on a single A40 GPU.
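A quick back-of-the-envelope check shows what the quoted TangoFlux figures imply: the speed relative to real time (audio duration divided by generation time) and the number of samples per channel in the output clip. The numbers come from the entry above. The helper names `real_time_speedup` and `sample_count` are hypothetical.

```python
def real_time_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of generation time."""
    return audio_seconds / wall_seconds


def sample_count(audio_seconds: float, sample_rate_hz: int) -> int:
    """Number of samples per channel for a clip of the given length."""
    return round(audio_seconds * sample_rate_hz)


# Figures quoted for TangoFlux: 30 s of 44.1 kHz audio in 3.7 s on one A40.
speedup = real_time_speedup(30.0, 3.7)
samples = sample_count(30.0, 44_100)
print(f"{speedup:.1f}x faster than real time")  # 8.1x faster than real time
print(f"{samples:,} samples per channel")       # 1,323,000 samples per channel
```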
DAS3R can decompose video scenes into dynamic and static components and reconstruct the static background.
REACTO can reconstruct articulated 3D objects from a single video, capturing both their shape and their flexibly deforming motion.
DiTCtrl can generate multi-prompt videos with smooth transitions and consistent object motion.
MMAudio can generate high-quality audio that matches video and text inputs. It excels in audio quality and audio-visual synchronization and is fast, generating an 8-second clip in just 1.23 seconds.
SD-Codec can separate and reconstruct audio signals from speech, music, and sound effects, using a different codebook for each source type. This makes audio codecs more interpretable and gives finer control over audio generation while maintaining high quality.
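SD-Codec gives each source type (speech, music, sound effects) its own codebook. The toy sketch below illustrates the general idea with plain nearest-neighbour vector quantization over tiny, hand-written 2D codebooks. The codebook values, the source labels, and the `quantize`/`encode`/`decode` functions are all hypothetical and much simpler than the learned quantizers in the actual model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy codebooks: one per source type, each a list of code vectors.
CODEBOOKS: Dict[str, List[Vector]] = {
    "speech": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "music": [(2.0, 2.0), (3.0, 1.0)],
    "sfx": [(-1.0, -1.0), (-2.0, 0.5)],
}


def quantize(frame: Sequence[float], source: str) -> Tuple[int, Vector]:
    """Return (index, code vector) of the nearest entry in this source's codebook."""
    codebook = CODEBOOKS[source]
    index = min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))
    return index, codebook[index]


def encode(frames: Sequence[Sequence[float]], source: str) -> List[int]:
    """Turn a sequence of feature frames into discrete token indices."""
    return [quantize(f, source)[0] for f in frames]


def decode(tokens: Sequence[int], source: str) -> List[Vector]:
    """Map token indices back to code vectors (the toy 'reconstruction')."""
    return [CODEBOOKS[source][t] for t in tokens]


speech_frames = [(0.9, 0.1), (0.1, 0.8)]
tokens = encode(speech_frames, "speech")
print(tokens)                    # [1, 2]
print(decode(tokens, "speech"))  # [(1.0, 0.0), (0.0, 1.0)]
```

Because every token belongs to exactly one source's codebook, the source type can be read directly from the token stream. Per-source editing, such as muting or swapping one stream, can then operate on tokens rather than on a mixed signal.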
AniDoc can automate the colorization of line art in videos and create smooth animations from simple sketches.
FitDiT can generate realistic virtual try-on images that show how clothes fit on different body types. It preserves fine garment textures and is fast, taking only 4.57 seconds per image.
ColorFlow can colorize black-and-white line art and manga panels while keeping characters and objects consistent.
FCVG can create smooth video transitions between two keyframes. It improves stability by defining explicit motion paths and matching lines across the input frames, which keeps transitions coherent even under fast motion.
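The entry above does not specify FCVG's exact path and line-matching formulation. As a much simpler illustration of the underlying idea, the sketch below assumes points have already been matched between the two keyframes and moves each point along a straight-line path with an ease-in-out timing curve. The function names and the smoothstep easing are assumptions and do not reproduce FCVG's method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def smoothstep(t: float) -> float:
    """Ease-in-out curve with zero velocity at t=0 and t=1 for gentler starts and stops."""
    return t * t * (3.0 - 2.0 * t)


def interpolate_matches(start: Sequence[Point], end: Sequence[Point], t: float) -> List[Point]:
    """Position of each matched point at time t in [0, 1] along a straight path."""
    if len(start) != len(end):
        raise ValueError("start and end must contain the same matched points")
    s = smoothstep(t)
    return [(x0 + s * (x1 - x0), y0 + s * (y1 - y0))
            for (x0, y0), (x1, y1) in zip(start, end)]


def inbetween_frames(start: Sequence[Point], end: Sequence[Point], n_frames: int) -> List[List[Point]]:
    """Generate n_frames point sets, including both keyframes."""
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2 to include both keyframes")
    return [interpolate_matches(start, end, i / (n_frames - 1)) for i in range(n_frames)]


# Endpoints of a line segment matched across two keyframes.
key_a = [(0.0, 0.0), (10.0, 0.0)]
key_b = [(4.0, 2.0), (14.0, 6.0)]
for frame in inbetween_frames(key_a, key_b, 3):
    print(frame)
```

Every in-between frame is a deterministic function of the matched points, so a segment cannot drift or jump between frames. This roughly conveys why explicit paths and line matches help stability.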
CustomCrafter can generate high-quality videos from text prompts and reference images. It improves motion generation with a Dynamic Weighted Video Sampling Strategy and combines concepts more effectively without requiring additional videos or fine-tuning.
TEXGen can generate high-resolution UV texture maps in texture space using a 700-million-parameter diffusion model. It supports text-guided texture inpainting and sparse-view texture completion, making it versatile for creating textures for 3D assets.
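A UV texture map is a 2D image that a 3D mesh indexes with (u, v) coordinates in [0, 1], which map each surface point to a location in the image. The sketch below shows how such a map is read with a minimal bilinear lookup. It assumes the texture is a row-major grid of RGB tuples with v = 0 at the top row. It only illustrates reading a texture map and does not show how TEXGen generates one. The name `sample_bilinear` is hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]
Texture = List[List[RGB]]  # texture[row][col]; row 0 is v = 0 (an assumption)


def sample_bilinear(texture: Texture, u: float, v: float) -> RGB:
    """Read a color at UV coordinates (u, v) in [0, 1] with bilinear filtering."""
    height, width = len(texture), len(texture[0])
    # Clamp to the valid UV range, then map to continuous pixel coordinates.
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    def lerp(a: RGB, b: RGB, t: float) -> RGB:
        # Linear interpolation per channel.
        return tuple(p + t * (q - p) for p, q in zip(a, b))

    top = lerp(texture[y0][x0], texture[y0][x1], fx)
    bottom = lerp(texture[y1][x0], texture[y1][x1], fx)
    return lerp(top, bottom, fy)


# A 2x2 texture: black and white on the top row, red and blue on the bottom row.
tex = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
]
print(sample_bilinear(tex, 0.5, 0.5))  # (0.5, 0.25, 0.5)
```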
YouDream can generate high-quality 3D animals from a single image and a text prompt. It preserves anatomical consistency and can generate commonly found animals as well as combinations of them.
InvSR can upscale images in one to five sampling steps. It produces strong results even with a single step, making it efficient for real-world image super-resolution.
DisPose can generate high-quality human image animations from sparse skeleton pose guidance.