AI Toolbox
A curated collection of 813 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.





TexGaussian can generate high-quality PBR materials for 3D meshes in one step. It produces albedo, roughness, and metallic maps quickly and with great visual quality, ensuring better consistency with the input geometry.
AccVideo can speed up video diffusion models by reducing the number of steps needed for video creation. It generates videos 8.5x faster than HunyuanVideo, producing high-quality output at 720x1280 resolution and 24 fps, which makes text-to-video generation far more efficient.
CausVid can generate high-quality videos at 9.4 frames per second on a single GPU. It supports text-to-video, image-to-video, and dynamic prompting while reducing latency with a causal transformer architecture.
InterMimic can learn complex human-object interactions from imperfect motion capture data. It enables realistic simulations of full-body interactions with dynamic objects and works well with kinematic generators for better modeling.
FloVD can generate camera-controllable videos using optical flow maps to represent motion.
MotionMatcher can customize text-to-video diffusion models using a reference video to transfer motion and camera framing to different scenes.
DIDiffGes can generate high-quality gestures from speech in just 10 sampling steps.
LayerAnimate can animate single anime frames from text prompts or interpolate between two frames with or without sketch-guidance. It allows users to adjust foreground and background elements separately.
StyleMaster can stylize videos by transferring artistic styles from images while keeping the original content clear.
PP-VCtrl can turn text-to-video models into customizable video generators. It uses control signals like Canny edges and segmentation masks to improve video quality and control without retraining the models, making it great for character animation and video editing.
SISO can generate and edit images using just one subject image without any training. It improves image quality, keeps the subject clear, and preserves the background better than other methods.
MagicMotion can animate objects in videos by controlling their paths with masks, bounding boxes, and sparse boxes.
DeepMesh can generate high-quality 3D meshes from point clouds and images.
ARTalk can generate realistic 3D head motions, including lip synchronization, blinking, and facial expressions, from audio in real time.
InfiniteYou can generate high-quality images with FLUX while retaining a person’s identity.
StarVector can generate scalable vector graphics (SVG) code from pixel images.
Diptych Prompting can generate images of new subjects in specific contexts by treating text-to-image generation as an inpainting task.
MotionStreamer can generate human motions based on text prompts and supports motion composition and longer motion generation. It also has a Blender plugin.
Thera can super-resolve images using neural heat fields that model a precise point spread function. This method allows for correct anti-aliasing at any output size.
DreamRenderer extends FLUX with image content control using bounding boxes or masks.