AI Toolbox
A curated collection of 610 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
SD-Codec can separate and reconstruct audio signals from speech, music, and sound effects using a different codebook for each type. This makes audio codecs easier to interpret and gives better control over audio generation while keeping quality high.
AniDoc can automate the colorization of line art in videos and create smooth animations from simple sketches.
FitDiT can generate realistic virtual try-on images that show how clothes fit on different body types. It keeps garment textures clear and works quickly, taking only 4.57 seconds for a single image.
ColorFlow can colorize black and white line-art and manga panels while keeping characters and objects consistent.
FCVG can create smooth video transitions between two key frames. It improves stability by defining clear paths for movement and matching lines from the input frames, ensuring coherent changes even with fast motion.
CustomCrafter can generate high-quality videos from text prompts and reference images. It improves motion generation with a Dynamic Weighted Video Sampling Strategy and allows for better concept combinations without needing extra video or fine-tuning.
TEXGen can generate high-resolution UV texture maps in texture space using a 700 million parameter diffusion model. It supports text-guided texture inpainting and sparse-view texture completion, making it versatile for creating textures for 3D assets.
YouDream can generate high-quality 3D animals from a single image and a text prompt. It preserves anatomical consistency and can generate and combine commonly found animals.
InvSR can upscale images in one to five steps. It achieves great results even with just one step, making it efficient for improving images in real-world situations.
DisPose can generate high-quality human image animations from sparse skeleton pose guidance.
Personalized Restoration can restore degraded face images while retaining the person's identity by using reference images. The restored image can also be edited with text prompts, for example to change eye color or make the person smile.
Leffa can generate person images based on reference images, allowing for precise control over appearance and pose.
TryOffAnyone can generate high-quality images of garments from photos of models wearing them.
SynCamMaster can generate videos from different viewpoints while keeping the look and shape consistent. It improves text-to-video models for multi-camera use and allows re-rendering from new angles.
ObjCtrl-2.5D enables object control in image-to-video generation using 3D trajectories from 2D inputs with depth information.
PRM can create high-quality 3D meshes from a single image using photometric stereo techniques. It improves detail and handles changes in lighting and materials, allowing for features like relighting and material editing.
3DTrajMaster can control the 3D motions of multiple objects in videos using user-defined 6DoF pose sequences.
FireFlow is a FLUX-dev editing method that can perform fast image inversion and semantic editing with just 8 diffusion steps.
Tactile DreamFusion can improve 3D asset generation by combining high-resolution tactile sensing with diffusion-based image priors. It supports both text-to-3D and image-to-3D generation.
Factor Graph Diffusion can generate high-quality images with better prompt adherence. The method allows for controllable image creation using conditioning inputs such as segmentation and depth maps.