AI Toolbox
A curated collection of 849 free, cutting-edge AI papers with code and tools for generating and manipulating text, images, video, 3D, and audio.

Any-to-Bokeh can add bokeh effects to videos, with blur that follows scene depth and focus.
MeshArt can generate 3D meshes with clean shapes.
Disco4D can generate and animate 4D human models from a single image by separating clothing from the body. It uses diffusion models for detailed 3D representations and can model parts that are not visible in the input image.
GCC can inpaint color checkers into images to improve lighting and color accuracy.
RepText can render multilingual visual text in user-chosen fonts without the model needing to understand the text. Users can customize the text content, font, and position.
ContentV can generate high-quality videos from text prompts in various resolutions and lengths.
MARBLE can blend and change the material properties of objects in images using material embeddings in CLIP-space. It allows control over attributes such as roughness, metallic appearance, transparency, and glow, supports multiple edits at once, and works across various artistic styles.
Synergizing Motion and Appearance can generate high-quality talking head videos by combining facial identity from a source image with motion from a driving video.
After NeRFs and Gaussian Splatting, we now have Triangle Splatting, a new method that can render real-time radiance fields at over 2,400 FPS at 1280x720 resolution. It combines triangle representations with differentiable rendering for better visual quality and faster results than Gaussian splatting methods.
UniTEX can generate high-quality textures for 3D assets without using UV mapping. It maps 3D points to texture values based on surface proximity and uses a transformer-based model for better texture quality.
Generative Omnimatte can break down videos into meaningful layers, isolating objects, shadows, and reflections without needing static backgrounds. It uses a video diffusion model for high-quality results and can fill in hidden areas, enhancing video editing options.
Direct3D-S2 can generate high-resolution 3D shapes.
MiniMax-Remover can remove objects from videos efficiently with just 6 sampling steps.
EPiC can control camera motion in image-to-video and video-to-video tasks without needing detailed camera path information.
SceneFactor generates 3D scenes from text using an intermediate 3D semantic map. This map can be edited to add, remove, resize, and replace objects, allowing for easy regeneration of the final 3D scene.
RenderFormer can render images from triangle mesh representations with full global illumination effects.
OmniPainter can generate high-quality images that match a prompt and a style reference image in just 4 to 6 timesteps. It uses the self-consistency property of latent consistency models to ensure the results closely align with the style of the reference image.
LT3SD can generate large-scale 3D scenes using a representation that captures both coarse shapes and fine details. It allows for flexible output sizes, produces high-quality scenes, and can complete missing parts of a scene.
ReStyle3D can transfer the look of a style image to real-world scenes viewed from different angles. It keeps the structure and details intact, which makes it well suited to interior design and virtual staging.