AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
UniVG is yet another video generation system. Its highlight is the ability to take image inputs as guidance and then steer the generation further with additional text prompts. I haven't seen other video models do this yet.
TryOffDiff can generate high-quality images of clothing from photos of people wearing them.
LLM4GEN improves the semantic understanding of text-to-image diffusion models by leveraging the semantic representations of LLMs. In practice, this means better handling of complex, dense prompts that involve multiple objects, attribute binding, and long descriptions.
TransPixar can generate RGBA videos, enabling the creation of transparent elements like smoke and reflections that blend seamlessly into scenes.
Coin3D can generate and edit 3D assets from a basic input shape. Much like ControlNet does for images, this enables precise part-level editing and responsive 3D previews within a few seconds.
Reflecting Reality can generate realistic mirror reflections using a method called MirrorFusion. It allows users to control mirror placement and achieves better reflection quality and geometry than other methods.
Digital Salon can generate detailed 3D hairstyles from text descriptions. It supports up to 80,000 hair strands and allows for real-time simulation and interactive grooming.
SVFR can restore faces in low-quality videos to high quality. It combines video face restoration, inpainting, and colorization to improve the overall quality and coherence of the restored videos.
FabricDiffusion can transfer high-quality fabric textures from a 2D clothing image to 3D garments of any shape.
TangoFlux can generate 30 seconds of 44.1 kHz audio in just 3.7 seconds on a single A40 GPU.
DAS3R can decompose scenes and rebuild static backgrounds from videos.
REACTO can reconstruct articulated 3D objects from a single video, capturing both their shape and their motion, including flexible deformation.
DiTCtrl can generate multi-prompt videos with smooth transitions and consistent object motion.
MMAudio can generate high-quality audio that matches video and text inputs. It excels in audio quality and synchronization, with a fast processing time of just 1.23 seconds for an 8-second clip.
SD-Codec can separate and reconstruct audio signals from speech, music, and sound effects, using a different codebook for each type. The approach makes audio codecs more interpretable and gives finer control over audio generation while maintaining high quality.
AniDoc can automate the colorization of line art in videos and create smooth animations from simple sketches.
FitDiT can generate realistic virtual try-on images that show how clothes fit on different body types. It keeps garment textures clear and works quickly, taking only 4.57 seconds for a single image.
ColorFlow can colorize black-and-white line art and manga panels while keeping characters and objects consistent.
FCVG can create smooth video transitions between two keyframes. It improves stability by defining explicit motion paths and matching lines from the input frames, which keeps changes coherent even during fast motion.
CustomCrafter can generate high-quality videos from text prompts and reference images. It improves motion generation with a Dynamic Weighted Video Sampling Strategy and combines concepts better without requiring extra videos or fine-tuning.
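As a rough sanity check on the generation speeds quoted for TangoFlux and MMAudio, those figures can be converted into a speed-up over real time, meaning seconds of audio produced per second of compute. This sketch uses only the numbers stated in the blurbs. The helper name `speedup` is hypothetical. The two results are not directly comparable, because the TangoFlux figure is for a single A40 GPU and the hardware behind the MMAudio figure is not stated here.

```python
def speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Hypothetical helper: seconds of audio generated per second of wall-clock time."""
    return audio_seconds / wall_seconds

# Figures quoted in the blurbs above.
tangoflux = speedup(30.0, 3.7)   # 30 s of audio in 3.7 s (single A40)
mmaudio = speedup(8.0, 1.23)     # 8 s clip in 1.23 s (hardware not stated)

# Total samples in one 30-second TangoFlux clip at 44.1 kHz (per channel).
samples = 30 * 44_100

print(f"TangoFlux: {tangoflux:.1f}x real time, {samples:,} samples per clip")
print(f"MMAudio: {mmaudio:.1f}x real time")
```

The result is roughly 8.1x real time for TangoFlux and 6.5x for MMAudio. Some papers instead report the inverse ratio (compute time divided by audio duration) as the "real-time factor," so check which convention a given benchmark uses before comparing numbers.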