AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Sonic4D can generate spatial audio for 4D scenes by tracking sound sources from monocular video.
StyleCity can stylize a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generate a harmonious omnidirectional sky background.
PanoDreamer can generate 360° 3D scenes from a single image by creating a panoramic image and estimating its depth. It effectively fills in missing parts and projects them into 3D space.
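To make the lifting step concrete, the sketch below shows a generic equirectangular unprojection of a panorama plus per-pixel depth into a 3D point cloud. This is an illustration of the general technique, not PanoDreamer's own code, and the function and argument names are made up for the example.

```python
import numpy as np

def panorama_to_pointcloud(depth: np.ndarray, rgb: np.ndarray):
    """Unproject an equirectangular panorama with per-pixel depth into 3D points.

    depth: (H, W) metric depth per pixel.
    rgb:   (H, W, 3) colors in [0, 1].
    Returns (H*W, 3) points and (H*W, 3) colors.
    """
    H, W = depth.shape
    # Longitude spans [-pi, pi); latitude runs from +pi/2 (top) to -pi/2 (bottom).
    lon = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(H) + 0.5) / H * np.pi
    lon, lat = np.meshgrid(lon, lat)          # each (H, W)

    # Spherical -> Cartesian, scaled by depth along each viewing ray.
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)

    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    return points, colors
```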
CompleteMe can complete human images while keeping important details like clothing patterns and accessories from reference images. It uses a dual U-Net architecture with a Region-focused Attention Block to improve visual quality.
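The "Region-focused" idea can be pictured as cross-attention that only attends to reference-image tokens inside a region of interest (e.g. a clothing pattern). The sketch below is a generic masked cross-attention under that assumption, not CompleteMe's actual block.

```python
import torch
import torch.nn.functional as F

def region_focused_attention(q, k, v, region_mask):
    """Cross-attention restricted to reference tokens inside a region mask.

    q:           (B, Nq, D) query tokens from the image being completed.
    k, v:        (B, Nk, D) key/value tokens from the reference image.
    region_mask: (B, Nk) boolean, True where the reference token lies in the
                 region of interest; assumes at least one True per row.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, Nq, Nk)
    # Block attention to tokens outside the region before the softmax.
    scores = scores.masked_fill(~region_mask[:, None, :], float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                         # (B, Nq, D)
```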
FairyGen can turn a child’s drawing into a story-driven cartoon video while keeping its unique style.
Chord can generate high-quality PBR materials from texture images. It uses a fine-tuned SDXL for texture creation and allows users to edit materials flexibly, performing well on both generated and real-world images.
DynVFX can add dynamic content to real-world videos based on user text instructions. It automatically creates new objects and effects that interact with the original footage, considering camera motion and occlusions.
OmniVCus can customize videos, conditioning the edits on depth maps, masks, and text prompts.
CoCoIns can generate the same subjects in different images without needing reference pictures or extra adjustments. It uses a contrastive learning method to link specific codes with subjects, making it easy for users to reuse these codes for consistent results.
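The contrastive linking can be pictured as an InfoNCE-style objective that pulls each learned code toward features of its own subject and away from other subjects. This is a sketch of that general idea, not the paper's exact loss; all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def code_subject_contrastive_loss(codes, subject_feats, temperature=0.07):
    """InfoNCE-style loss pairing each latent code with its subject's features.

    codes:         (N, D) learned codes, one per subject.
    subject_feats: (N, D) features of images generated for the matching subject.
    """
    codes = F.normalize(codes, dim=-1)
    subject_feats = F.normalize(subject_feats, dim=-1)
    logits = codes @ subject_feats.t() / temperature    # (N, N) similarity matrix
    targets = torch.arange(codes.shape[0], device=codes.device)
    # Matching code/subject pairs sit on the diagonal and act as positives.
    return F.cross_entropy(logits, targets)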
BecomingLit can create relightable and animatable high-resolution head avatars from single videos.
MemoryTalker can generate realistic 3D facial animations from audio alone, without needing speaker ID or 3D facial meshes.
WeatherEdit can generate realistic weather effects in 3D scenes with control over type and severity. It uses a dynamic 4D Gaussian field for weather particles and ensures consistency across images, making it ideal for simulations like autonomous driving in bad weather.
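A "dynamic 4D Gaussian field" can be read as 3D Gaussian particles whose centers move over time. The toy sketch below illustrates that reading for rain; the linear motion model and all names are assumptions for the example, not WeatherEdit's implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WeatherGaussian:
    """A single weather particle modeled as a time-varying 3D Gaussian."""
    position: np.ndarray   # (3,) center at t = 0
    scale: np.ndarray      # (3,) anisotropic extent (elongated for rain streaks)
    opacity: float
    velocity: np.ndarray   # (3,) constant fall/drift velocity

    def center_at(self, t: float) -> np.ndarray:
        # Linear motion; heavier weather could add wind or turbulence terms.
        return self.position + self.velocity * t

def sample_rain_field(n: int, bounds: float = 10.0, severity: float = 1.0):
    """Sample a simple rain field; severity scales particle count and speed."""
    rng = np.random.default_rng(0)
    particles = []
    for _ in range(int(n * severity)):
        particles.append(WeatherGaussian(
            position=rng.uniform(-bounds, bounds, size=3),
            scale=np.array([0.01, 0.15, 0.01]),     # thin vertical streak
            opacity=0.3,
            velocity=np.array([0.0, -9.0 * severity, 0.0]),
        ))
    return particles
```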
Lite2Relight can relight human portraits while preserving 3D consistency and identity.
BLADE can recover 3D human meshes from a single image by estimating perspective projection parameters.
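Why those parameters matter is easiest to see in a plain pinhole model: the estimated focal length and body translation directly determine where mesh vertices land in the image, unlike the weak-perspective approximation many methods assume. The sketch below is a generic pinhole projection with illustrative parameter names, not BLADE's code.

```python
import numpy as np

def project_vertices(vertices, focal, principal_point, translation):
    """Project 3D mesh vertices with a pinhole (perspective) camera.

    vertices:        (N, 3) points in camera-aligned body coordinates.
    focal:           focal length in pixels.
    principal_point: (2,) image center in pixels.
    translation:     (3,) camera-space translation applied to the body.
    """
    cam = vertices + translation                        # move body into camera space
    x = focal * cam[:, 0] / cam[:, 2] + principal_point[0]
    y = focal * cam[:, 1] / cam[:, 2] + principal_point[1]
    return np.stack([x, y], axis=-1)                    # (N, 2) pixel coordinates
```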
VertexRegen can generate 3D meshes with different levels of detail by reversing the edge collapse process through vertex splitting.
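Vertex splitting is the classic progressive-mesh refinement that undoes an edge collapse: one vertex becomes two and the nearby faces are re-wired. The simplified sketch below shows the operation on an indexed triangle list; it is not VertexRegen's generative model, and the new faces spanning the split edge are omitted for brevity.

```python
import numpy as np

def vertex_split(vertices, faces, v, new_pos_a, new_pos_b, faces_to_reassign):
    """Reverse an edge collapse by splitting vertex v into two vertices.

    vertices:            (N, 3) float array of vertex positions.
    faces:               (M, 3) int array of triangle indices.
    v:                   index of the vertex to split.
    new_pos_a/new_pos_b: (3,) positions for the two resulting vertices.
    faces_to_reassign:   indices of faces whose reference to v should point to
                         the newly added vertex; all other faces keep v.
    """
    vertices = vertices.copy()
    faces = faces.copy()
    vertices[v] = new_pos_a                        # v keeps one side of the split
    new_index = len(vertices)
    vertices = np.vstack([vertices, new_pos_b])    # the other side is a new vertex
    for f in faces_to_reassign:
        faces[f][faces[f] == v] = new_index        # re-wire the selected faces
    return vertices, faces
```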
Edicho can edit images consistently, even with different poses and lighting. It uses a training-free method based on diffusion models and works well with other tools like ControlNet and BrushNet.
GENMO can generate and estimate human motion from text, audio, video, and 3D keyframes. It allows for flexible control of motion outputs.
STAR can generate audio from speech input while capturing important sounds and scene details.
OmniPaint can remove backgrounds and insert objects seamlessly by treating these tasks as connected processes.
ORIGEN can generate images with accurate 3D orientations for multiple objects based on text prompts.