AI Toolbox
A curated collection of 954 free cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
MemoryTalker can generate realistic 3D facial animations from audio alone, without needing speaker ID or 3D facial meshes.
WeatherEdit can generate realistic weather effects in 3D scenes with control over type and severity. It uses a dynamic 4D Gaussian field for weather particles and ensures consistency across images, making it ideal for simulations like autonomous driving in bad weather.
Lite2Relight can relight human portraits while preserving 3D consistency and identity.
BLADE can recover 3D human meshes from a single image by estimating perspective projection parameters.
VertexRegen can generate 3D meshes with different levels of detail by reversing the edge collapse process through vertex splitting.
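The edge-collapse operation that VertexRegen runs in reverse can be sketched in a few lines: collapsing an edge merges one endpoint into the other and drops the triangles that become degenerate, and a vertex split undoes this by reintroducing the vertex and its faces. This is a minimal illustrative sketch of the classic mesh operation, not VertexRegen's learned generative model; the triangle indices are made up.

```python
def edge_collapse(faces, u, v):
    """Collapse edge (u, v): merge vertex v into u and drop the
    triangles that become degenerate (fewer than 3 distinct verts)."""
    new_faces = []
    for f in faces:
        g = tuple(u if x == v else x for x in f)
        if len(set(g)) == 3:  # keep only non-degenerate triangles
            new_faces.append(g)
    return new_faces

def vertex_split(coarse_faces, restored_faces):
    """Reverse a collapse: reintroduce the removed vertex by adding
    back the faces the collapse discarded (coarse-to-fine step)."""
    return coarse_faces + restored_faces

# A small fan of triangles sharing the edge (1, 2):
fine = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
coarse = edge_collapse(fine, u=1, v=2)
# The two triangles touching edge (1, 2) degenerate and are dropped,
# leaving [(1, 3, 4)]; a vertex split would restore them.
```

Generating meshes as a sequence of vertex splits yields a valid mesh at every prefix, which is what gives VertexRegen its built-in levels of detail.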
Edicho can edit images consistently, even with different poses and lighting. It uses a training-free method based on diffusion models and works well with other tools like ControlNet and BrushNet.
GENMO can generate and estimate human motion from text, audio, video, and 3D keyframes. It allows for flexible control of motion outputs.
STAR can generate audio from speech input while capturing important sounds and scene details.
OmniPaint can remove backgrounds and insert objects seamlessly by treating these tasks as connected processes.
ORIGEN can generate images with accurate 3D orientations for multiple objects based on text prompts.
InterActHuman can generate videos with multiple human characters by matching audio to each person.
Assembler can reconstruct complete 3D objects from part meshes and a reference image.
FLUX-IR can restore low-quality images to high-quality ones by optimizing paths through reinforcement learning.
MTV can create high-quality videos that match audio by separating it into speech, effects, and music tracks.
MeshPad can create and edit 3D meshes from 2D sketches. Users can easily add or delete mesh parts through simple sketch changes.
StyleSculptor can generate 3D assets from a content image and style images without needing extra training.
VividFace can swap faces in videos while keeping the original person’s look and expressions. It handles challenges like keeping the face consistent over time and working well with different angles and lighting.
Mask²DiT can generate long videos with multiple scenes by aligning video segments with text descriptions.
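The alignment idea can be pictured as a block-diagonal attention mask: each video segment's tokens attend only to the text tokens of their own scene. The sketch below is a generic illustration of such a mask under assumed token counts; Mask²DiT's actual token layout and architecture differ.

```python
import numpy as np

def segment_attention_mask(seg_video_lens, seg_text_lens):
    """Binary mask allowing each video segment's tokens to attend
    only to its own segment's text tokens (block-diagonal across
    segments). Token counts per segment are hypothetical."""
    n_video, n_text = sum(seg_video_lens), sum(seg_text_lens)
    mask = np.zeros((n_video, n_text), dtype=bool)
    v0 = t0 = 0
    for lv, lt in zip(seg_video_lens, seg_text_lens):
        mask[v0:v0 + lv, t0:t0 + lt] = True  # one scene's block
        v0 += lv
        t0 += lt
    return mask

# Two scenes: 2 and 3 video tokens, paired with 1 and 2 text tokens.
m = segment_attention_mask([2, 3], [1, 2])
```

Masking cross-attention this way keeps each scene conditioned on its own description while the shared backbone maintains visual continuity across segments.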
PINO can generate realistic interactions among groups of any size by breaking down complex actions into simple pairwise motions. It uses pretrained diffusion models for two-person interactions and ensures realistic movement with physics-based rules, allowing control over character speed and position.
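The pairwise decomposition can be sketched abstractly: each character's next move is composed from two-body motion terms with every other character. This is a toy stand-in, with `pairwise_motion` playing the role of PINO's pretrained two-person diffusion prior; positions are 2D points for brevity and all names are hypothetical.

```python
def group_step(positions, pairwise_motion):
    """Advance a group of characters one step by summing pairwise
    motion contributions. `pairwise_motion(p, q)` returns the (dx, dy)
    suggested for the character at p given a partner at q, standing in
    for a learned two-person interaction model."""
    out = []
    for i, pi in enumerate(positions):
        dx_total = dy_total = 0.0
        for j, pj in enumerate(positions):
            if i != j:
                dx, dy = pairwise_motion(pi, pj)
                dx_total += dx
                dy_total += dy
        out.append((pi[0] + dx_total, pi[1] + dy_total))
    return out

# Toy pairwise term: each character steps directly away from its partner.
repel = lambda p, q: (p[0] - q[0], p[1] - q[1])
stepped = group_step([(0.0, 0.0), (1.0, 0.0)], repel)
```

In PINO the pairwise terms come from a diffusion model and are reconciled with physics-based constraints, but the composition-from-pairs structure is the same.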
Motion-2-to-3 can generate realistic 3D human motions from text prompts using 2D motion data from videos. It improves motion diversity and efficiency by predicting consistent joint movements and root dynamics with a multi-view diffusion model.