AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
BAGEL is a unified multimodal model that can understand and generate images and text, excelling in tasks like image editing and predicting future frames. It is basically an open-source version of GPT-4o.
Uni3C is a video generation method that supports both camera control and human motion control.
4K4DGen can turn a single panorama image into an immersive 4D environment with 360-degree views at 4K resolution. The method is able to animate the scene and optimize a set of 4D Gaussians using efficient splatting techniques for real-time exploration.
PixelHacker can perform image inpainting with strong consistency in structure and meaning. It uses a diffusion-based model and a dataset of 14 million image-mask pairs, achieving better results than other methods in texture, shape, and color consistency.
MVPainter can generate high-quality 3D textures by aligning reference textures with geometry.
MoCha can generate talking character animations from speech and text, allowing for multi-character conversations with turn-based dialogue.
RealisDance-DiT can generate high-quality character animations from images and pose sequences. It handles challenges like character-object interactions and complex gestures with only minimal changes to the Wan-2.1 video model, and it is part of the Uni3C method.
RealCam-I2V can generate high-quality videos from real-world images with consistent, parameter-based camera controls.
HunyuanPortrait can animate characters from a single portrait image by using facial expressions and head poses from video clips. It achieves lifelike animations with high consistency and control, effectively separating appearance and motion.
Custom SVG can generate high-quality SVGs from text prompts with customizable styles.
ObjectCarver can segment, reconstruct, and separate 3D objects from a single view using just user-input clicks, eliminating the need for segmentation masks.
Marigold can estimate depth, predict surface normals, and perform intrinsic image decomposition with minimal changes to a pretrained diffusion model.
MTVCrafter can generate high-quality human image animations from 3D motion sequences.
PA-VDM can generate high-quality videos up to 1 minute long at 24 frames per second.
Skyeyes can generate photorealistic sequences of ground view images from aerial view inputs. It ensures that the images are consistent and realistic, even when there are large gaps in views.
LegoGPT can generate stable and buildable LEGO designs from text prompts. It uses physics-aware techniques to ensure designs are safe for manual assembly and robotic construction, and it can create colored and textured models.
SVAD can generate high-quality 3D avatars from a single image. It keeps the person’s identity and details consistent across different poses and angles while allowing for real-time rendering.
PrimitiveAnything can generate high-quality 3D shapes from 3D models, text and images by breaking down complex forms into simple geometric parts. It uses a shape-conditioned primitive transformer to ensure that the shapes remain accurate and diverse.
PreciseCam can generate images with exact control over camera angles and lens distortions using four simple camera settings.
HunyuanCustom can generate customized videos with specific subjects while keeping their identity consistent across frames. It supports various inputs like images, audio, video, and text, and it excels in realism and matching text to video.