AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Animate-X++ can animate characters from a single image and a pose sequence while creating dynamic backgrounds.
HuMo can generate high-quality human-centric videos from text, images, and audio. It ensures that the subjects are preserved and the audio matches the visuals, using advanced training methods for better control.
Diffuman4D can generate high-quality, 4D-consistent videos of human performances from just a few input videos. It uses a spatio-temporal diffusion model to produce results that are more realistic and consistent than prior methods.
InstantRestore can restore badly damaged face images in near real-time. It uses a single-step image diffusion model and a small set of reference images to keep the person’s identity.
ByteDance published a new low-step method called PeRFlow which accelerates diffusion models such as Stable Diffusion to generate images faster. PeRFlow is compatible with various fine-tuned stylized SD models as well as SD-based generation/editing pipelines such as ControlNet, Wonder3D, and more.
3DHM can animate people with 3D camera control from a single image and a given target video motion sequence.
MaPa can generate high-quality materials for 3D meshes. It represents appearance as segment-wise procedural material graphs, which support high-quality rendering and provide significant flexibility in editing.
SemLayoutDiff can generate diverse 3D indoor scenes by creating detailed semantic maps and placing furniture while considering doors and windows.
3DV-TON can generate high-quality videos for trying on clothes using 3D models. It handles complex clothing patterns and different body poses well, and it uses a robust masking method to reduce artifacts.
CanonSwap can transfer identities from images to videos while keeping natural movements like head poses and facial expressions.
Hunyuan-GameCraft can generate interactive game videos by mapping keyboard and mouse inputs into a shared camera representation.
Vivid-VR can restore and enhance videos using a text-to-video diffusion transformer. It achieves realistic textures and smooth motion while preserving content and giving users control over the video generation process.
Lumen can replace video backgrounds while adjusting the lighting of the foreground for a consistent look.
OmniTry lets users try on jewelry and accessories without needing a mask.
LongSplat can create high-quality 3D scenes from long videos without requiring known camera poses.
MyTimeMachine can change faces to look older or younger using a global aging model. It needs just 50 selfies to keep the person’s identity, making it great for visual effects and realistic age transformations.
HOIDiNi can generate realistic human-object interactions with accurate hand contact and natural body movement from text prompts.
SewingLDM can generate complex sewing patterns using text prompts, body shapes, and garment sketches. It allows for detailed customization and significantly improves the design of garments to fit different body types.
AnimateAnyMesh can animate 3D meshes based on text prompts.
PERSONA can create personalized 3D avatars from a single image, allowing for realistic animations that reflect the subject’s identity.