AI Toolbox
A curated collection of 494 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
Animate3D can animate any static multi-view 3D model.
VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, with no fine-tuning needed.
And because methods always come in pairs, GenN2N is another NeRF editing method. This one can edit scenes using text prompts, colorize, upscale and inpaint them.
UniMuMo can generate outputs across text, music, and motion. It achieves this by aligning unpaired music and motion data based on rhythmic patterns.
OmniBooth can generate images with precise control over their layout and style. It allows users to customize images using masks and text or image guidance, making the process flexible and personal.
EgoAllo can estimate 3D human body pose, height, and hand parameters using images from a head-mounted device.
While TripoSR can generate meshes from an image, MagicClay can edit them. It’s an artist-friendly tool that allows you to sculpt regions of a mesh with text prompts while keeping other regions untouched.
TCAN can animate characters of various styles from a pose guidance video.
Generative Radiance Field Relighting can relight 3D scenes captured under a single light source. It allows for realistic control over light direction and improves the consistency of views, making it suitable for complex scenes with multiple objects.
Time Reversal can generate the in-between frames between two input images. In particular, this enables the generation of looping cinemagraphs as well as camera and subject motion videos.
Love this one! SVGCustomization is a novel pipeline that is able to edit existing vector images with text prompts while preserving the properties and layer information vector images are made of.
SynTalker can generate realistic full-body motions that match speech and text prompts. It allows precise control of movements, like talking while walking.
DreamCatalyst can edit NeRF scenes in only about 25 minutes or produce high-quality results in less than 70 minutes.
GScream is yet another method for object removal in 3D scenes. This one uses Gaussian Splatting to update the radiance field and is able to preserve geometric consistency and texture coherence.
MotionMaster can extract camera motions from a single source video or multiple videos and apply them to new videos. This gives more flexible control over camera motion, resulting in videos with variable-speed zoom, pan left, pan right, dolly zoom in, dolly zoom out and more.
LVCD can colorize lineart videos using a pretrained video diffusion model. It ensures smooth motion and high video quality by effectively transferring colors from reference frames.
Gaussian-Informed Continuum for Physical Property Identification and Simulation can recover 3D objects from Gaussian point sets and simulate their physical properties.
StructLDM can generate animatable compositional humans and supports blending different body parts, identity swapping, local clothing editing, 3D virtual try-on, and more. AI girlfriends/boyfriends are definitely gonna be a thing.
RodinHD can generate high-fidelity 3D avatars from a portrait image. The method is able to capture intricate details such as hairstyles and can generalize to in-the-wild portrait input.
TeFF is a method similar to SphereHead. This one supports more than just human faces and can reconstruct a 3D object in full 360-degree view from a single image.