AI Toolbox
A curated collection of 965 free cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Spacetime Gaussian Feature Splatting is a novel dynamic scene representation that can capture static, dynamic, and transient content within a scene and render it at 8K resolution and 60 FPS on an RTX 4090.
PIA is a method that can animate images generated by custom Stable Diffusion checkpoints with realistic motions based on a text prompt.
RelightableAvatar can create relightable and animatable neural avatars from monocular video.
Intrinsic Image Diffusion can generate detailed albedo, roughness, and metallic maps from a single indoor scene image.
HAAR can generate realistic 3D hairstyles from text prompts. It uses 3D hair strands to create detailed hair structures and allows for physics-based rendering and simulation.
Paint-it can generate high-fidelity physically-based rendering (PBR) texture maps for 3D meshes from a text description. The method can relight the mesh under changing High-Dynamic Range (HDR) environment lighting and control the material properties at test time.
VidToMe can edit videos with a text prompt, custom models, and ControlNet guidance while maintaining strong temporal consistency. The key idea is to merge similar tokens across multiple frames in the self-attention modules, which keeps the generated frames consistent over time.
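The cross-frame token-merging idea can be sketched in a few lines of numpy: tokens from later frames that are nearly identical (by cosine similarity) to a reference-frame token get averaged into it instead of being processed separately. This is a toy illustration of the general technique, not VidToMe's actual implementation — the function name and threshold are hypothetical.

```python
import numpy as np

def merge_similar_tokens(frames, threshold=0.9):
    """Toy cross-frame token merging: fold tokens from later frames
    into their most similar reference-frame token when cosine
    similarity exceeds `threshold` (illustrative sketch only)."""
    ref = frames[0]                         # (N, D) reference-frame tokens
    ref_n = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    acc = ref.copy()                        # running sums for merged groups
    counts = np.ones(len(ref))
    merged = [None]                         # slot 0 filled in at the end
    for frame in frames[1:]:
        f_n = frame / np.linalg.norm(frame, axis=1, keepdims=True)
        sim = f_n @ ref_n.T                 # cosine similarity to reference
        best = sim.argmax(axis=1)           # nearest reference token
        keep = []
        for i, j in enumerate(best):
            if sim[i, j] >= threshold:      # similar enough: merge by averaging
                acc[j] += frame[i]
                counts[j] += 1
            else:                           # unmatched tokens stay as-is
                keep.append(frame[i])
        merged.append(np.array(keep) if keep else np.empty((0, ref.shape[1])))
    merged[0] = acc / counts[:, None]       # averaged merged tokens
    return merged
```

Merging redundant tokens like this shrinks the attention workload across frames while forcing repeated content to share one representation, which is where the temporal consistency comes from.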
DreamTalk can generate talking heads conditioned on a given text prompt. The model works across multiple languages and can also manipulate the speaking style of the generated video.
DiffusionLight can estimate the lighting in a single input image and convert it into an HDR environment map. The technique generates multiple chrome balls with varying exposures for HDR merging and can be used to seamlessly insert 3D objects into an existing photograph. Pretty cool.
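The "varying exposures for HDR merging" step follows the classic exposure-bracketing recipe: each capture gives a radiance estimate (pixel value divided by exposure time), and the estimates are combined with a weight that trusts mid-tones over clipped shadows and highlights. A minimal Debevec-style sketch, assuming a linear camera response — not DiffusionLight's actual pipeline:

```python
import numpy as np

def merge_exposures(ldr_images, exposure_times):
    """Merge linear LDR captures (values in [0, 1]) into an HDR
    radiance map via a hat-weighted average (simplified sketch)."""
    hdr_num = np.zeros_like(ldr_images[0], dtype=np.float64)
    hdr_den = np.zeros_like(hdr_num)
    for img, t in zip(ldr_images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # trust mid-tones most
        hdr_num += w * (img / t)            # per-exposure radiance estimate
        hdr_den += w
    return hdr_num / np.maximum(hdr_den, 1e-8)
```

Bracketing matters because a single exposure clips either the sun or the shadows; merging several lets the environment map keep the full dynamic range needed for realistic relighting.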
Wan-Animate can animate characters from images by copying their expressions and movements from a video. It also allows for seamless character replacement in videos, keeping the original lighting and color tone for a consistent look.
FreeInit can improve the quality of videos made by diffusion models without extra training. It narrows the initialization gap between training and inference, making the generated videos look better and more temporally consistent.
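FreeInit's refinement works in the frequency domain: keep the low-frequency components of a previously denoised latent and refresh only the high frequencies with new Gaussian noise. A simplified 2D numpy sketch of that idea (the actual method operates on spatio-temporal video latents; the function name and cutoff value here are illustrative assumptions):

```python
import numpy as np

def reinitialize_noise(latent, cutoff=0.25, rng=None):
    """FreeInit-style sketch: mix the low frequencies of a denoised
    latent with fresh high-frequency Gaussian noise (simplified 2D
    version, illustrative only)."""
    rng = np.random.default_rng(rng)
    noise = rng.standard_normal(latent.shape)
    # Build a low-pass mask in the (unshifted) frequency domain.
    freqs = [np.fft.fftfreq(n) for n in latent.shape]
    grid = np.meshgrid(*freqs, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grid))
    low_pass = (radius <= cutoff).astype(float)
    lat_f = np.fft.fftn(latent)
    noise_f = np.fft.fftn(noise)
    # Low frequencies from the latent, high frequencies from fresh noise.
    mixed = lat_f * low_pass + noise_f * (1.0 - low_pass)
    return np.fft.ifftn(mixed).real
```

The low frequencies carry the coarse layout and motion that the model has already settled on, so preserving them while resampling the rest gives the next denoising pass a better-behaved starting point.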
MinD-3D can reconstruct high-quality 3D objects from fMRI brain signals. It uses a three-stage framework to decode 3D visual information, showing strong connections between the brain’s processing and the created objects.
ControlNet-XS can control text-to-image diffusion models like Stable Diffusion and Stable Diffusion-XL with only 1% of the parameters of the base model. It is about twice as fast as ControlNet and produces higher quality images with better control.
ASH can render photorealistic and animatable 3D human avatars in real time.
LayerPeeler can peel away an image's layers one by one to recover occluded content and convert the image into vector graphics with clean paths and organized layers.
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation can generate realistic and stable videos by separating spatial and temporal factors. It improves video quality by extracting motion and appearance cues, allowing for flexible content variations and better understanding of scenes.
PhotoMaker can generate realistic human photos from input images and text prompts. It can change attributes of people, like changing hair colour and adding glasses, turn people from artworks like Van Gogh’s self-portrait into realistic photos, or mix identities of multiple people.
Doodle Your 3D can turn abstract sketches into precise 3D shapes. The method can even edit shapes by simply editing the sketch. Super cool. Sketch-to-3D-print isn’t that far away now.
WonderJourney lets you wander through your favourite paintings, poems and haikus. The method can generate a sequence of diverse yet coherently connected 3D scenes from a single image or text prompt.
Relightable Gaussian Codec Avatars can generate high-quality, relightable 3D head avatars that show fine details like hair strands and pores. They work well in real-time under different lighting conditions and are optimized for consumer VR headsets.