AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
Paint-it can generate high-fidelity physically-based rendering (PBR) texture maps for 3D meshes from a text description. Because it outputs PBR maps rather than baked-in colours, the mesh can be relit by swapping the High-Dynamic Range (HDR) environment lighting, and its material properties can be adjusted at test time.
VidToMe can edit videos with a text prompt, custom models and ControlNet guidance while maintaining strong temporal consistency. The critical idea is to merge similar tokens across multiple frames in the self-attention modules, so matching regions share one representation instead of flickering from frame to frame.
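A minimal sketch of that idea in PyTorch; the threshold and tensor shapes are made up for illustration, and the real VidToMe merges (and later unmerges) tokens inside the attention blocks:

```python
import torch
import torch.nn.functional as F

def merge_similar_tokens(tokens, sim_threshold=0.9):
    """Toy sketch of cross-frame token merging (not the official VidToMe code).

    tokens: (frames, num_tokens, dim) self-attention inputs for a clip.
    Tokens in later frames that are highly similar to a token in the first
    (anchor) frame are replaced by that anchor token, so attention sees one
    shared representation for matching patches across frames.
    """
    anchor = tokens[0]                              # (num_tokens, dim)
    merged = tokens.clone()
    anchor_n = F.normalize(anchor, dim=-1)
    for f in range(1, tokens.shape[0]):
        frame_n = F.normalize(tokens[f], dim=-1)
        sim = frame_n @ anchor_n.T                  # cosine similarity matrix
        best_sim, best_idx = sim.max(dim=-1)
        mask = best_sim > sim_threshold             # tokens close enough to merge
        merged[f][mask] = anchor[best_idx[mask]]
    return merged

frames = torch.randn(8, 256, 64)   # 8 frames, 256 tokens, 64-dim features
out = merge_similar_tokens(frames)
```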
DreamTalk can generate expressive talking heads driven by an input audio clip. The model handles speech in multiple languages and can also manipulate the speaking style of the generated video.
DiffusionLight can estimate the lighting in a single input image and convert it into an HDR environment map. The technique is able to generate multiple chrome balls with varying exposures for HDR merging and can be used to seamlessly insert 3D objects into an existing photograph. Pretty cool.
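For illustration, here is a toy HDR merge over differently exposed renders of the chrome ball; the triangle weighting and single gamma value below are simplifications, not DiffusionLight's exact pipeline:

```python
import numpy as np

def merge_exposures(ldr_images, exposure_times, gamma=2.4):
    """Toy HDR merge from LDR exposures (illustrative only).

    ldr_images: list of float arrays in [0, 1], same scene at different exposures.
    exposure_times: relative exposure time per image.
    Each pixel is linearized, divided by its exposure, and averaged with a
    triangle weight that downweights under- and over-exposed values.
    """
    num = np.zeros_like(ldr_images[0])
    den = np.zeros_like(ldr_images[0])
    for img, t in zip(ldr_images, exposure_times):
        linear = img ** gamma                    # undo display gamma
        weight = 1.0 - np.abs(2.0 * img - 1.0)   # trust mid-tones most
        num += weight * linear / t
        den += weight
    return num / np.maximum(den, 1e-6)

balls = [np.random.rand(64, 64, 3) for _ in range(3)]  # stand-in chrome balls
hdr = merge_exposures(balls, exposure_times=[1.0, 0.25, 0.0625])
```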
Wan-Animate can animate characters from images by copying their expressions and movements from a video. It also allows for seamless character replacement in videos, keeping the original lighting and color tone for a consistent look.
FreeInit can improve the quality of videos made by diffusion models without extra training. It narrows the gap between the initial noise seen at training time and at inference time by iteratively refining the low-frequency components of the initialization noise, making videos look better and more temporally consistent.
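A simplified sketch of that re-initialization: keep the low spatio-temporal frequencies of a latent from the previous sampling pass and refresh the high frequencies with new Gaussian noise. The hard cutoff here is an assumption for brevity; the paper uses a smoother filter:

```python
import torch

def reinit_noise(prev_latent, cutoff=0.25):
    """Sketch of FreeInit-style noise re-initialization (simplified).

    prev_latent: (frames, channels, height, width) latent from the last pass.
    """
    freq = torch.fft.fftshift(torch.fft.fftn(prev_latent, dim=(0, 2, 3)), dim=(0, 2, 3))

    f, _, h, w = prev_latent.shape
    fz = torch.fft.fftshift(torch.fft.fftfreq(f))
    fy = torch.fft.fftshift(torch.fft.fftfreq(h))
    fx = torch.fft.fftshift(torch.fft.fftfreq(w))
    # boolean low-pass box over (time, height, width) frequencies
    mask = ((fz.abs()[:, None, None] <= cutoff)
            & (fy.abs()[None, :, None] <= cutoff)
            & (fx.abs()[None, None, :] <= cutoff)).float()
    mask = mask[:, None]                        # broadcast over channels

    noise = torch.randn_like(prev_latent)       # fresh high-frequency content
    noise_freq = torch.fft.fftshift(torch.fft.fftn(noise, dim=(0, 2, 3)), dim=(0, 2, 3))

    mixed = freq * mask + noise_freq * (1 - mask)
    mixed = torch.fft.ifftshift(mixed, dim=(0, 2, 3))
    return torch.fft.ifftn(mixed, dim=(0, 2, 3)).real

better_init = reinit_noise(torch.randn(16, 4, 32, 32))
```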
MinD-3D can reconstruct high-quality 3D objects from fMRI brain signals. It uses a three-stage framework to decode 3D visual information, showing strong connections between the brain’s processing and the created objects.
ControlNet-XS can control text-to-image diffusion models like Stable Diffusion and Stable Diffusion-XL with only 1% of the parameters of the base model. It is about twice as fast as ControlNet and produces higher quality images with better control.
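Recent diffusers releases ship ControlNet-XS pipelines, so trying it looks roughly like the sketch below; the controlnet checkpoint id is a placeholder you would swap for a real one, and the edge map is assumed to be precomputed:

```python
import torch
from diffusers import StableDiffusionXLControlNetXSPipeline, ControlNetXSAdapter
from diffusers.utils import load_image

# placeholder checkpoint id: substitute an actual ControlNet-XS canny model
controlnet = ControlNetXSAdapter.from_pretrained(
    "path/to/controlnet-xs-sdxl-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny = load_image("canny_edges.png")  # precomputed edge map as control signal
image = pipe("a futuristic city at dusk", image=canny, num_inference_steps=30).images[0]
image.save("out.png")
```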
ASH can render photorealistic and animatable 3D human avatars in real time.
LayerPeeler can decompose images into vector graphics by peeling away occluding layers one at a time, recovering the content hidden underneath and producing clear paths and organized layers.
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation can generate realistic and stable videos by treating the spatial (appearance) and temporal (motion) factors of a video separately. Extracting motion and appearance cues independently improves video quality and allows for flexible content variations and better understanding of scenes.
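As a toy illustration of that decoupling (entirely hypothetical, not the paper's architecture), appearance can be summarized from a single frame and motion from frame-to-frame differences, giving two embeddings that condition the generator independently:

```python
import torch
import torch.nn as nn

class DecoupledConditioner(nn.Module):
    """Hypothetical sketch: separate appearance and motion encoders."""
    def __init__(self, channels=3, dim=128):
        super().__init__()
        self.appearance = nn.Sequential(
            nn.Conv2d(channels, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.motion = nn.Sequential(
            nn.Conv2d(channels, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, video):                 # video: (frames, c, h, w)
        app = self.appearance(video[:1])      # appearance cue from one frame
        diffs = video[1:] - video[:-1]        # motion cues from differences
        mot = self.motion(diffs).mean(0, keepdim=True)
        return app, mot

cond = DecoupledConditioner()
appearance_emb, motion_emb = cond(torch.randn(8, 3, 64, 64))
```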
PhotoMaker can generate realistic human photos from input images and text prompts. It can change attributes of people, like hair colour or glasses, turn subjects of artworks such as Van Gogh’s self-portrait into realistic photos, or mix the identities of multiple people.
Doodle Your 3D can turn abstract sketches into precise 3D shapes. The method can even edit shapes by simply editing the sketch. Super cool. Sketch-to-3D-print isn’t that far away now.
WonderJourney lets you wander through your favourite paintings, poems and haikus. The method can generate a sequence of diverse yet coherently connected 3D scenes from a single image or text prompt.
Relightable Gaussian Codec Avatars can generate high-quality, relightable 3D head avatars that show fine details like hair strands and pores. They work well in real-time under different lighting conditions and are optimized for consumer VR headsets.
MotionCtrl is a flexible motion controller that can manage both camera and object motion in generated videos, and it works with VideoCrafter1, AnimateDiff, and Stable Video Diffusion.
DPM-Solver can generate high-quality samples from diffusion probabilistic models in just 10 to 20 function evaluations. It is 4 to 16 times faster than previous methods and works with both discrete-time and continuous-time models without extra training.
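Diffusers exposes DPM-Solver++ as a drop-in scheduler, so trying the speed-up is a two-line change; the model id and prompt below are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# swap the default scheduler for DPM-Solver++, keeping the model's noise config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# ~20 function evaluations instead of the usual 50+
image = pipe("a watercolor fox in a forest", num_inference_steps=20).images[0]
image.save("fox.png")
```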
AmbiGen can generate ambigrams by optimizing letter shapes for clear reading from two angles. It improves word accuracy by over 11.6% and reduces edit distance by 41.9% on the 500 most common English words.
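The core objective is easy to picture: the same letterforms must score well under a recognizer both upright and rotated 180°. A toy, runnable version with a dummy stand-in for the recognizer (the actual method optimizes glyph geometry against a real legibility model):

```python
import torch

def dummy_legibility(img, target):
    """Placeholder for a differentiable letter recognizer; here just a
    negative MSE so the sketch runs end to end."""
    return -((img - target) ** 2).mean()

def ambigram_loss(glyph, target_a, target_b):
    """glyph: (1, 1, H, W) differentiable rendering of the letterforms.
    An ambigram must read correctly from both directions, so we score the
    rendering upright and rotated 180 degrees."""
    flipped = torch.rot90(glyph, k=2, dims=(-2, -1))
    return -(dummy_legibility(glyph, target_a) + dummy_legibility(flipped, target_b))

glyph = torch.randn(1, 1, 64, 64, requires_grad=True)
loss = ambigram_loss(glyph, torch.zeros(1, 1, 64, 64), torch.ones(1, 1, 64, 64))
loss.backward()   # gradients flow back into the letter-shape parameters
```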
Readout Guidance can control text-to-image diffusion models using lightweight networks called readout heads. It enables pose, depth, and edge-guided generation with fewer parameters and training samples, allowing for easier manipulation and consistent identity generation.
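Schematically, a readout head is just a tiny network on top of frozen diffusion features, and guidance follows the gradient of a loss on its output; the head below is an illustrative stand-in, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ReadoutHead(nn.Module):
    """Toy readout head: maps intermediate diffusion features to a spatial
    property map such as pose or depth (dimensions are illustrative)."""
    def __init__(self, in_dim=320, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, out_dim, 3, padding=1),
        )

    def forward(self, feats):
        return self.net(feats)

head = ReadoutHead()
feats = torch.randn(1, 320, 32, 32, requires_grad=True)  # stand-in UNet features
target = torch.zeros(1, 1, 32, 32)                        # e.g. a target depth map
loss = ((head(feats) - target) ** 2).mean()
grad = torch.autograd.grad(loss, feats)[0]  # gradient used to steer sampling
```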
X-Adapter can enable pretrained plugins like ControlNet and LoRA from Stable Diffusion 1.5 to work with the SDXL model without retraining. It adds trainable mapping layers for feature remapping and uses a null-text training strategy to improve compatibility and functionality.
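A hypothetical sketch of what such a mapping layer could look like, with made-up channel widths; X-Adapter's actual layers are trained to remap SD1.5 decoder features into the SDXL decoder:

```python
import torch
import torch.nn as nn

class MappingLayer(nn.Module):
    """Toy stand-in for an X-Adapter mapping layer: remaps a plugin's
    SD1.5-sized feature map to the channel width an SDXL block expects."""
    def __init__(self, in_ch=320, out_ch=640):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),            # match channel count
            nn.GroupNorm(32, out_ch), nn.SiLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, sd15_feat, sdxl_feat):
        # remapped plugin features are added onto the SDXL decoder features
        return sdxl_feat + self.proj(sd15_feat)

m = MappingLayer()
out = m(torch.randn(1, 320, 32, 32), torch.randn(1, 640, 32, 32))
```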