AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Latent Consistency Models can generate high-resolution images in just 2-4 steps, making text-to-image generation much faster than traditional methods. They require only 32 A100 GPU hours for training on a 768x768 resolution, which is efficient for high-quality results.
DREAM can reconstruct images seen by a person from their brain activity using an fMRI-to-image method. It decodes important details like color and depth, and it performs better than other models in keeping the appearance and structure consistent.
HumanNorm is a novel approach for high-quality and realistic 3D human generation that leverages normal maps, which enhance the 2D perception of 3D geometry. The results are quite impressive, comparable to PS3-era game graphics.
Ground-A-Video can edit multiple attributes of a video using pre-trained text-to-image models without any training. It maintains consistency across frames and accurately preserves non-target areas, making it more effective than other editing methods.
DA-CLIP is a method that can be used to restore images. Apart from inpainting, it can restore images by dehazing, deblurring, denoising, deraining, and desnowing them, as well as removing unwanted shadows and raindrops or enhancing lighting in low-light images.
PIXART-α can generate high-quality images at resolutions of up to 1024px. It cuts training time to 10.8% of Stable Diffusion v1.5's, costing about $26,000 and emitting 90% less CO2.
LLM-grounded Video Diffusion Models can generate realistic videos from complex text prompts. They first create dynamic scene layouts with a large language model, which helps guide the video creation process, resulting in better accuracy for object movements and actions.
DreamGaussian can generate high-quality textured meshes from a single-view image in just 2 minutes. It uses a 3D Gaussian Splatting model for fast mesh extraction and texture refinement.
AnimeInbet is a method that can generate inbetween frames for cartoon line drawings. Seeing this, we'll hopefully be blessed with higher-framerate anime in the near future.
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation can generate diverse and realistic videos that match natural audio samples. It uses a lightweight adaptor network to improve alignment and visual quality compared to other methods.
Show-1 can generate high-quality videos with accurate text-video alignment. It uses only 15 GB of GPU memory during inference, much less than the 72 GB needed by comparable models.
PGDiff can restore and colorize faces from low-quality images by using details from high-quality images. It effectively fixes issues like scratches and blurriness.
Generative Repainting can paint 3D assets using text prompts. It uses pretrained 2D diffusion models and 3D neural radiance fields to create high-quality textures for various 3D shapes.
TECA can generate realistic 3D avatars from text descriptions. It combines traditional 3D meshes for faces and bodies with neural radiance fields (NeRF) for hair and clothing, allowing for high-quality, editable avatars and easy feature transfer between them.
InstaFlow can generate high-quality images in just one step, achieving an FID of 23.3 on MS COCO 2017-5k. It works very fast at about 0.09 seconds per image, using much less computing power than traditional diffusion models.
ProPainter is a new video inpainting method that is able to remove objects, complete masked videos, remove watermarks and even expand the view of a video.
Another video synthesis model that caught my eye this week is Reuse and Diffuse. The novel framework for text-to-video generation adds the ability to generate more frames from an initial video clip by reusing and iterating over the original latent features. Can’t wait to give this one a try.
SyncDreamer is able to generate multiview-consistent images from a single-view image and thus is able to generate 3D models from 2D designs and hand drawings. It wasn’t able to help me in my quest to turn my PFP into a 3D avatar, but someday I’ll get there!
Hierarchical Masked 3D Diffusion Model for Video Outpainting can fill in missing parts at the edges of video frames while keeping the motion smooth. Its coarse-to-fine hierarchical approach reduces artifacts by conditioning on multiple frames at once.
Total Selfie can generate high-quality full-body selfies from close-up selfies and background images. It uses a diffusion-based approach to combine these inputs, creating realistic images in desired poses and overcoming the limits of traditional selfies.