AI Toolbox | AI Art Weekly

3DTopia-XL can generate high-quality 3D PBR assets from text or image inputs in just 5 seconds.

20.09.24 · Project Page · Code · Text-to-3D · Image-to-3D · 3D Object Generation

StableSR

Exploiting Diffusion Prior for Real-World Image Super-Resolution can restore high-quality images from low-resolution inputs using pre-trained text-to-image diffusion models. It allows users to balance image quality and fidelity through a controllable feature wrapping module and adapts to different image resolutions with a progressive aggregation sampling strategy.

19.09.24 · Project Page · Code · Image Restoration · Image Upscaling

Upscale-A-Video

Upscale-A-Video can upscale low-resolution videos using text prompts while keeping the video stable. It allows users to adjust noise levels for better quality and performs well in both test and real-world situations.

19.09.24 · Project Page · Code · Video Upscaling

Expressive Whole-Body 3D Gaussian Avatar

ExAvatar can animate expressive whole-body 3D human avatars from a short monocular video. It captures facial expressions, hand motions, and body poses in the process.

17.09.24 · Project Page · Code · 3D Object Generation · 3D Motion Generation

SO-SMPL

Disentangled Clothed Avatar Generation from Text Descriptions can create high-quality 3D avatars by separately modeling human bodies and clothing. This method improves texture and geometry quality and aligns well with text prompts, enhancing virtual try-on and character animation.

17.09.24 · Project Page · Code · Text-to-3D

MagicMan

MagicMan can generate high-quality 3D images and normal maps of humans from a single photo.

16.09.24 · Project Page · Code · Image-to-Image

TurboEdit

TurboEdit enables fast text-based image editing in just 3-4 diffusion steps! It improves edit quality and preserves the original image by using a shifted noise schedule and a pseudo-guidance approach, tackling issues like visual artifacts and weak edits.

14.09.24 · Project Page · Code · Image Editing

DepthCrafter

DepthCrafter can generate long high-quality depth map sequences for videos. It uses a three-stage training method with a pre-trained image-to-video diffusion model, achieving top performance in depth estimation for visual effects and video generation.

14.09.24 · Project Page · Code · Video Depth Estimation

DreamBeast

DreamBeast can generate unique 3D animal assets with different parts. It uses a method from Stable Diffusion 3 to quickly create detailed Part-Affinity maps from various camera views, improving quality while saving computing power.

13.09.24 · Project Page · Code · 3D Object Generation · 3D Scene Generation

DrawingSpinUp

DrawingSpinUp can animate 3D characters from a single 2D drawing. It removes unnecessary lines and uses a skeleton-based algorithm to allow characters to spin, jump, and dance.

13.09.24 · Project Page · Code · Image-to-3D · 3D Object Generation

DreamHOI

DreamHOI can generate realistic 3D human-object interactions (HOIs) by posing a skinned human model to interact with objects based on text descriptions. It uses text-to-image diffusion models to create diverse interactions without needing large datasets.

13.09.24 · Project Page · Code · Text-to-3D · 3D Object Generation

TextBoost

TextBoost can enable one-shot personalization of text-to-image models by fine-tuning the text encoder. It generates diverse images from a single reference image while reducing overfitting and memory needs.

12.09.24 · Project Page · Code · Personalized Image Generation

ProbTalk3D

ProbTalk3D can generate 3D facial animations that show different emotions based on audio input! It uses a two-stage VQ-VAE model and the 3DMEAD dataset, allowing for diverse facial expressions and accurate lip-syncing.

12.09.24 · Project Page · Code · Audio-to-3D

World-Grounded Human Motion Recovery via Gravity-View Coordinates

GVHMR can recover human motion from monocular videos by estimating poses in a Gravity-View coordinate system aligned with gravity and the camera.

11.09.24 · Project Page · Code · Video Object Detection · Video Analysis

NeuroPictor

NeuroPictor can improve fMRI-to-image reconstruction by using fMRI signals to control diffusion models. It is trained on over 67,000 fMRI-image pairs, allowing for better accuracy in generating images that reflect both high-level concepts and fine details.

10.09.24 · Project Page · Code · Brain-to-Image

Text2Place

Text2Place can place any human or object realistically into diverse backgrounds. This enables scene hallucination by generating compatible scenes for the given pose of the human, text-based editing of the human and placing multiple persons into a scene.

10.09.24 · Project Page · Code · Image-to-Image · Image Segmentation

One-Shot Diffusion Mimicker for Handwritten Text Generation

One-DM can generate handwritten text from a single reference sample, mimicking the style of the input. It captures unique writing patterns and works well across multiple languages.

09.09.24 · Code · Text-to-Image

FlexiClip

FlexiClip can generate smooth animations from clipart images while keeping key points in the right place.

08.09.24 · Project Page · Code · Image-to-Video

Distilling Diffusion Models into Conditional GANs

Diffusion2GAN is a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference while preserving image quality. This enables one-step 512px/1024px image generation at an interactive speed of 0.09/0.16 second as well as 4k image upscaling!

05.09.24 · Project Page · Code · Image-to-Image

LinFusion

LinFusion can generate high-resolution images up to 16K in just one minute using a single GPU. It improves performance on various Stable Diffusion versions and works with pre-trained components like ControlNet and IP-Adapter.

04.09.24 · Project Page · Code · Demo · Controllable Image Generation