AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Reduce, Reuse, Recycle can enable compositional generation using energy-based diffusion models and MCMC samplers. It improves tasks like classifier-guided ImageNet modeling and text-to-image generation by introducing new samplers that enhance performance.
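A minimal sketch of the underlying idea: two score functions are summed inside a Langevin (MCMC) sampler so that samples satisfy both concepts at once. The toy scores below are stand-ins, not the paper's annealed or Hamiltonian samplers.

```python
import torch

def score_a(x, t):
    # Stand-in for the score (gradient of the log-density) of "concept A".
    return -(x - 2.0)

def score_b(x, t):
    # Stand-in for the score of "concept B".
    return -(x + 1.0)

x = torch.randn(4)
step = 0.05
for _ in range(200):
    noise = torch.randn_like(x) * (2 * step) ** 0.5
    # Unadjusted Langevin step on the summed score, i.e. the product of the two densities.
    x = x + step * (score_a(x, 0) + score_b(x, 0)) + noise

print(x)  # samples concentrate near 0.5, the mode of the composed density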
Entity-Level Text-Guided Image Manipulation can edit specific parts of an image based on text descriptions while keeping other areas unchanged. It uses a two-step process for aligning meanings and making changes, allowing for flexible and precise editing.
MultiDiffusion can generate high-quality images using a pre-trained text-to-image diffusion model. It allows users to control aspects like image size and includes features for guiding images with segmentation masks and bounding boxes.
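For intuition, the Hugging Face diffusers library includes a panorama pipeline that implements MultiDiffusion-style fusion of overlapping denoising windows; the model ID below is illustrative and a CUDA GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"  # assumed base checkpoint
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photo of the dolomites",
    height=512,
    width=2048,  # non-square sizes are the point: windows are denoised separately and fused
).images[0]
image.save("panorama.png")
```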
Projected Latent Video Diffusion Models (PVDM) can generate high-resolution and smooth videos in a low-dimensional space. It achieves a top score of 639.7 on the UCF-101 benchmark, greatly surpassing previous methods.
Single Motion Diffusion can generate realistic animations from one input motion sequence. It allows for motion expansion, style transfer, and crowd animation, while using a lightweight design to create diverse motions efficiently.
ControlNet can add control to text-to-image diffusion models. It lets users guide image generation with conditions such as edge maps and depth maps, and it works well with both small and large training datasets.
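A hedged sketch of driving a Canny-edge ControlNet through diffusers (the widely used integration, not the original repository); the checkpoint names and the input file are assumptions.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Canny-edge ControlNet; model IDs are commonly published checkpoints, adjust as needed.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Turn any input photo into an edge map that constrains the layout of the output.
edges = cv2.Canny(np.array(Image.open("input.png").convert("RGB")), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    "a futuristic city at night", image=control_image, num_inference_steps=30
).images[0]
image.save("controlled.png")
```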
Neural Congealing can align similar content across multiple images using a self-supervised method. It uses pre-trained DINO-ViT features to create a shared semantic map, allowing for effective alignment even with different appearances and backgrounds.
Hard Prompts Made Easy can automatically generate and optimize hard text-based prompts for text-to-image and text-to-text applications. It helps users tune models for classification and create image concepts without needing prior prompting knowledge, using efficient gradient-based optimization.
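A toy sketch of the core trick: optimize a continuous prompt with gradients while repeatedly projecting it onto real vocabulary tokens. The embedding table, target feature, and scoring function here are random stand-ins rather than CLIP, so this shows the optimization pattern only.

```python
import torch

vocab_size, dim, prompt_len = 1000, 64, 8
embedding_table = torch.randn(vocab_size, dim)   # stand-in for the model's token embeddings
target_feature = torch.randn(dim)                # stand-in for an image/text feature to match

soft_prompt = torch.randn(prompt_len, dim, requires_grad=True)
optimizer = torch.optim.Adam([soft_prompt], lr=0.1)

def project(embeds):
    # Snap each soft embedding to its nearest vocabulary entry (the discrete "hard" prompt).
    token_ids = torch.cdist(embeds, embedding_table).argmin(dim=-1)
    return token_ids, embedding_table[token_ids]

for _ in range(200):
    _, hard = project(soft_prompt.detach())
    # Score the projected (hard) prompt, but route gradients to the soft prompt.
    scored = hard + (soft_prompt - soft_prompt.detach())
    loss = -torch.cosine_similarity(scored.mean(dim=0), target_feature, dim=0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

token_ids, _ = project(soft_prompt.detach())
print("optimized token ids:", token_ids.tolist())
```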
Pix2Pix-Zero can edit images in real time, such as turning a cat into a dog, without needing extra text prompts or training. It keeps the original image’s structure and uses pre-trained text-to-image diffusion models for better editing results.
TEXTure can generate and edit seamless textures for 3D shapes using text prompts. It uses a depth-to-image diffusion model to create consistent textures from different angles and allows for refinement based on user input.
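As a rough sketch of the building block, a single depth-conditioned diffusion step over one rendered view can be run with diffusers' depth-to-image pipeline; TEXTure itself iterates such steps over many views with its own consistency machinery, and the file names below are hypothetical.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

# Public Stable Diffusion depth model, not the TEXTure codebase itself.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

render = Image.open("mesh_render_view0.png").convert("RGB")  # hypothetical rendered view
image = pipe(
    prompt="a rusty metal robot, photorealistic texture",
    image=render,
    strength=0.9,             # how much of the render gets repainted
    num_inference_steps=30,
).images[0]
image.save("textured_view0.png")
```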
SceneDreamer can generate endless 3D scenes from 2D image collections. It creates photorealistic images with clear depth and allows for free camera movement in the environments.
Dreamix can edit videos based on a text prompt while keeping colors, sizes, and camera angles consistent. It combines low-resolution video data with high-quality content, allowing for advanced editing of motion and appearance.
SceneScape can generate long videos of different scenes from text prompts and camera angles. It ensures 3D consistency by building a unified mesh of the scene, allowing for realistic walkthroughs in places like spaceships and caves.
Shape-aware Text-driven Layered Video Editing can edit the shape of objects in videos while keeping them consistent across frames. It uses a text-conditioned diffusion model to achieve this, making video editing more effective than other methods.
StyleGAN-T can generate high-quality images at 512x512 resolution in just 2 seconds on a single NVIDIA A100 GPU. It addresses key requirements of large-scale text-to-image synthesis, such as stable training on diverse datasets and strong text alignment.
RecolorNeRF can change colors in 3D scenes while keeping views consistent. It decomposes scenes into pure-colored layers, allowing for easy color adjustments and producing realistic results that outperform other methods.
Msanii can create high-quality music tracks up to 190 seconds long at a sample rate of 44.1 kHz. It uses a diffusion-based method to combine mel spectrograms and neural vocoders, allowing for audio-to-audio style transfer and smooth transitions between audio samples.
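A minimal sketch of the mel-spectrogram domain such a model works in, using torchaudio; the diffusion model and the neural vocoder themselves are assumed and omitted here.

```python
import torch
import torchaudio

sample_rate = 44100
waveform = torch.randn(1, sample_rate * 5)  # stand-in for 5 seconds of audio
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate, n_fft=2048, hop_length=512, n_mels=128
)
mel = to_mel(waveform)
print(mel.shape)  # (1, 128, frames): a 2-D "image" a diffusion model can learn to denoise
# A neural vocoder (e.g. HiFi-GAN) would then map generated mel frames back to a waveform.
```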
Robust Dynamic Radiance Fields can estimate both static and dynamic radiance fields along with camera parameters. It improves view synthesis from difficult videos, achieving better quality and accuracy than current top methods.
Tune-A-Video can generate videos from a single text-video pair by fine-tuning text-to-image diffusion models. It lets users change subjects, backgrounds, and styles while keeping the video content consistent.