AI Toolbox
A curated collection of 965 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model can edit images by filling in missing parts using a reference image and a sketch. This method improves editability and allows for detailed changes in various scenes.
AvatarCraft can turn a text prompt into a high-quality 3D human avatar. It allows users to control the avatar’s shape and pose, making it easy to animate and reshape without retraining.
vid2vid-zero can edit videos without needing extra training on video data. It uses image diffusion models for text-to-video alignment and keeps the original video’s look and feel, allowing for effective changes to scenes and subjects.
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in an image. This allows object-level editing operations on real images such as reference-based appearance editing, free-form shape editing, adding objects, and object variations.
HyperDiffusion can generate high-quality 3D shapes and 4D mesh animations using a unified diffusion model. This method allows for the creation of complex objects and dynamic scenes from a single framework, making it versatile and efficient.
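The core idea behind HyperDiffusion is to overfit a small neural-field MLP per shape, flatten its weights into one vector, and train the diffusion model on a dataset of such vectors. A rough sketch of that representation step; the MLP architecture and sizes here are my own assumptions, not the paper's:

```python
# Rough sketch of HyperDiffusion's representation: flatten the weights of an
# overfitted neural-field MLP into one vector, so a diffusion model can be
# trained over many such vectors. Architecture and sizes are assumptions.
import torch
import torch.nn as nn

def make_field():
    # Tiny occupancy-field MLP: 3D coordinate -> occupancy logit
    return nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, 1))

def flatten_weights(mlp):
    return torch.cat([p.detach().flatten() for p in mlp.parameters()])

def load_weights(mlp, vec):
    i = 0
    for p in mlp.parameters():
        n = p.numel()
        p.data.copy_(vec[i:i + n].view_as(p))
        i += n

field = make_field()
w = flatten_weights(field)    # one training sample for the diffusion model
print(w.shape)                # torch.Size([17153])

field2 = make_field()
load_weights(field2, w)       # a sampled vector can be loaded back and queried
```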
PAniC-3D can reconstruct 3D character heads from single-view anime portraits. It uses a line-filling model and a volumetric radiance field, achieving better results than previous methods and setting a new standard for stylized reconstruction.
Latent Diffusion Models (LDMs) are high-resolution image generators that can inpaint, generate images from text or bounding-box layouts, and perform super-resolution.
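LDMs are the backbone of Stable Diffusion, so the quickest way to try one is Hugging Face's diffusers library. A minimal sketch, assuming a CUDA GPU and the usual runwayml checkpoints (swap in any compatible model IDs):

```python
# Minimal sketch: text-to-image and inpainting with a Latent Diffusion Model
# via diffusers. Assumes a CUDA GPU; model IDs are assumptions.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline
from PIL import Image

# Text-to-image generation
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("txt2img.png")

# Inpainting: fill the white region of the mask using the prompt
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = inpaint(
    prompt="a red fox sitting on the grass",
    image=Image.open("scene.png").convert("RGB"),
    mask_image=Image.open("mask.png").convert("L"),
).images[0]
result.save("inpainted.png")
```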
Make-It-3D can create high-quality 3D content from a single image by estimating 3D shapes and adding textures. It uses a two-step process with a trained 2D diffusion model, allowing for text-to-3D creation and detailed texture editing.
eDiff-I can generate high-resolution images from text prompts using an ensemble of expert diffusion models, each specialized for a different stage of the denoising process. It also lets users control composition by selecting words from the prompt and painting where they should appear on a canvas.
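There is no public eDiff-I code, but the routing idea is simple: at each sampling step, pick the expert denoiser whose noise-level range contains the current timestep. An illustrative sketch with made-up experts and boundaries:

```python
# Illustrative sketch of eDiff-I's ensemble-of-experts idea: route each
# denoising step to the model specialized for that noise range. Experts
# and boundaries are made up; the official model is not released.
import torch

def pick_expert(t, experts, boundaries):
    """Return the expert whose noise interval contains timestep t.
    boundaries are descending; experts has one more entry than boundaries."""
    for bound, expert in zip(boundaries, experts):
        if t >= bound:
            return expert
    return experts[-1]

# Dummy experts standing in for specialized denoisers
high_noise = lambda x, t: x * 0.9    # shapes global layout early on
low_noise = lambda x, t: x * 0.99    # refines fine detail late
experts, boundaries = [high_noise, low_noise], [500]

x = torch.randn(1, 4, 64, 64)        # latent being denoised
for t in [900, 700, 300, 100]:       # descending timesteps
    x = pick_expert(t, experts, boundaries)(x, t)
```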
Text2Video-Zero can generate high-quality videos from text prompts using existing text-to-image diffusion models. It adds motion dynamics and cross-frame attention, making it useful for conditional video generation and instruction-guided video editing.
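The anchoring trick is cross-frame attention: every frame keeps its own queries but attends to the keys and values of the first frame, which holds appearance steady across the clip. A minimal PyTorch sketch of the idea; shapes and names are illustrative, not the paper's code:

```python
# Illustrative cross-frame attention, the core trick behind Text2Video-Zero:
# each frame uses its own queries but the keys/values of the first frame,
# anchoring appearance across the clip. Shapes are assumptions.
import torch

def cross_frame_attention(q, k, v):
    """q, k, v: (frames, tokens, dim). All frames attend to frame 0."""
    f, n, d = q.shape
    k0 = k[0:1].expand(f, n, d)      # broadcast first frame's keys
    v0 = v[0:1].expand(f, n, d)      # ...and values to all frames
    attn = torch.softmax(q @ k0.transpose(-2, -1) / d**0.5, dim=-1)
    return attn @ v0

frames = 8
q = torch.randn(frames, 64, 320)
out = cross_frame_attention(q, torch.randn_like(q), torch.randn_like(q))
print(out.shape)  # torch.Size([8, 64, 320])
```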
Vox-E can edit 3D objects by changing their shape and appearance based on text prompts. It uses a special method to keep the edited object connected to the original, allowing for both big and small changes.
MeshDiffusion can generate realistic 3D meshes using a score-based diffusion model with deformable tetrahedral grids. It is great for creating detailed 3D shapes from single images and can also add textures, making it useful for various applications.
Blind Video Deflickering by Neural Filtering with a Flawed Atlas can remove flicker from videos without needing extra guidance. It works well on different types of videos and uses a neural atlas for better consistency, outperforming other methods.
3DFuse can improve 3D scene generation by adding 3D awareness to 2D diffusion models. It builds a rough 3D structure from text prompts and uses depth maps for better realism in reconstructions.
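3DFuse's own pipeline isn't shown here; as a stand-in for the depth-conditioning part, diffusers' ControlNet demonstrates how a depth map can steer a 2D diffusion model. Model IDs are assumptions:

```python
# Stand-in example: depth-conditioned generation with diffusers' ControlNet
# (not 3DFuse's own pipeline), showing how a depth map can steer a 2D
# diffusion model. Assumes a CUDA GPU; model IDs are assumptions.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

depth = Image.open("depth_map.png").convert("RGB")  # coarse scene depth
image = pipe("a cozy cabin in a forest", image=depth).images[0]
image.save("depth_guided.png")
```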
3D Cinemagraphy can turn a single still image into a video by adding motion and depth. It uses 3D space to create realistic animations and fix common issues like artifacts and inconsistent movements.
X-Avatar can capture the full expressiveness of digital humans for lifelike experiences in telepresence and AR/VR. It uses full 3D scans or RGB-D data and outperforms other methods in animation tasks, supported by a new dataset with 35,500 high-quality frames.
Video-P2P can edit videos using advanced techniques like word swap and prompt refinement. It adapts image generation models for video, allowing for the creation of new characters while keeping original poses and scenes.
PriorMDM can generate long human motion sequences of up to 10 minutes using a pre-trained diffusion model. It allows for controlled transitions between prompted intervals and can create two-person motions with just 14 training examples, using techniques like DiffusionBlending for better control.
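A hypothetical sketch of the windowed idea behind those long sequences: denoise overlapping motion windows and crossfade the shared frames so adjacent windows agree. Window sizes and the linear weighting here are made up, not PriorMDM's code:

```python
# Hypothetical sketch of blending two overlapping motion windows, in the
# spirit of PriorMDM's DiffusionBlending: crossfade the shared frames so
# adjacent windows agree. Sizes and the linear weighting are assumptions.
import torch

def blend_windows(a, b, overlap):
    """a, b: (frames, joints, 3) motion windows; the last `overlap` frames
    of `a` cover the same time span as the first `overlap` frames of `b`."""
    w = torch.linspace(0.0, 1.0, overlap).view(-1, 1, 1)
    mixed = (1 - w) * a[-overlap:] + w * b[:overlap]
    return torch.cat([a[:-overlap], mixed, b[overlap:]], dim=0)

a = torch.randn(120, 22, 3)   # ~4 s at 30 fps, 22 joints
b = torch.randn(120, 22, 3)
long_motion = blend_windows(a, b, overlap=30)
print(long_motion.shape)      # torch.Size([210, 22, 3])
```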
100kb models? Combining multiple individually learned concepts? 1-shot personalization? Key-Locking? Perfusion just might be a new viable Stable Diffusion fine-tuning method by NVIDIA. No way to try it out yet since, as usual, there is no code, but I'm keeping an eye on this one.
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models can quickly personalize text-to-image models using just one image and only 5 training steps. This method reduces training time from minutes to seconds while maintaining quality through regularized weight-offsets.
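The regularized weight-offsets amount to learning small deltas on top of frozen base weights while penalizing their magnitude, so the personalized model stays close to the prior. A hedged sketch of that pattern; the layer choice, scale, and penalty weight are assumptions, not the paper's code:

```python
# Hedged sketch of regularized weight-offsets: personalize by learning small
# deltas over frozen base weights, with an L2 penalty keeping them small.
# Layer choice, scale, and penalty weight are assumptions.
import torch
import torch.nn as nn

class OffsetLinear(nn.Module):
    def __init__(self, base: nn.Linear, scale: float = 0.1):
        super().__init__()
        self.base = base.requires_grad_(False)   # frozen pretrained layer
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.scale = scale

    def forward(self, x):
        return nn.functional.linear(
            x, self.base.weight + self.scale * self.delta, self.base.bias)

    def reg_loss(self):
        return self.delta.pow(2).mean()          # keep offsets small

layer = OffsetLinear(nn.Linear(320, 320))
x = torch.randn(4, 320)
loss = layer(x).pow(2).mean() + 0.01 * layer.reg_loss()
loss.backward()
```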