AI Art Weekly #81

Hello there, my fellow dreamers, and welcome to issue #81 of AI Art Weekly! 👋

Progress is unavoidable, so we march into a world where robots assert dominance while delivering packages and learn how to execute our daily chores. All the while, the veil of what is real becomes blurrier by the day. Still, we march on.

I’m personally taking a break from marching on. I’ll be back in two weeks with the next issue. Until then, enjoy this packed one!

In this issue:

  • 3D: PhysDreamer, GScream, NeRF-XL, Interactive3D, Make-it-Real, TELA, TokenHMR
  • Image: Midjourney, Hyper-SD, ConsistentID, PuLID, MultiBooth, ID-Aligner, CharacterFactory, TF-GPH, Editable Image Elements, IDM-VTON
  • Video: MaGGIe, MotionMaster, SVA
  • and more!

Cover Challenge 🎨

Theme: water
120 submissions by 72 artists
🏆 1st: @pactalom
🥈 2nd: @ManoelKhan
🥈 2nd: @SandyDamb
🥉 3rd: @Saudade_nft

News & Papers


PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

PhysDreamer is a physics-based approach that lets you poke, push, pull, and throw objects in a virtual 3D environment and have them react in a physically plausible manner.

PhysDreamer example

GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

GScream is yet another method for object removal in 3D scenes. This one uses Gaussian Splatting to update the radiance field and is able to preserve geometric consistency and texture coherence.

GScream example

NeRF-XL: NeRF at Any Scale with Multi-GPU

But enough about Gaussians. NeRF-XL by NVIDIA is a new method for distributing NeRFs across multiple GPUs, enabling training and rendering 3D scenes of arbitrarily large capacity.

NeRF-XL examples of arbitrary scale

Interactive3D🪄: Create What You Want by Interactive 3D Generation

Of course we aren’t short of 3D object generation methods this week. Interactive3D allows users to interactively modify and guide the generative process of 3D objects. This includes adding and removing components, deforming and rigid dragging, geometric transformations, and semantic editing.

Interactive3D example of generating and modifying a Gundam Robot

Make-it-Real: Unleashing Large Multimodal Model’s Ability for Painting 3D Objects with Realistic Materials

AI will make material creation a breeze for 3D artists! Make-it-Real utilizes GPT-4V to recognize and describe materials, allowing the construction of a detailed material library. The model can then precisely identify and align materials with the corresponding components of 3D objects and use them as a reference for generating new SVBRDF materials, significantly enhancing the objects' visual authenticity.

Make-it-Real example

TELA: Text to Layer-wise 3D Clothed Human Generation

TELA can create 3D models of people wearing clothes based on text descriptions. It allows you to precisely control how the clothes appear on the model, including which layers go on first.

TELA examples

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

And on the pose reconstruction front we have TokenHMR, which can extract human poses and shapes from a single image.

TokenHMR example


Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

ByteDance released Hyper-SD this week, yet another diffusion-aware distillation algorithm that brings high-quality image generation down to one inference step.

Real-Time Generation Demo of Hyper-SD

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

This week we’ve been blessed with not just one image personalization method, but four. We begin with ConsistentID, which can generate diverse personalized ID images based on text prompts using only a single image.

ConsistentID examples

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Similar to ConsistentID, PuLID is a tuning-free ID customization method for text-to-image generation. This one can also be used to edit images generated by diffusion models by adding or changing the text prompt.

PuLID examples

MultiBooth: Towards Generating All Your Concepts in an Image from Text

MultiBooth, on the other hand, can generate images that include any number of concepts in various styles, contexts, and layout relationships as specified by given text prompts.

MultiBooth examples

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

ID-Aligner is able to improve identity preservation and the visual appeal of generated images and can be applied to both LoRA and Adapter models.

ID-Aligner examples

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

The GAN-based CharacterFactory is coming. It can create infinite identity-consistent new characters and is compatible with models across multiple modalities, like ControlNet for images, ModelScope for videos, and LucidDreamer for 3D objects.

CharacterFactory examples

TF-GPH: Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing

TF-GPH can blend images with disparate visual elements together stylistically!

TF-GPH examples

Editable Image Elements for Controllable Synthesis

Editable Image Elements can edit the location and size of objects in an input image and then generate a new image that respects the modifications. This can be used to resize, rearrange, drag, remove, and create variations of objects in an image, as well as compose multiple images together.

Editable Image Elements example

IDM-VTON: A New Baseline for Virtual Try-On

IDM-VTON can generate high-quality images of people wearing clothes that are not only realistic, but also preserve the original design of the garment. The method can be used to create virtual fitting rooms, improve online shopping experiences, and even generate fashion designs.

IDM-VTON example


MaGGIe: Mask Guided Gradual Human Instance Matting

MaGGIe can efficiently predict high-quality human instance mattes from coarse binary masks for both image and video input. The method is able to output all instance mattes simultaneously without exploding memory and latency, making it suitable for real-time applications.

MaGGIe example

MotionMaster: Training-free Camera Motion Transfer For Video Generation

MotionMaster can extract camera motions from a single source video or multiple videos and apply them to new videos. This gives more flexible control over camera motion, resulting in videos with variable-speed zoom, pan left, pan right, dolly zoom in, dolly zoom out and more.

MotionMaster Dolly-Zoom-In example

SVA: Semantically consistent Video-to-Audio Generation using Multimodal Large Language Model

SVA can generate sound effects and background music for videos based on a single key frame and a text prompt.

Check the project page for SVA examples with audio

Also interesting

  • DMesh: A Differentiable Representation for General Meshes
  • GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting

“What is this 💩” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you in two weeks!

– dreamingtulpa
