AI Art Weekly #86

Hello there, my fellow dreamers, and welcome to issue #86 of AI Art Weekly! 👋

What an insane week it has been! People say AI research is slowing down and we’re reaching a capabilities plateau, but I beg to differ. This week I’ve gone through another 180+ papers and projects from the world of computer vision and AI art, so let’s jump right into it!

In this week’s issue:

  • Highlights: Luma AI: Dream Machine, Stable Diffusion 3 Medium, Midjourney Personalization
  • 3D: M-LRM, Human 3Diffusion, WonderWorld, LE3D, AvatarPopUp, IllumiNeRF, GGHead, StableMaterials
  • Image: Eye-for-an-eye, Image Neural Field Diffusion Models, Neural Gaffer, Ctrl-X, FontStudio, Layered Image Vectorization, LLamaGen, MimicBrush, CFG++, AsyncDiff, EMMA
  • Video: HOI-Swap, T2S-GPT
  • Audio: Action2Sound
  • and more!

Cover Challenge 🎨

Theme: brutalism
58 submissions by 33 artists
AI Art Weekly Cover Art Challenge brutalism submission by ManoelKhan
🏆 1st: @ManoelKhan
AI Art Weekly Cover Art Challenge brutalism submission by PapaBeardedNFTs
🥈 2nd: @PapaBeardedNFTs
AI Art Weekly Cover Art Challenge brutalism submission by pactalom
🥉 3rd: @pactalom
AI Art Weekly Cover Art Challenge brutalism submission by EternalSunrise7
🧡 4th: @EternalSunrise7

News & Papers

Highlights

Luma AI: Dream Machine

Luma AI had the AI art community buzzing this week with their new video generation model called Dream Machine. Unlike Sora, it’s something you can actually access today, which makes it the most advanced publicly available video generation model right now. It can generate 5-second clips with 120 frames in 120 seconds. I’ve compiled a list of some cool creations from the community over on X.

Dream Machine example by @NathanBoey

Stable Diffusion 3 Medium

Stability AI finally released Stable Diffusion 3 weights, well, kind of. They released a new model called Stable Diffusion 3 Medium, a smaller version of the original model they showcased a few weeks back. The new model brings overall quality improvements in photorealism and prompt understanding, and can generate text within images. Human anatomy apparently remains an issue, though.

SD3 Medium examples

Midjourney Personalization

Midjourney released a new personalization feature that completely changes the way MJ interprets your prompts. For it to work, you have to rank at least 200 images and then add the --p flag to the end of your prompts. I’m already having a ton of fun with it.

Midjourney output with my personalization code --personalize 9zk3pun
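
To make that concrete, a personalized prompt is just a regular prompt with the flag tacked on. The subject below is a made-up example: --p on its own applies your own personalization profile (after you’ve ranked your 200 images), while the --personalize 9zk3pun form in the caption above references a specific code.

    /imagine prompt: a brutalist concrete cathedral rising out of morning fog --p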

3D

M-LRM: Multi-view Large Reconstruction Model

M-LRM is yet another model that can reconstruct high-quality 3D shapes from either a single image or multiple images.

M-LRM examples

Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

Human 3Diffusion can reconstruct realistic avatars from a single RGB image, achieving high fidelity in both geometry and appearance.

Human 3Diffusion example

WonderWorld: Interactive 3D Scene Generation from a Single Image

WonderWorld can generate interactive 3D scenes from a single image and a text prompt in less than 10 seconds on a single A6000 GPU.

WonderWorld example

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

LE3D can turn noisy RAW images into a Gaussian Splat and perform real-time novel view synthesis, HDR rendering, refocusing, and tone-mapping changes.

LE3D example

Instant 3D Human Avatar Generation using Image Diffusion Models

Google’s AvatarPopUp can generate high-quality rigged 3D human avatars from a single image or text prompt in as few as 2 seconds.

Animated AvatarPopUp example

IllumiNeRF: 3D Relighting without Inverse Rendering

Also by Google, IllumiNeRF can relight images. The method uses an image diffusion model conditioned on lighting and then reconstructs a NeRF with these relit images, from which it can render novel views under the target lighting.

IllumiNeRF example
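
To make the pipeline a bit more concrete, here’s a rough Python-style sketch of the idea as described above. All function names (relight_with_diffusion, fit_nerf, render_view) are hypothetical placeholders, not the authors’ actual code or API:

    # Rough sketch of the IllumiNeRF pipeline described above.
    # relight_with_diffusion, fit_nerf and render_view are hypothetical
    # placeholders, not the paper's real components.
    def illuminerf(input_views, target_env_map, novel_poses):
        # 1) Relight each captured view with a lighting-conditioned image diffusion model
        relit_views = [relight_with_diffusion(v, target_env_map) for v in input_views]
        # 2) Reconstruct a NeRF from the relit images so the relighting becomes 3D-consistent
        nerf = fit_nerf(relit_views)
        # 3) Render novel views under the target lighting
        return [render_view(nerf, pose) for pose in novel_poses]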

GGHead: Fast and Generalizable 3D Gaussian Heads

GGHead can generate and render 3D heads at 1K resolution in real-time.

GGHead example

StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

StableMaterials can generate high-resolution tileable PBR materials from text prompts or input images in just 4 diffusion steps.

StableMaterials examples

Image

Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models

Eye-for-an-eye makes it possible for diffusion models to transfer the appearance of objects from a reference image to a target image.

Eye-for-an-eye example

Image Neural Field Diffusion Models

Image Neural Field Diffusion Models can be used to train diffusion models on image neural fields, which can be rendered at any resolution. This makes it possible to train diffusion models using mixed-resolution image datasets.

CLIF render example at a 2048×2048 resolution

Neural Gaffer: Relighting Any Object via Diffusion

Neural Gaffer can relight any object in an image under any novel environmental lighting condition by simply conditioning an image generator on a target environment map.

Neural Gaffer examples

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Ctrl-X enables structure and appearance control for text-to-image and text-to-video models with any image as input! This makes it possible to generate images and videos with the structure of one image and the appearance of another.

Ctrl-X examples

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

FontStudio can generate text effects for multilingual fonts. The model is able to interpret the given shape of a font and strategically plan pixel distributions within the irregular canvas.

FontStudio examples

Layered Image Vectorization via Semantic Simplification

Layered Vectorization can turn images into layered vectors that represent the original image from coarse to fine detail levels.

Layered Vectorization examples

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

LLamaGen is a new family of image generation models based on the same approach as LLMs. The largest model has 3.1B parameters and can generate 256×256 images.

LLamaGen examples

Zero-shot Image Editing with Reference Imitation

MimicBrush can edit a region of interest in an image by drawing inspiration from a reference image, capturing the semantic correspondence between the two images in a self-supervised manner.

MimicBrush examples

CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

CFG++ fixes CFG’s issues with lower guidance scales, improving text-to-image quality and invertibility.

CFG++ comparison

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

AsyncDiff brings parallelism to diffusion models, which results in a significant reduction in inference latency while minimally impacting generative quality.

2.8x Faster on SDXL with 4 devices. Top: 50 step original (13.81s). Bottom: 50 step AsyncDiff (4.98s)

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

EMMA is a new image generation model that can generate images from text prompts and additional modalities such as reference images or portraits. It especially shines at preserving individual identities.

EMMA examples

Video

HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

HOI-Swap can swap objects in videos with a focus on those interacted with by hands, given one user-provided reference object image.

HOI-Swap examples

T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

T2S-GPT can generate sign language videos from text and is able to control the speed of the signing.

T2S-GPT example

Audio

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Action2Sound can generate realistic action sounds for human interactions in videos. The model is able to disentangle foreground action sounds from the ambient background sounds and can even generate ambient sounds for silent videos.

Action2Sound illustration

Also interesting

  • Visual Words: Understanding Visual Concepts Across Models
  • CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
  • Weights2Weights: Interpreting the Weight Space of Customized Diffusion Models
  • CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
  • AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
  • MCM: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

“up/above IV” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
