AI Art Weekly #71

Hello there, my fellow dreamers, and welcome to issue #71 of AI Art Weekly! 👋

I was so close to releasing a preview of Shortie this week, but then OpenAI dropped Sora and I had to cover that instead 😅 So without further ado, let’s jump into this week’s issue. The highlights are:

  • OpenAI’s Sora
  • Stable Cascade, a faster and more efficient text-to-image model
  • Magic-Me generates videos with a specified subject identity
  • Continuous 3D Words controls attributes in images
  • GALA3D generates complex 3D scenes from text
  • HeadStudio generates animatable head avatars
  • AudioEditing allows for zero-shot and text-based audio editing
  • Sophia-in-Audition uses a robot performer in virtual production
  • Interview with artist Grebenshyo
  • and more!

Cover Challenge 🎨

Theme: cabal
82 submissions by 48 artists
🏆 1st: @NomadsVagabonds
🥈 2nd: @samisantosai
🥉 3rd: @sourpowww3r
🧡 4th: @EternalSunrise7

News & Papers

OpenAI’s Sora

OpenAI shook the world again this week. They presented Sora, a generative video AI model that can create realistic and imaginative scenes from text prompts. Just reading that doesn’t sound like anything new, until you see the results. Like holy smokes.

I posted a summary of all its capabilities over on X.

Besides the mind-blowing results, the most interesting aspect to me is that the model learned to simulate some aspects of people, animals and environments from the physical world without being explicitly trained on 3D data or object representations. The more data it got fed, the more it learned about the world. It even learned how a player behaves when generating Minecraft videos 🤯 Gonna be interesting to see how this is going to evolve!

This isn’t a real video 🤯

Stable Cascade: Stable Diffusion meets Würstchen

Stable Cascade is a new text-to-image model by Stability AI that is built upon the Würstchen architecture. Due to its more compressed latent space, it can be trained quicker and generate images faster compared to models like SDXL. Best of all: all known extensions such as finetuning, LoRA, ControlNet, IP-Adapter, and LCM are supported. Seems like a great time to get into Stable Diffusion.

Images generated with Stable Cascade


Magic-Me: Identity-Specific Video Customized Diffusion

It’s hard to follow up with video models after Sora, but this is where we are at until the rest of the world catches up.

Magic-Me is a video generation model that is able to generate videos with a specified subject identity defined by a few images. The model is also able to deblur faces and upscale videos for higher resolution.

Magic-Me example using the subject identity of Taylor Swift 👀

Continuous 3D Words for Text-to-Image Generation

Continuous 3D Words is a new control method that can modify attributes in images with a slider-based approach. This allows for more control over illumination, non-rigid shape changes (like wings), and camera orientation, for instance.

3D Words in action

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.

A living room has a coffee table with a basket on it, a wooden floor, a TV on a TV stand, and a sofa with an astronaut sitting on it

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

HeadStudio is another text-to-3D avatar model that can generate animatable head avatars. The method is able to produce high-fidelity avatars with smooth expression deformation and real-time rendering.

HeadStudio examples

AudioEditing: Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

AudioEditing introduces two new methods for editing audio. The first technique allows for text-based editing, while the second is an approach for discovering semantically meaningful editing directions without supervision.

AudioEditing preview, check out the examples on the project page

Sophia-in-Audition: Virtual Production with a Robot Performer

Sophia-in-Audition is a system that uses the humanoid robot Sophia as a virtual performer inside an UltraStage, a controllable lighting dome coupled with multiple cameras. The result is a virtual actor that can replicate iconic film segments, follow real performers, and perform a variety of motions and expressions, all while lighting and camera movements remain fully controllable.

First of all, Sophia creeps me out. Second of all, I think that with all the progress in AI motion capture, and now with world models like Sora on the horizon, this is already obsolete. At least for movie production.

But still fun to learn about.

Sophia as Michael Corleone

Also interesting

  • ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents


Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“ZAAR” by me, available on objkt

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa