AI Art Weekly #26

Hello there my fellow dreamers and welcome to issue #26 of AI Art Weekly! 👋

Each week somehow manages to top the one before it at the moment. I wonder where we are on the S-curve of AI innovation? If we’re just starting out, we’re in for a crazy ride… and now it seems we’ve officially entered the age of AI video. People compare the new ModelScope model to DALL·E mini, and if video progresses the same way images did, this will improve fast! So let’s take a look. This week’s highlights are:

  • Adobe Firefly brings generative art to the masses
  • ModelScope Text-to-Video now available
  • RunwayML Gen-2 Text-To-Video announced
  • Dreambooth3D
  • Interview with AI artist Puff
  • ChatGPT Plugins
  • And much much much more

Cover Challenge 🎨

Theme: food
62 submissions by 36 artists
AI Art Weekly Cover Art Challenge food submission by BonsaiFox1
🏆 1st: @BonsaiFox1
AI Art Weekly Cover Art Challenge food submission by moelucio
🥈 2nd: @moelucio
AI Art Weekly Cover Art Challenge food submission by ghostmoney1111
🥉 3rd: @ghostmoney1111
AI Art Weekly Cover Art Challenge food submission by iminescent
🥉 3rd: @iminescent

Reflection: News & Gems

Adobe Firefly

Let’s start with the obvious one: Adobe announced Firefly, which will integrate generative AI tools directly into their products. This means text-to-image, image-to-image, inpainting and outpainting, upscaling, fine-tuning, sketch and depth map guidance, and some new additions like text-to-template, text-to-brush, text-to-pattern and text-to-vector will soon be available to millions of people in far more user-friendly interfaces than what we have today.

You can sign up for the Firefly beta here.

Adobe Firefly Text-to-Vector example

First ModelScope implementations

I already mentioned the ModelScope text-to-video model in issue #23 three weeks ago, and now we’re seeing the first implementations. While they’re quite janky, they’re super fun. For example, here is Darth Vader in Walmart or Trump and Biden as a sitcom duo. I also created my own, inspired by Twin Peaks, of Dale Cooper raving about coffee.

If you want to give this a try yourself, you have a few options:

  • Automatic1111 extension – I used this for the Twin Peaks video above on my Paperspace A1111 setup. It got stuck from time to time and needed a page reload, but otherwise worked well.
  • Google Colab notebook – works on a free GPU and I managed to render up to 150 frames with it. It should apparently be possible to render up to 25 seconds of footage on paid GPUs with more VRAM, although I’ve found that shorter generations of 1-5 seconds tend to follow the prompt more accurately than longer ones (more about this below, see NUWA-XL).
  • HuggingFace demo – best used after duplicating the space. A T4 GPU is enough.

Gen-2 with Text-to-Video and Image-to-Video

RunwayML teased that they would announce something new this week, and that something was Gen-2: an improved Gen-1 model that also supports text-to-video and image-to-video capabilities, which will “soon” be available on their platform. Most people haven’t gotten around to trying out Gen-1 yet, so hopefully access will roll out soon. If the output looks as stunning as what they present here, ModelScope will be forgotten as soon as this releases.

“Aerial drone footage of a mountain range” Gen-2 text-to-video example


Here is another text-to-video model called Text2Video-Zero that relies on existing Stable Diffusion models and harnesses the power of canny edge and pose guidance for temporal consistency. The results are a bit stiff, as none of the examples seem to include any camera panning. Then again, the model also has a video-to-video method, and that one actually seems quite good!

Text2Video-Zero examples

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Current text-to-video models like ModelScope (and probably Gen-2 as well) initially only support creating short, coherent videos without major shot changes. NUWA-XL, a novel diffusion architecture, aims to solve this. Current datasets contain mostly short videos, so models trained on them tend to fall flat when generating longer ones. NUWA-XL makes it easier to train on long videos (3376 frames, for example), which should reduce the gap between the length of footage in the dataset and the length of videos that can be generated from it.

Figure 4: Qualitative comparison between AR over Diffusion and Diffusion over Diffusion for long video generation on FlintstonesHD. The Arabic number in the lower right corner indicates the frame number with yellow standing for keyframes with large intervals and green for small intervals. Compared to AR over Diffusion, NUWA-XL generates long videos with long-term coherence (see the cloth in frame 22 and 1688) and realistic shot change (frame 17-20). Source: arXiv:2303.12346.

Pix2Video: Video Editing using Image Diffusion

And there is yet another video editing method called Pix2Video. If it feels like we’ve seen one of those almost every week for the past month, that’s because we have (Video-P2P in issue #24 and Fate/Zero in issue #25) 😅

Pix2Video example turning the driving video into a “watercolor painting of a dog running”

Comp3D and Set-the-Scene

We’ve seen Text-to-3D, Image-to-3D and Video-to-3D. What we haven’t seen yet is a model that allows for better compositional control. Comp3D and Set-the-Scene introduce the ability to input a bounding box rendering (basically a 3D segmentation map) with semantic meaning (by assigning a text prompt to each box) to guide composition when generating a scene.

Comp3D example


Remember Instruct-Pix2Pix (issue #9)? Well, now there is Instruct-NeRF2NeRF, basically a 3D equivalent that enables instruction-based editing of NeRFs (via a 2D diffusion model). It’s funny to read my thoughts on this from not even half a year ago:

Imagine being able to edit existing images by simply saying “add fireworks to the sky” or “replace mountains with city skyline” instead of constructing a whole paragraph of words.

No need to imagine these things anymore, they’re already here.

Instruct-NeRF2NeRF example

Vox-E: Text-guided Voxel Editing of 3D Objects

But of course, there is another one. Vox-E is able to generate volumetric edits from target text prompts, allowing for significant geometric and appearance changes while faithfully preserving the input object. The objects can be edited either locally or globally, for example by adding roller skates to a kangaroo or by converting an entire object into a low-poly video game style.

Vox-E examples. Wait for it.

Real-time volumetric rendering of dynamic humans

We haven’t heard a lot from Meta AI these past few months, but it seems like they’re working on making their Metaverse a reality. In their latest paper Real Time Humans, they showcase their method for reconstructing and real-time rendering of dynamic humans. Obviously no code, because Meta.

Input video + SMPL fits example

DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

Similar to Word-As-Image (issue #24), DS-Fusion is another model that is able to stylize one or more letters of a word to visualize their semantic meaning. Although the output looks a bit rougher around the edges compared to Word-As-Image, DS-Fusion is able to create a wider range of outputs, including colorized ones.

DS-Fusion Zombie example

Even more…

There really is a ton of stuff this week and I can’t write a summary of it all. So here is a quick list with some more awesomeness. This also raises a question: which news style do you prefer? The one above, where I explain a bit about each paper, or the one below, which is shorter, with only a brief description and no preview image? Reply to this email.

  • ReBotNet: Fast Real-time Video Enhancement. Looks like an improved and faster method to upscale videos.
  • Zero-1-to-3: Zero-shot One Image to 3D Object. Tried this on myself but I wasn’t happy 😅. HuggingFace.
  • Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models. GitHub.
  • RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
  • Text2Tex: Text-driven Texture Synthesis via Diffusion Models
  • Persistent Nature: A Generative Model of Unbounded 3D Worlds
  • Dreambooth3D: Subject-Driven Text-to-3D Generation

Imagination: Interview & Inspiration

I’ve a soft spot for minimalism. So when Puff entered my radar with his beautiful AI explorations, I knew I wanted to have him on the newsletter. Let’s dive in!

[AI Art Weekly] Puff, what’s your background and how did you get into AI art?

I’ve been building personal projects for fun since I first started using a computer. I quickly discovered that building interesting and experimental things was even more enjoyable and got into AI art in 2018 or 2019 when I took a @fastdotai course with @jeremyphoward. I expected to learn about the code behind machine learning but also ended up discovering computational creativity and neural style transfer. My first piece of AI art was a photo of myself put through a neural network that recreated the photo in the style of Vincent Van Gogh’s “Starry Night.” This plus seeing @NeuralBricolage’s AI art made me fall in love with this type of art.

[puff x ai] by Puff

[AI Art Weekly] Do you have a specific project you’re currently working on? What is it?

Right now, I am simply enjoying experimenting with AI art. I haven’t minted any of my work yet as I am still in the exploration phase. However, I do have a small project in the works that is different from anything I have tweeted in the [puff x ai] series. This project will be an audiovisual exploration of the emotions and feelings of an AI. Perhaps it will stay in my personal collection or maybe I will mint it. I will know for sure once it is finished.

[AI Art Weekly] What drives you to create?

A big driver for me is the fact that art allows a person to express things that words can’t. Just a simple act of closing your eyes in silence for a few minutes can prove to a person that some things are just unexplainable in words. Art allows a person to further express these ideas and thoughts.

[puff x ai] by Puff

[AI Art Weekly] What does your workflow look like?

I often meditate and take long intentional walks, and sometimes inspiration comes from those activities. Other times, inspiration comes from my attempts to convey a subtle message through my art, much like a Zen koan can carry deeper meaning than what initially appears.

The tools I use include Midjourney, Python, Stable Diffusion, and occasionally the C language.

[AI Art Weekly] Do you have a favourite Zen koan?

When the many are reduced to one, to what is the one reduced?

[AI Art Weekly] What is your favourite prompt when creating art?

I go through stages because the current tools allow for quick iteration and evolution, but at the moment, the words I like to use most in my prompts are minimalism, serenity, and realism.

[puff x ai] by Puff

[AI Art Weekly] How do you imagine AI (art) will be impacting society in the near future?

AIs like GPT-4 and other LLMs will continue to free up our time as humans, allowing us to spend more time being creative. Through this process, we will see the emergence of never-before-seen masterpieces from people who were previously too busy to create. I wouldn’t be surprised if the Mona Lisa of our time is a digital, AI-assisted work created by a former entry-level accountant.

[AI Art Weekly] Who is your favourite artist?

My favorite artists are Deafbeef, Pindar van Arman, DieWithTheMostLikes, Claire Silver, XCopy, and GirlWhoShivers.

A cloudpainter painting by Pindar van Arman. Pindar builds robots with neural networks that physically paint onto a canvas. Super interesting. Future interviewee?

[AI Art Weekly] Anything else you would like to share?

Everyone should attempt to make art. We were made to create and you could be the next Beeple.

Creation: Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

A blend of different images and the prompt “We were made to create, be it paintings, music, food, movies or life itself, life is our canvas:: minimalism, serenity, abstract high quality illustration --ar 3:2 --iw 0.5”, created with Midjourney V5 by me with init images from @SMOjcsnaps and @moelucio

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa