AI Art Weekly #29
Hello there my fellow dreamers and welcome to issue #29 of AI Art Weekly! 👋
We’re closing in on 1500 subscribers. It would be cool if we could reach that goal for issue #30. If you know someone who might be interested in this newsletter, please forward it to them. As always, ideas on how to improve the newsletter are welcome. Just respond to this email.
I have been building some helper tools with GPT-4 this week to aggregate the AI news, images and videos that end up in the newsletter. I must say that the productivity boost feels great. Hopefully this will make putting these condensed issues together more time-efficient starting next week. That means more time for me to work on new quality-of-life features for the website, which are long overdue. But for now, let’s focus on what happened this week:
- Stable Diffusion XL Beta available
- ControlNet 1.1 released
- Generative Agents 🦾
- Interview with AI artist and writer Rebel Without Applause
- SceneDreamer code released – generate unbounded 3D scenes
- AudioLDM Google Colab – generate audio from text
Cover Challenge 🎨
The next theme is imagine. So anything goes, just submit whatever you like! The reward is another $50. The rulebook can be found here and images can be submitted here. Come join our Discord to receive feedback and talk about future challenges. I’m looking forward to all of your artwork 🙏
Reflection: News & Gems
Stable Diffusion SDXL Preview
Stability AI released the SDXL (Stable Diffusion XL) Beta model this week. It’s currently available within DreamStudio and their API. The model weights aren’t open-sourced yet unfortunately, but they announced that this will happen after some beta testing.
Some of the highlights of SDXL’s capabilities include:
- Next-level photorealism capabilities
- Enhanced image composition and face generation
- Rich visuals and jaw-dropping aesthetics
- Use of shorter prompts to create descriptive imagery
- Greater capability to produce legible text
Inpainting and non-square aspect ratios aren’t available yet. Although first results don’t look that promising compared to Midjourney, I’m mostly excited about what the community will make out of this.
ControlNet 1.1 landed this week. All in all it includes updated and improved models, but there are some new noteworthy additions.
First of all, the OpenPose model is now able to work with hands and faces. No need for cumbersome workarounds to place hands.
Second of all, there are new guidance methods: line art, shuffle, pix2pix, inpaint and tile. Personally, I’m most excited to try tile guidance. It splits an image into tiles. For a given tile, it recognizes what is inside the tile and increases the influence of the recognized semantics, and it decreases the influence of the global prompt if the contents do not match.
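To make the tiling step a bit more concrete, here is a minimal sketch of how an image could be cut into tiles. It only illustrates the splitting part and is not ControlNet’s actual implementation. The function name `tile_boxes` and the 512 pixel tile size are assumptions made for this example.

```python
from typing import List, Tuple

# A box is (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Return boxes covering a width x height image, row by row.

    Hypothetical helper for illustration only. Tiles on the right and
    bottom edges are clipped so they never reach outside the image.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    boxes: List[Box] = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


# A 1024x768 image with 512 px tiles gives a 2x2 grid.
for box in tile_boxes(1024, 768, 512):
    print(box)
```

Each box could then be cropped and processed on its own, which is the part where the tile model would decide what is inside the tile.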
Generative Agents: Interactive Simulacra of Human Behavior
Autonomous AI agents like AutoGPT and babyagi are the latest craze in the AI world. The basic idea is to provide an LLM with a goal-oriented prompt, give it an environment with access to a web browser or local files, combine it with memory and then let it come up with tasks to reach its goals through a self-referential loop. Sounds similar to what we as humans do? Yes, and that’s why some people think that, in theory, AGI isn’t that far away anymore. In practice? Well, I told it to write this newsletter and it failed miserably. So, still crunch-time for me 😅
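For the curious, the loop described above can be sketched in a few lines. This is a rough illustration under my own assumptions and not how AutoGPT or babyagi are actually implemented. `run_agent`, `fake_llm` and `fake_execute` are hypothetical names, and the "LLM" is a stub so the example runs without any API key.

```python
from collections import deque
from typing import Callable, Deque, List


def run_agent(
    goal: str,
    llm: Callable[[str], List[str]],
    execute: Callable[[str], str],
    max_steps: int = 5,
) -> List[str]:
    """Run a tiny goal-driven task loop and return the memory log.

    `llm` maps a prompt to a list of task descriptions and `execute`
    performs one task in the environment. Both are placeholders.
    """
    memory: List[str] = []
    tasks: Deque[str] = deque(llm(f"Goal: {goal}\nList the first tasks."))
    steps = 0
    while tasks and steps < max_steps:
        task = tasks.popleft()
        result = execute(task)
        memory.append(f"{task} -> {result}")
        # Self-referential step: feed the results back into the model
        # and let it propose follow-up tasks.
        prompt = (
            f"Goal: {goal}\nDone so far:\n"
            + "\n".join(memory)
            + "\nList any new tasks."
        )
        tasks.extend(llm(prompt))
        steps += 1
    return memory


def fake_llm(prompt: str) -> List[str]:
    # Stub model: proposes two tasks up front, then nothing new.
    if "Done so far" in prompt:
        return []
    return ["collect AI news", "write a draft"]


def fake_execute(task: str) -> str:
    # Stub environment: pretends every task succeeds.
    return "ok"


print(run_agent("write a newsletter", fake_llm, fake_execute))
```

The `max_steps` cap is there because nothing else stops a model that keeps proposing new tasks forever.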
Nonetheless, given the rapid pace of our advancement, what holds true today may not hold true next week. And writing a newsletter isn’t the only application for agents. Enter Generative Agents – computational software agents that simulate believable human behaviour.
Those generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. Thankfully not in a western theme park for now, but in an interactive sandbox environment inspired by The Sims – a small 2D town populated by twenty-five of those agents. Because the whole thing is interactive, end users are able to prompt and influence the behaviour of agents and can observe how things play out.
The code hasn’t been open-sourced, so you can’t play around with this yourself, but you can observe a pre-recorded world here.
Apart from the apparent Westworld implications, I can see this implemented in video games as a way to create fully automated, believable NPCs. Crazy times.
Rich-Text-To-Image: Expressive Text-to-Image Generation with Rich Text
I love to see explorations on how we can prompt differently compared to just using plain words. Rich-Text-To-Image introduces the ability to use format information from rich text like font size, color, style, and footnote for text-to-image generation. Looks like fun and you can try out the demo on HuggingFace.
vid2vid-zero for Zero-Shot Video Editing
vid2vid-zero is yet another video-to-video model. Compared to other methods, this one doesn’t require you to train or fine-tune a model before you can use it. The code just got released this week and you can try it out on HuggingFace.
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
What if you could generate images from an untrained concept by providing a few images and without having to fine-tune a model first? InstantBooth from Adobe might be the answer. The novel approach is built upon pre-trained text-to-image models and enables instant text-guided image personalization without finetuning. Compared to methods like DreamBooth and Textual-Inversion, the InstantBooth model can generate competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation while being 100 times faster. Wen open-source?
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
DreamPose is an image-to-video model that can take an input image of a person and pose sequence, and generate a photorealistic video of the input person following the pose sequence. The consistency of the output looks good, so this might be useful to animate characters.
More interesting research & gems
- VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
- Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
- How you feelin’?: Learning Emotions and Mental States in Movie Scenes
- Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
- Mesh2Tex: Generating Mesh Textures from Image Queries
- DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
@CoffeeVectors added 2D sidescrolling mechanics and collision detection to a realtime NeRF in @UnrealEngine 5 with the @LumaLabsAI plugin and shared the tutorials to make this happen.
@mrjonfinger made another film using @runwayml and @elevenlabs. The script was inspired by a Tweet that GPT-4 wrote: “I just had a dream that I was an AI and woke up in a lab. What does it mean?”
@SuperXStudios added the ability to generate custom 3D character skins for his game “Fields of Battle 2”. The image is generated using ControlNet OpenPose, which creates character textures and is then pulled through a pipeline to create a 3D, rigged, animated character in about 15 seconds. Amazing!
@NathanBoey shared how he created his animated piece “LUNA”. Definitely worth a read.
Imagination: Interview & Inspiration
In this week’s issue of AI Art Weekly we talk with Rebel Without Applause aka @TymothyLongoria. I’ve been following Tymothy’s path since he first participated in the “cyberpunk” challenge and have loved seeing how his style evolved over the last few months. So I thought it was time to ask him a few questions. Let’s jump in.
What’s your background and how did you get into AI art?
I have been a writer for nearly 20 years, and I have been writing seriously for about 14 of those years. This journey led me to opportunities such as writing for the now-defunct online macabre children’s magazine, Underneath The Juniper Tree, and acquiring an agent through the traditional query process. However, I left the community and took a break after my agent exited the business. I returned in late December 2021 and stumbled upon an image created with AI by the brilliant artist Nekro. I felt compelled to try it for myself, and the creative spark was reignited. And so, here we are, taking another step in my journey.
Do you have a specific project you’re currently working on? What is it?
Through experimentation and my passion for lore and worldbuilding, I have created a character named Stelle. Although she is more than just a project, I consider her to be a character within my enigmatic wilderness. At the moment, Stelle and the aforementioned wilderness are at the forefront of my creative endeavors.
Any tips for aspiring lore / worldbuilders?
You own your perspective, and only you can tell a story like you can. Your experiences, your views, the color you see the world in is yours, and that’s how you tell your story. Pay close attention to detail - people notice.
What drives you to create?
The inherent need to share my thoughts, my divergent thinking.
The desire to write.
What does your workflow look like?
Concepts and ideas come naturally to me, for better or worse, and I primarily use Midjourney, photomanips and Photoshop/After Effects.
What is your favourite prompt when creating art?
My favorite ways to prompt are to use phrases, that is, instead of using
a man with a ten-gallon hat etc, I’ll write:
a devious and dapper entity with macabre and ominous notions
I use that and a myriad of similar dramatic prompts as foundations.
How do you imagine AI (art) will be impacting society in the near future?
We’re already seeing people with no artistic background finding their way in and around and through the creative world. Those who before could only imagine, can now /imagine. And those who hone their skills, can only become better. Just as with the creation of art, non-writers can now become writers and poets with the help of ChatGPT. This hits me personally, and reminds me to empathize with traditional artists who for whatever reasons are against art creation with AI. I say this sincerely: it won’t impact me negatively, and I can honestly say I might give it a try to help lay the foundations for future stories and world-building.
Who is your favourite artist?
I didn’t have a favorite band until a few years ago, so “favorite” isn’t a term I often use.
However, one artist that comes to mind is 0009, who has a brilliant aesthetic blend of color and rebellion. Other notable names are OTHERfaces_Ai, Aempatia, Trez Art, Doc T, Manu, and BLΛC.ai, to name just a few.
Bellini and Caravaggio also come to mind, as I tend to favor a darker, more ominous palette.
You said you didn’t have a favourite band until “now” – which band is that?
Absolutely, and without a doubt - The Contortionist.
Anything else you would like to share?
Lastly let me say thank you for this opportunity to speak on a few things about my journey, and to leave the reader with this:
Find joy in what you do - in your art, your process, your style, and your expression. If you can, this will never get old.
Creation: Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
@reach_vb put together a Google Colab notebook that lets you generate audio from text using the AudioLDM model.
Last week’s Follow Your Pose paper brought pose guidance to video, but creating pose frames isn’t that straightforward. So @fffiloni put together this helpful HuggingFace space that lets you convert a video or gif to an MMPose sequence.
The SceneDreamer code from issue 19 got released and lets you generate unbounded 3D scenes. There is also a HuggingFace demo.
I already shared a HuggingFace demo which lets you generate an infinite zoom video. This one is usable directly within Automatic1111.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!