AI Art Weekly #18
How’s it going, my fellow dreamers, and welcome to issue #18 of AI Art Weekly 👋
The newsletter is still taking a lot of time and effort to put together, so I’m thinking about setting up a Discord community for people who would be willing to help out and share useful resources and articles, cutting edge papers or just other cool AI art related things they’ve found during the week. Reply to this email if that sounds interesting to you.
Highlights of the week are:
- StyleGAN-T, will GANs soon catch up with diffusion models?
- Msanii, a novel diffusion model to generate long and high-quality music samples
- Interview with AI artist Daryl Anselmo (which we also turned into an AI generated podcast)
- LoRA DreamBooth Training UI for faster Dreambooth training
- Stable.Art Photoshop plugin
Cover Challenge 🎨
The challenge for this week’s cover was “dualism” and we received 71 submissions from 45 artists. As usual, the community decided on the final winner. Congratulations to @ImGlassCrown for creating this week’s beautiful cover art 🥳. And as always, a big thank you to everyone who contributed!
The theme for the next challenge is “patterns”. Think geometric, organic, repeating, random, and fractal patterns. Prize is again $50. Rulebook can be found here and images can be submitted here.
I’m looking forward to all of your submissions 🙏
If you want to support the newsletter, this week’s cover is available for collection on objkt as a limited edition of 10 for 3ꜩ a piece. Thank you for your support 🙏😘
Reflection: News & Gems
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Before diffusion models took the world by storm, GANs (generative adversarial networks) were the state of the art for text-to-image synthesis. StyleGAN-T aims to identify the necessary steps to regain competitiveness. There is no code available yet, but the interpolation video on the demo page shows promising results. The paper also claims to be able to create 56 sample images at a resolution of 512x512 in only 6 seconds on an RTX 3090. Can’t wait to give this a try.
Msanii: High Fidelity Music Synthesis on a Shoestring Budget
Msanii is a novel model that can efficiently create high-quality, long-length music by combining mel spectrograms, diffusion models, and neural vocoders. The model, created by independent researcher @mkinyugo, is able to produce 190 seconds of stereo music at a high sample rate of 44.1 kHz and is the first successful use of a diffusion-based model for this purpose. We’ve seen a few audio generators before, and this one could potentially be used to create long and high-quality music samples for various applications, such as music production, film scoring, and video game soundtracks. The demo model was trained on piano tracks, which I personally don’t have any use for, but as far as I understand it, it should be possible to train the model on different audio files.
RecolorNeRF: Layer Decomposed Radiance Field for Efficient Color Editing of 3D Scenes
So far we’ve seen various NeRF applications: generating 3D scenes or objects, manipulating the camera of videos in real time, or transferring styles from images onto scenes. This week we got RecolorNeRF. The method introduces palette-based editing, which makes it possible to efficiently change the colors of existing neural radiance fields.
InfiniCity: Infinite-Scale City Synthesis
This feels like a weird one, but I have a soft spot for procedural generation and minimalist city posters, and the InfiniCity framework could potentially be used for both. As the name suggests, it’s a framework that aims to synthesize infinite-scale city scenes.
You might have heard about the lawsuit against Stability AI, DeviantArt and Midjourney. A few tech enthusiasts have put together a well referenced response to the original lawsuit announcement. The Corridor Crew also put together a worthwhile video about the case explained by an actual lawyer.
It took me a while, but I finally managed to produce another AI music video. This time I rendered multiple versions using Stable Diffusion v2 and then combined them into a single final product. Even though SDv2 is harder to prompt for due to the reduced dataset, its ability to stay more coherent can be an advantage in certain situations (I haven’t seen a single horizontal split when zooming out so far).
I love generative art but never took the time to sit down and tinker with it. Well, that has now changed thanks to @OakOrobic, who came up with the idea of instructing ChatGPT to generate p5js code. Tried this myself recently and it’s quite fun.
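To give you an idea of what this looks like in practice, here is a hypothetical sketch of the kind of generative pattern code you can ask ChatGPT for. It’s plain Node.js rather than p5.js (which needs a browser canvas), and it emits an SVG grid of circles whose radii follow a simple sine wave:

```javascript
// A tiny generative "pattern" sketch, the kind ChatGPT can produce on request.
// Instead of drawing to a p5.js canvas, it builds an SVG string so it runs
// anywhere Node.js does.
function generatePattern(cols = 8, rows = 8, cell = 40) {
  const shapes = [];
  for (let y = 0; y < rows; y++) {
    for (let x = 0; x < cols; x++) {
      // Radius oscillates across the grid for an organic, repeating feel.
      const r = (cell / 2) * (0.5 + 0.5 * Math.sin(x * 0.8 + y * 0.5));
      const cx = x * cell + cell / 2;
      const cy = y * cell + cell / 2;
      shapes.push(`<circle cx="${cx}" cy="${cy}" r="${r.toFixed(2)}" fill="black"/>`);
    }
  }
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${cols * cell}" height="${rows * cell}">${shapes.join("")}</svg>`;
}

console.log(generatePattern());
```

Save the output to a `.svg` file and open it in a browser; tweaking the sine frequencies is half the fun.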
Imagination: Interview & Inspiration
In this week’s issue of AI Art Weekly, we talk to Daryl Anselmo, an artist who has roots in the video game industry and has worked on titles that I fondly remember from my childhood. Daryl was one of the first artists who inspired and helped me with his daily AI art sets last summer when I started posting to Instagram. So, I’m more than happy to finally interview him for the newsletter. Enjoy!
For this week’s interview we also did a fun experiment by turning the whole thing into a podcast episode. I used ChatGPT to convert the interview below into a conversation and then used elevenlabs.io to clone our voices and generate speech with their state-of-the-art text-to-speech model.
[AI Art Weekly] What’s your background and how did you get into AI art?
I come from a background in the video game industry. I began my career as a 3D environment artist at Electronic Arts in the mid-’90s, as the industry was making its transition from 2D to 3D. Back then, the teams were small and artists had to be more technical, so I also had the opportunity to dabble in many facets of game production: characters, animation, environments, UI, VFX and motion capture. One of my specialties was helping to write tools and scripts that allowed smaller teams of artists to handle high amounts of content production, and I have great nostalgia for those days.
While working at EA, I formed a tight, creative partnership with Josh Holmes, and together we co-created NBA Street, Def Jam Vendetta and Def Jam Fight for NY. We eventually left EA to form our own studio in Vancouver (Propaganda Games) which we later sold to Disney where I worked for many years. I then moved to the Bay Area and served as the Art and Creative Director for FarmVille 2 at Zynga, then rekindled the partnership with Josh and co-founded Midwinter Entertainment in Seattle. I left Midwinter in late 2021 to both relocate back to the Bay Area and spend more time with family in Vancouver. Lately, I have been deepening my art practice and coding skills in anticipation of future projects.
[AI Art Weekly] Do you have a specific project you’re currently working on? What is it?
I’ve always been fascinated with the relationship between art and technology. Last year, I wrote a generative algorithm in Processing (currently unpublished) and was introduced to Dall-E 2, Midjourney and Stable Diffusion, all of which immediately captivated me and did what I was setting out to do but much better than I could. I really haven’t been this excited about technology for image generation since perhaps real-time 3D graphics came onto the scene ~30 years ago.
Last year, I started releasing daily image “sets” on whatever inspired me at that moment. I’ve been focusing mostly on surreal landscapes, maximalist architecture and interior design. I’m striving to keep it really pure and experimental (no pandering), which can be a challenge as I find the pull of influence and electronic currents from social media quite strong and corrupting.
[AI Art Weekly] What does your workflow look like?
Right now, I’m mostly in MidJourney / Topaz. I use Lightroom for color grading and post-processing, with bashing and cleanup in Photoshop. I provide my full prompts in the final image of every daily set, and I’ve been using Illustrator for graphic design. Lately I’ve been bringing more output into DALL-E 2 for outpainting and canvas extension since MJ V4 can be kind of “croppy.” I’ve written a handful of scripts and tools in the background to automate file management, versioning and image production, and I use ChatGPT to give me a launching-off point for the narratives and, lately, some poetry.
I start with a loose idea for something in MidJourney and usually begin with a simple prompt (under five words) then iterate by adding more words and tweaking until I start to see consistent output.
Most of the time, I find the randomness factor with small prompts (under 5 words) is pretty high, which is great for ‘conducting a search’ for a look. Output tends to get more and more consistent and specific as I add more terms, but sometimes it all falls apart so I have to distill the prompt down to something more essential and rebuild it back up. Sometimes MJ will spit out a set that vibes immediately, but sometimes it can take hours (or even days) because I can’t quite find the perfect words or the results aren’t vibing. Collaborating with AI is basically an analog for Traditional Art Direction - the process is eerily similar to commissioning a human artist - the more effort I put into the creative brief, typically the better the results - but, it can all get deep-fried by putting in too much confusing jargon. For me, clarity is key.
[AI Art Weekly] What is your favourite prompt when creating art?
When it comes to prompting, I have a few “Midjourney Old Reliables” like:
Sense of Grandeur or
Sense of Awe.
I find those specific terms don’t work the same in Stable Diffusion; the notation is quite different between the various models, and I find this almost like learning a new programming language.
The use of color is important. Most often I specify colors directly in the prompt, or use negative prompts to reduce or eliminate colors that I don’t want in the output. Palette reduction is a powerful tool in cinema for storytelling and character development!
[AI Art Weekly] How do you imagine AI (art) will be impacting society in the near future?
It’s a wild time right now, so it’s hard to say for certain how AI will shape our culture. It’s clear that we are in the midst of a paradigm shift, so perhaps we can draw parallels from prior shifts to provide some clues.
I’ve been mentoring other designers and lecturing on the topic and I try to remind people that artists and designers have always had a challenging relationship with technology; whether that was the transition from 2D to 3D graphics that I experienced early in my own career, or prior to that, the introduction and adoption of CGI in filmmaking. If we keep stepping back there are many other examples - drum machines, the motion picture camera, the recording industry, photography, camera obscura, rare pigments of tubed paint. All of which took time to adopt and achieve acceptance, opening new doors for human expression. I also remind people that it may take an entire generation of people who “grow up” with a certain technology for it to normalize.
I consider myself an early adopter and techno-progressive; I am an optimist but maybe more of a centrist. AI is a tool, and without question, a powerful one. I believe the image models will continue to improve and get faster and higher fidelity. Today, it takes a few seconds to generate a single image (of reasonable quality); very soon that will be a few images per second, then 60 frames per second. Right now, we enter prompts as text, but this may transition over to a voice interface, and interacting with computational intelligence could be more like having a sentient, real-time conversation (in multiplayer!).
Soon we will be walking around in bespoke 3D worlds that we have summoned like wizards, which may imply wholesale disruption of storytelling, experience design and industry. We may find we have less bandwidth for consuming content made by others, as we choose to spend more time creating and consuming our own. A 3.5-hour movie without intermission (no matter how great it is) seems oppressive to me. I would rather spend my time creating.
It may sound lonely, but it doesn’t have to be. Video games are a parallel. People sometimes want a solo experience, but we are social and tribal creatures that need each other and will naturally gravitate to each other. And we need to share our experiences with each other to make sense of the world around us. Perhaps AI will fill some of that void as we spend more time with bots. Perhaps less of a paradigm shift and more of an industrial or cultural revolution? Hard to say, but I might say if you are exhausted now, buckle up because I reckon things are going to get even crazier.
[AI Art Weekly] Who is your favourite artist?
So many come to mind! I am heavily influenced by the design scene - a big fan of Aaron Draplin. Andy Gilmore and Dmitri Cherniak also come to mind especially as I set out to write my own generative algorithms. In the AI space, I really like what Roope Rainisto and Claire Silver are doing and I always look forward to their work!
This week’s Style of the Week is a Stable Diffusion exclusive from artist @DGSpitzer, who doesn’t fear but fully embraces AI art by fine-tuning a model on his own paintings and concept artworks and giving it to the community for free. Truly appreciated 🙏
Creation: Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
I know I’m a bit late to the game, but I’ve recently started tinkering with fine-tuning my own models. While DreamBooth is great, it’s fairly slow. LoRA aims to improve that, and there is now a HuggingFace Space that you can duplicate to run your own trainings.
Stable.art is an open-source plugin for Photoshop (v23.3.0+) that allows you to use Stable Diffusion (with Automatic1111 as a backend) to accelerate your art workflow. As an Affinity user, I’m jealous and although I don’t want to switch, I might soon get a Photoshop subscription.
@ShuaiYang1991 created a HuggingFace space for his VToonify implementation which lets you turn portrait images and videos into toonified versions by applying comic, cartoon, illustration and other styles.
Aside from Automatic1111 and InvokeAI, there is yet another Stable Diffusion UI called NMKD. I stumbled upon it this week due to its implementation of InstructPix2Pix, which lets you edit images with pure text prompts (I wrote about it in issue #9). Unfortunately it’s Windows-only, but thanks to @Gradio, there is also a HuggingFace demo.
stable-karlo combines the Karlo image generation model with the Stable Diffusion v2 upscaler in a neat little app. There is also a Google Colab notebook for us stone age AI artists without a decent GPU.
@DavidmComfort put together a guide on how to systematically change facial expressions while maintaining a consistent character in Midjourney.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Leaving a Review on Product Hunt
- Using one of our affiliate links at https://aiartweekly.com/support
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!