AI Art Weekly #2
Welcome to the second issue of AI Art Weekly, a newsletter by me (dreamingtulpa) covering some of the latest happenings in the AI art world.
Each issue will contain three main sections:
- Reflection : We take a look at what's new in the AI art space: the latest updates and the new features released this week.
- Imagination : AI art wouldn’t be much without imagination. In this section we share prompts for you to try out and highlight people from the AI art community.
- Creation : After reflection and imagination comes creation. In this section we share tools and tutorials to help you turn your dreams into art!
Without further ado, let’s get into it.
Things are moving incredibly fast right now. It seems like every week we’re gifted with new groundbreaking research papers and first implementations to tinker around with.
For instance, the implementation of Google's Dreambooth (aka Textual Inversion 2.0) with Stable Diffusion has picked up steam this week. Dreambooth Stable Diffusion lets users fine-tune SD on new images, for instance your own face, and then use the resulting model to dream up new creations.
The Corridor Crew released a video where they showcase Dreambooth Stable Diffusion on a custom-built system.
Until a few days ago, the drawback was that you needed a system with at least 24GB of VRAM to run Dreambooth SD, but as fast as things are moving in the AI world, that no longer holds true. Reddit user 0x00groot managed to get it running on 12.5GB of VRAM, making Dreambooth SD available on a wider array of consumer Nvidia GPUs and even on the free Google Colab tier. Happy training!
Speaking of Google Colab, they have now added the ability to purchase additional compute units, which also gives us a glance behind the curtain at how many A100 and V100 hours you actually get when buying a Pro/Pro+ subscription. My tests showed the following results:
- A100: ~15 units/hour = 33.3 hours with Pro+
- V100: ~7.5 units/hour = 66.6 hours with Pro+
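That compute-unit arithmetic can be sketched in a few lines of Python. The ~500-unit monthly allowance for Pro+ is an assumption implied by the hour figures above, and the burn rates are my rough measurements, not official numbers:

```python
# Hours of GPU time from a Colab Pro+ subscription, assuming a ~500
# compute-unit monthly allowance (implied by the hour figures above).
PRO_PLUS_UNITS = 500

# Measured burn rates, in compute units consumed per hour of runtime.
burn_rates = {"A100": 15.0, "V100": 7.5}

for gpu, units_per_hour in burn_rates.items():
    hours = PRO_PLUS_UNITS / units_per_hour
    print(f"{gpu}: ~{hours:.1f} hours with Pro+")
```

Running this reproduces the ~33 and ~67 hour figures; in practice the burn rate varies with the exact runtime shape you're assigned.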
If you compare that to the actual speed you get with Stable Diffusion, you'd want to stick to a V100, as A100s are only around 25-50% faster. There has been a bit of backlash within the community, but on the bright side this might lead to better support for local runtimes. Who knows, maybe we'll be able to run SD on our phones in a few months.
In other news, DALL·E removed its waitlist this Wednesday. You can now sign up and start creating right away, without having to wait in line.
At the time of writing, there was no new announcement from MidJourney this week, although there was a status update in their Discord, reading: “BOT DEPLOY: Few adjustments, to be announced today”. David Holz mentioned in their weekly office hours that they're working hard on new models, new speeds, new resolutions, bigger batches/grids, and bringing back multiprompts and image references. Super excited to see what their engineering team will come up with.
And last but not least, three independent research papers were released. Meta announced a text2video system called Make-A-Video, with examples that look like nothing we've seen before. It's currently in a trial phase, so we don't know what its real capabilities are, but it looks promising (aside from the annoying watermarks). Another text2video model, Phenaki, also looks incredible. And if that wasn't enough, there's a text23D model called DreamFusion, which looks similar to but more advanced than Google's DreamField. None of this new work has publicly available tools for you to play around with yet, but there is a Colab for the older DreamField model if you want to give it a spin (check the Creation section at the bottom).
What a week!
In this week's interview of AI Art Weekly we talk to CoffeeVectors, an AI artist I appreciate for his thoughtful and inspirational tweets, posing questions that invite you to think instead of just consume. This is also reflected in his answers, which I deliberately didn't want to shorten. I'd rather hear from you what you think about the length of the interview: should we keep it shorter like last week, or do you like the more in-depth style of this interview? Let me know on Twitter.
[AI Art Weekly] CoffeeVectors, who is your favourite artist?
I don’t have a favourite artist when it comes to aesthetics. I draw inspiration from hundreds of sources across all kinds of mediums from novels, to renaissance paintings, fashion photography, anime and video games. Also I’m more drawn to individual pieces of an artist rather than entire bodies of work. I’m in love with the paintings of Tamara de Lempicka for example, but I’m not inspired by every piece.
But I do have a favourite artist in terms of an overall perspective and that’s Leonardo Da Vinci. After reading Walter Isaacson’s giant biography on him, I was deeply inspired by something counter-intuitive—Da Vinci practically gave up painting towards the end of his life. The art seemed to have become incapable of sating his deep curiosity, perhaps ironically because of his success. So it faded in importance for him. And I’m strangely drawn to that because I think it shows that his art was in service to him rather than he a slave to it. You don’t have to define yourself by only the things you’re good at; by only what people know you for.
Also, Da Vinci was someone who did not see art as a separate thing from engineering and science; it was all part of a continuum of beauty and mystery. That's how I tend to view things. I see art as an extension of communication, no different than coding, consoling a friend in difficult times, a birthday party, mathematics, comedy, or trying to organize and lead a team. There are obviously major differences between all these things, and that's usually what we see, but if you're someone who can find the unity in all of it, the hidden bridges and tunnels that connect what is obviously different on the surface, I think that's something really special. I love how Da Vinci seemed just as concerned about being seen as an engineer as an artist. And he was also a set designer, an event planner, all manner of things. It's like he was trying to explore an underlying structure that was unaffected by categories and semantics. There's something magical to me about people who can see things that way. They can create art outside of art.
[AI Art Weekly] What does your workflow look like?
I come from a photography/video background so my approach is more about considering selections from a set of options. I’m usually running a local instance of Stable Diffusion. I like that because I have more control and since I have 24GB of VRAM, I can generate sets and “search” the latent space relatively fast (I can generate something like four images in 10 seconds with my initial settings). But if I’m not getting results I like, I’ll head over to MidJourney and Dall-E 2.
Starting in a local instance of Stable Diffusion, I'll come up with some medium-sized prompts, usually a specific concept followed by some fairly generic modifiers like photo etc. From there I'll improvise, playing with shorter/longer prompts, and see what's getting me closer to something interesting.
Once I have something that's like 80% there, I start looking at “nearby” images in the latent space by walking through the different variables that are available to me in Stable Diffusion. Using something called an X/Y Plot script, I can generate a grid where the axes correspond to different values of whatever variable I want to walk through. So X could be steps and Y could be different sampling methods. I'll do different X/Y Plots and see if there's a different configuration of settings that gets my prompt closer to something more cohesive, with fewer artifacts, or that's more interesting.
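The X/Y walk described here boils down to a nested loop over two settings. Here's a minimal sketch (not CoffeeVectors' actual script), with a `generate` stub standing in for a real Stable Diffusion call; the real X/Y Plot script renders each cell of this grid as an image:

```python
def generate(prompt, steps, sampler):
    """Stand-in for a real Stable Diffusion call; just records the settings."""
    return f"{prompt} | steps={steps} | sampler={sampler}"

prompt = "ink dropped in water, splatter drippings"
x_axis = [20, 30, 50]                # X: number of sampling steps
y_axis = ["Euler a", "DDIM", "LMS"]  # Y: sampling method

# One row per sampler, one column per step count -- the same layout the
# X/Y Plot script renders as an image grid.
grid = [[generate(prompt, steps, sampler) for steps in x_axis]
        for sampler in y_axis]

for row in grid:
    print(row)
```

Scanning the rendered grid is how you spot which settings combination pulls the prompt toward something more cohesive.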
I think one of the good things about taking a searching approach like this is that it helps alleviate some of the FOMO around wondering if you've really generated the best image. I know with MidJourney we sometimes get into what a few AI artists call “doom prompting”, where we're just endlessly trying tiny variations with the vague hope that one more round of image synthesis will get us that EXACT image we want. I don't know about you, but that makes me feel like a mouse in a lab experiment lol.
Once I have something I’m happy with, I think of it like a RAW image file or clip. I’ll usually bring that into Photoshop or another program and do additional work on it depending on what I’m going for. Could be generating depth maps for an animation, simple adjustments to fix remaining artifacts, or just upscaling and sharpening. Right now I’m mostly interested in seeing how AI tools integrate with other digital tools, but occasionally I’ll post something RAW right out of the AI.
[AI Art Weekly] How do you think AI art tools will evolve in the future? What possibilities can you imagine?
There’s a short-term and a long-term answer.
In the short-term, say the next year or so, I think where we're going is using AI to create a more direct relationship between 2D and 3D pipelines. That is, use AI to synthesize an image with prompts and then have that easily transfer into a 3-dimensional representation, either with some kind of AI that can generate meshes and project the 2D image onto textures (filling in missing information with inpainting), or it will generate some kind of NeRF (neural radiance field). Obviously we'd need to design these systems so that artists can build up, refine, and art direct the results as needed. The purpose would be getting to a starting point without having to build EVERYTHING from scratch while also having the controls to make deep changes to that automated starting point. If we can do that, we open up an entire field of new processes for animation, for being able to art direct 2D images more precisely and coherently, and dramatically speed up existing 3D workflows.
“AI + 3D. Scene made in #midjourney Characters from @daz3d #daz Post in PS and animation and rack focus in AE using a depth map made in a colab.”
For the long-term, I think we’re looking at developing the next generation of entirely new mediums, or variations on existing mediums, that are AI-first and that specifically require artists familiar with AI tools to create the content. It’s hard to say what these mediums might look like as they’ll have to emerge from how the community grows and what it discovers over the next several years, not to mention how the markets react and intersect with these new forms. They could revolve around complex world building on a scale that would take a human-only trad-art team too long to create. It could be something extremely dynamic that changes at a frequency, speed, and scale that would need a persistent creative intelligence to generate (so more than what procedural generation without an AI can do). For example a narrative-based video game that writes new dialog or character interactions as you go, or that even generates new game mechanics depending on player choices.
Editorial note: Check out Character.AI in the Creation section below; I feel that's a good example of how character dialogue could potentially be generated in games using AI.
We could also end up with something like a Holodeck-style interface from Star Trek connected to a VR/AR experience where you’re able to create and control the environment directly through speech, without the need for detailed knowledge of computer graphics.
The holodeck is the ultimate evolution of virtual environment creation: large rooms capable of re-creating vistas, landscapes and environments for the purposes of training and recreation, and even holographic characters with whom the user can interact and role-play.
[AI Art Weekly] Anything else you want to share?
If you ever get the chance to search the latent space for images “around” a prompt you like, you can see how small changes in variables can get you a totally different image. What this means is, you can have the “right” prompt, but because the other settings aren’t in place you’re not getting stuff you like. Put another way, a prompt is still latent space, just a smaller version. To explore that space requires going into the other controls available to you.
Or you can think of it in the opposite way—if you have the “wrong” prompt, but other settings are configured a certain way, you might end up with what you were originally going for. Sometimes the path to an image is straight. Sometimes the only path is to become lost. In truth both sets of paths probably exist at the same time, it just depends on chance which ones you happen to start closer to.
While I can imagine that knowing this might be a bit disheartening for some people, I think it counter-intuitively brings AI Art towards a more analog space. There’s a kind of chaos (or extremely complex, almost fractal order) present in the system. Depending on the configuration of tools you’re using, you can make a factory (which can have value in a lot of situations), or you can make a dynamic, unique experience of creation where you’re in relationship with a machine and the library of human creation. Or it can be a mix of both in different degrees. You’ll have to search to find what works best for you.
Each week we share a style that produces some cool results when used in your prompts. This week's featured style is
ink Dropped in water, splatter drippings.
These are some of the most interesting tools I’ve come across this week.
Point this notebook at a YouTube URL and it'll make a music video for you. You don't need a DreamStudio API key for this; just disable that setting and it'll install the Diffusers library right within the Colab.
Christian Cantrell is developing a free Stable Diffusion plugin for Photoshop which lets you generate new images with text2img, img2img and now even inpainting by using layer masks.
A browser interface for Stable Diffusion based on the Gradio library, with tons of features. This is the one CoffeeVectors uses to create X/Y Plots.
I feel this is a good example of what dynamic character dialogue could look like in future video games. If you sign up, chat to the character Grok I created. It's a Greek yoghurt that wants to take over the world. Thank you @darkestdollx for sharing.
A toolkit to generate 3D mesh models, videos, NeRF instances, and multiview images of colourful 3D objects from text and image prompt inputs.
And that, my friends, concludes the second AI Art Weekly newsletter. Please consider subscribing and sharing if you liked it, and let me know on Twitter if you have any feedback. The more people who get to see this, the longer I can keep this up and the more resources I can put into it.
Thanks for reading and see you next week!