AI Art Weekly #32
Hello there my fellow dreamers and welcome to issue #32 of AI Art Weekly! 👋
It’s funny: although this week felt extremely slow compared to previous weeks, there is still enough content for me to consider this issue packed. Now that the format of the newsletter has settled a bit, I would love to hear your thoughts on it. Do you miss anything? Do you consistently skip specific sections? Any feedback helps me improve. That aside, here are this week’s highlights:
- Midjourney v5.1 released and some interesting future insights
- New StabilityAI DeepFloyd IF model released
- Bark: An open-source text-to-audio model
- Interview with artist Đy Mokomi
- Google employees think open-source AI will outcompete them and OpenAI
Cover Challenge 🎨
Spring is in full swing. So this week we’ll be exploring the theme of blooming. This isn’t limited to flowers btw, so feel free to get creative! The reward is another $50. Rulebook can be found here and images can be submitted here. Come join our Discord to talk challenges. I’m looking forward to all of your artwork 🙏
Reflection: News & Gems
Midjourney v5.1 released and future insights
Midjourney released a minor upgrade to their v5 model this week. The new version:
- has higher coherence
- produces more accurate images based on the given text prompts
- produces fewer unwanted borders or text artifacts
- has improved sharpness
- is more opinionated (like v4) and apparently is easier to use with short prompts
- has an ‘unopinionated’ mode (similar to the v5.0 default) called “RAW Mode”, which you can activate with `--style raw`
You can use the new model by activating it via the `/settings` command or by simply adding `--v 5.1` to the end of your prompts.
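For example, appending both flags to an illustrative prompt (the subject is just a placeholder) looks like this:

```
/imagine prompt: a field of wildflowers at golden hour --v 5.1 --style raw
```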
Aside from that, David talked about v6 in MJ’s office hours this week, which apparently will have higher resolution and better language comprehension. He also mentioned Midjourney 3D, new interfaces for painting and more, all of which they plan to release by the end of 2023. Video will probably come in 2024, as the higher resolution they are aiming for seems to be a bit of a pain for video generation. But even with just the former, things will definitely get even wilder than they already are.
DeepFloyd IF released
The Stability AI research team DeepFloyd released a new text-to-image model called IF. The model produces images with a high degree of photorealism and language understanding, and it also shines at style transfer and upscaling. The current version is released under a research license, which means images cannot be used commercially; that will apparently change once the model goes fully open source.
The weights can be found on HuggingFace. There is also a demo, and the code can be found on GitHub.
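If you want to try it locally, here is a minimal sketch of running the stage I model through Hugging Face diffusers. The model ID and pipeline call are assumptions based on the HuggingFace release, so check the model card for the full three-stage (64px → 256px → 1024px) workflow; you may also need to accept the license on the model card and log in with the HuggingFace CLI first.

```python
# Minimal sketch, assuming the DeepFloyd/IF-I-XL-v1.0 weights and diffusers support
# published on HuggingFace; stage I alone only produces a 64x64 base image.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable

prompt = "a macro photo of a blooming cherry blossom, ultra detailed"
image = pipe(prompt).images[0]
image.save("if_stage1.png")
```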
Bark: An open-source text-to-audio model
There is surprisingly little talk about Bark, an open-source text-to-audio model that can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce nonverbal communication like laughing, sighing and crying. Some examples have a robotic echo to them, but some of the longer-form examples are quite impressive. The best thing about it: it’s open source, so you can run it on your own hardware. The only downside right now is that it doesn’t support voice cloning.
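Getting a first sample out of Bark is straightforward. The snippet below follows the usage shown in the suno-ai/bark README; treat it as a sketch and double-check the repo for the current install instructions and function names.

```python
# Sketch based on the Bark README: generate speech with a nonverbal cue and save it as WAV.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads and caches the model weights on first run

text_prompt = "Hello dreamers! [laughs] Welcome to another issue of AI Art Weekly."
audio_array = generate_audio(text_prompt)

write_wav("bark_sample.wav", SAMPLE_RATE, audio_array)
```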
Shap-E: Generating Conditional 3D Implicit Functions
OpenAI quietly released a new 3D generative model called Shap-E. Compared to their Point-E model, this one converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. The code and more examples can be found on GitHub. There is also a HuggingFace demo to play around with.
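The GitHub repo ships sample notebooks; a condensed sketch of the text-to-3D sampling path looks roughly like the following. Function names and parameter values are taken from the repo’s sample_text_to_3d notebook, so verify them against the current code.

```python
# Condensed sketch of openai/shap-e text-to-3D sampling, mirroring the notebook defaults.
import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = load_model("text300M", device=device)          # text-conditioned latent model
diffusion = diffusion_from_config(load_config("diffusion"))

latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a small potted flowering plant"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)
# The latents can then be decoded into meshes or rendered views with the repo's
# transmitter model (see the notebook for the decoding step).
```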
Perfusion: Key-Locked Rank One Editing for Text-to-Image Personalization
100 KB models? Combining multiple individually learned concepts? One-shot personalization? Key-locking? Perfusion just might be a new viable Stable Diffusion fine-tuning method by NVIDIA. There is no way to try it out yet, as there is, as usual, no code, but I’m keeping an eye on this one.
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
This week we got GeneFace++. Similar to ANGIE and SadTalker, this paper proposes a new method for generating talking and singing portraits driven by arbitrary speech audio. Compared to other methods, GeneFace++ is able to handle silent segments, long-duration words and fast speech. Unfortunately there is no open-source code, or at least I couldn’t find any.
More papers and gems
- Live 3D Portrait: Real-Time Radiance Fields for Single-Image Portrait View Synthesis
- SAGE: Semantic-aware Generation of Multi-view Portrait Drawings
- NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
Reddit user u/Mobile-Traffic2976 converted an old telephone switchboard into a Stable Diffusion Photobooth. A cool project that brings back some retro cyberpunk vibes.
AI movies are picking up steam. Gen-2 is producing some quality footage with a unique weirdness that we’ll probably lose in the future. So here are 10 of my current favourite examples, consisting of shorts, music videos and concept trailers, that I found impressive and worth watching!
This week @SnoopDogg put into words what we’re probably all thinking: “Like, what the f**k?” 😂
Imagination: Interview & Inspiration
This week’s spotlight artist showed up in my Twitter feed a few weeks ago, and I’ve loved seeing his artwork ever since. Đy Mokomi is a pseudonymous artist who has engaged in a variety of creative pursuits, including design, visual effects, matte painting, concept art, industrial design, and most recently, generative and AI-infused art. I’m happy he agreed to an AI Art Weekly interview. So, let’s jump in!
What’s your background and how did you get into AI art?
My background is in visual effects. I have been working in the film industry since 2000, starting in the software and research side of the business at Alias|Wavefront. Before I left the industry, I worked as a visual effects supervisor for several years. My exit was prompted by the onset of artificial intelligence and my general interest in how people think. I studied neuroscience for several years before joining a startup that was working on a version of AGI. This experience gave me a glimpse of a different kind of creativity. Ultimately, I switched to the industrial design industry, where I have been for the past four years. I am responsible for numerous visualizations and for devising novel approaches to present ideas, which led me to diffusion models early last year. I began with Disco Diffusion, having missed the entire GAN era, and now I am heavily involved with custom tools that I write, such as MJ, SD, and more.
Do you have a specific project you’re currently working on? What is it?
I am currently working on a significant project at the intersection of generative art and AI. I have been immersed in this work for several months, which feels like years now. The convergence of these two computationally driven yet distinct areas of art is what fuels much of my technical interests at present.
What drives you to create?
There are a few factors, two of which are major and closely interconnected. I am fascinated by all types of communication. The fact that we can communicate at all is incredible; however, the bandwidth and fidelity are quite primitive. Visual mediums help address some of the shortcomings of communication, but in my opinion, the combination of written and visual has the greatest impact, provided that people are willing to absorb and process information. Ultimately, this fuels my passion for understanding intelligence. Communication is just a part of it, but in my view, it is a very important aspect.
What does your workflow look like?
I can roughly divide all my work into two categories. One originates from a surprise, where I accidentally stumble upon a workflow, a striking image that works well as part of the prompt, or something similar. This becomes the seed of the final artwork. All thinking stems from this initial discovery and drives the technology I use forward. The other workflow (which I enjoy more) occurs when I have some external constraint, such as a Foundation World with a theme or a thematic challenge. This lack of control over the subject sends the design part of my brain into overdrive, and I feel it allows me to create my best work. For some reason, this constraint has to be external. I’ve tried to write out themes for myself, but it never worked, because I had control to change it — there was no external judge to tell me that I went off track.
What is your favourite prompt when creating art?
I know that prompts come to mind when people discuss AI art, but I’m afraid I might disappoint you, as I don’t really use many. I mean, I do because I have to obtain an initial embedding, but 99% of what I do is purely visual. In MJ, I use 3-4 images at a time (mostly self-painted textures, shapes, or collages), while in SD, I do a lot of work using aesthetic gradients and control nets with sketches. I have only two prompts (one for MJ and one for SD), and the rest is purely technical work.
How do you imagine AI (art) will be impacting society in the near future?
AI art, specifically, will undoubtedly facilitate personalized entertainment as the most apparent commercial outcome. Ultimately, I hope it will help to enhance our collective creativity. It enables many more people to ask the “what if” question.
Who is your favourite artist?
There are many artists I admire for various reasons. Having worked in design for some time, I can appreciate problem-solving as a form of art. Naturally, people like Dieter Rams and Naoto Fukasawa come to mind as prominent industrial designers. I love Jean Giraud and Katsuhiro Otomo in the more illustrative genre. Reading about artists such as Richard Diebenkorn, Marc Chagall, and Mark Rothko inspires me greatly.
Anything else you would like to share?
One thing I keep coming back to is the fact that AI art is a tool. It all comes down to ideas. The important part is the story or message you want to convey. With generative tools, it becomes easier to explore ideas and assess their merit. However, without ideas and direction, the tools remain just that - lifeless pieces of code.
Creation: Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
I got a bit tired of Midjourney’s limited and buggy archive download solution, so I put together a quick-and-dirty Chrome extension with GPT-4 that only downloads upscaled images. As a next step I want to add the ability to attach prompts as metadata to the downloaded files, so we can better organize Midjourney files locally. The code can be found on GitHub.
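For the metadata idea, the rough plan is to write the prompt into the image file itself. As an illustration only (this is not code from the extension), here is how a prompt could be embedded as a PNG text chunk with Pillow; the file names and key name are just placeholders.

```python
# Illustrative sketch: store a Midjourney prompt inside a downloaded PNG as a
# text chunk so it can be searched and read back later.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

image = Image.open("midjourney_upscale.png")  # placeholder file name

metadata = PngInfo()
metadata.add_text("prompt", "a field of wildflowers at golden hour --v 5.1")

image.save("midjourney_upscale_tagged.png", pnginfo=metadata)

# Reading the prompt back:
print(Image.open("midjourney_upscale_tagged.png").text["prompt"])
```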
@dymokomi open-sourced a tool called dygen which is a python script that can apply painting textures to your images. Worth trying out.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences.
Reddit user u/lazyspock put together a cheatsheet for female haircut styles. The cheatsheet also describes the workflow used to generate the different hairstyle prompts, so it could potentially be reused for other concepts.
A leaked internal Google document that is making the rounds claims that open source AI will outcompete Google and OpenAI. One can only hope!
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa