Hello there, my fellow dreamers, and welcome to issue #36 of AI Art Weekly! 👋
NVIDIA showed the world their latest developments at Computex in Taiwan this week. This supercut is a good start if you want to get up to date. It’s going to be interesting to see how competitors respond to NVIDIA’s current dominance, especially considering that a lot of machine learning development depends on the CUDA ecosystem. Apple’s Worldwide Developers Conference kicks off on Monday, so we’ll see if Apple is joining the AI race or if the biggest reveal will be their (pretty much confirmed) AR/VR headset.
On another note, I’m experimenting with publishing community posts. You can learn more about it here. If you have a unique workflow you would like to share with others, this might be for you.
But enough of that, let’s jump into what happened in the world of AI Art this week.
- RAPHAEL is a new text-to-image model focusing on artistic quality
- Improved image manipulation with Cocktail, Break-A-Scene, InstructEdit & Self-guidance
- Interview with artist Lisa Elliott
- roop – a one-click deepfake tool for videos
Cover Challenge 🎨
News & Papers
We’ve another week packed with tons of new research. Let’s start with the more “basic” image advancements and then move on to more “complex” topics like video and 3D.
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
There is a new text-to-image player called RAPHAEL in town. The model aims to generate highly artistic images that accurately portray text prompts containing multiple nouns, adjectives, and verbs. This is all great, but only if someone actually releases the model as open source, as the community is craving a model that can achieve Midjourney quality.
Cocktail 🍸: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Cocktail is a novel pipeline for guiding image generation. Compared to ControlNet, it only requires one generalized model for multiple modalities like Edge, Pose and Mask guidance. There is no demo yet, but I’m sure this won’t take long. The code can be found on GitHub.
InstructEdit & Self-guidance
Controlling image generation is one thing, editing another. And as the old wisdom goes: one paper seldom comes alone. This week we got InstructEdit and Self-guidance, both providing different methods for image manipulation. In this case I’m especially impressed with Self-guidance’s ability for compositional generation, which lets you combine concepts from different images onto a new canvas. In my head I imagine myself wearing Apple’s new XR headset (which will almost certainly be revealed this coming Monday 🤞) and mixing and matching images in my virtual studio using only voice controls and hand gestures. A bit nerdy, I know 😅
Break-A-Scene: Extracting Multiple Concepts from a Single Image
Now instead of compositional generation from multiple images, what if we could extract multiple concepts from a single image and create new images based on one or more of those concepts? Break-A-Scene is a new method that can do just that.
TaleCrafter: Interactive Story Visualization with Multiple Characters
Let’s continue with video. If you’re interested in what the future of storytelling might look like, you should check out TaleCrafter. It’s a tool that allows for the creation of interactive stories with multiple characters by providing a pipeline that combines multiple steps: turning a written story into a prompt, creating a layout from this prompt, bringing the characters and scenes to life from text descriptions, and finally converting these generated images into immersive videos. Essentially, it’s like having a movie studio and a director at your fingertips, generating consistent visuals from a simple story input.
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
We’ve already seen a few attempts at bringing ControlNet to video, but getting temporal coherency right seems to be a tricky issue to solve. ControlVideo is the next attempt, and things are starting to look extremely promising.
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
As I already said, one paper seldom comes alone. The team behind TaleCrafter above also announced Make-Your-Video this week, yet another addition to the video-transformation family. And, oh boy, does it look good. I think it’s comparable to RunwayML’s Gen-1 in style transfer quality (minus the Shutterstock overfitting).
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation
Let’s move on to 3D. StyleAvatar3D is a new method for generating high-quality, stylized 3D avatars from pure text or from an input image.
AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation
AlteredAvatar, on the other hand, is a method for stylizing dynamic 3D avatars with a focus on finding the balance between speed, flexibility and quality, while maintaining consistency across a wide range of novel views and facial expressions.
Humans in 4D: Reconstructing and Tracking Humans with Transformers
A field I’m very interested in is motion capture from a single video input. 4DHumans brings us a step closer to that. It still lacks temporal smoothness, but it certainly has potential, since it doesn’t require an expensive setup of multiple cameras and special suits, which can cost tens of thousands of dollars.
GenMM: Example-based Motion Synthesis via Generative Motion Matching
Now, motion capture is cool. But what if you want your 3D characters to move in new and unique ways? GenMM is able to generate a variety of movements from just one or a few example sequences. Unlike other methods, it doesn’t need exhaustive training and can create new motions with complex skeletons in fractions of a second. It’s also a whiz at jobs you couldn’t do with motion matching alone, like motion completion, guided generation from keyframes, infinite looping, and motion reassembly.
SISR: Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers
And that’s almost it, but I have one more thing I want to show you. Three weeks ago I wrote about how we’ll probably soon see models that are able to reconstruct license plates from a blurry image (see issue #33). And today, here we are with a paper called SISR. Definitely a missed opportunity to call this “Enhance!” 😅
I wasn’t kidding when I said there was a ton of things happening this week. Here are some more papers that are also worth mentioning:
- Gen-L-Video: Long Video Generation via Temporal Co-Denoising
- Inspiration Tree: Concept Decomposition for Visual Exploration and Inspiration
- NeuManifold: Neural Watertight Manifold Reconstruction with Efficient and High-Quality Rendering Support
- ActAIM: Self-Supervised Learning of Action Affordances as Interaction Modes
- SimpSON: Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network
- HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
- RIVAL: Real-World Image Variation by Aligning Diffusion Inversion Chain
- Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
I always try to interview artists that have some unique approach, vision or captivating style going for them. Lisa Elliott is one of them. Her work is a mix of traditional and digital art with a sprinkle of AI magic and I love that she has agreed to do this very personal interview. Let’s jump in.
What’s your background and how did you get into AI art?
Let me tell you right away - I am self-taught. You won’t find any tales of a long apprenticeship here :)
I’ve been surrounded by art since birth. My father was a fan of fine art, music, and theater, so our house was always filled with his creative friends and their works. We visited different galleries and museums every week. For that, I owe a huge thank you to my father.
Despite the artistic abundance around me, what captivated me most were the paintings that my father’s friend, Sergei Serp, created. He’s now quite a famous artist. I can confidently say that I owe my current pursuits to him and my father. These paintings were frightening and thrilling, stirring up so many thoughts and emotions in my young mind that the characters from these paintings began to appear in my dreams. I dreamt about them for many years. Each character had their own name, their own power, and so forth. The strangest thing is, these paintings are my only vivid memories from early childhood.
Naturally, as I grew older, I attended art school, from which I swiftly departed. My teacher didn’t share my artistic interests and constantly mocked my characters and drawings. In his opinion, a girl should be drawing flowers and cute pets. This experience at the art school dissuaded me from drawing and instilled numerous complexes in a 6-year-old girl.
I returned to drawing at a more mature age when I was battling depression, and nobody could help me. I took up oil painting, purchased an iPad, and learned to draw digitally. My hobby was supported by my mother, best friend, and boyfriend, who were always ready to accompany me to buy a new canvas or paint. At that point, I was aware of AI development, but I didn’t pay it much heed.
Years later, I began to discover more and more information about AI, and my friend, who had also become interested in it, was continually sharing more about it with me. So, I decided to try and learn to use these tools. I enjoyed it, and that’s how I found myself in this space.
Do you have a specific project you’re currently working on? What is it?
I was asked to give this interview exactly when I took a short break as I need a little rest after each segment of my work. If I don’t, I feel like an assembly line, and my ideas and thoughts get confused. My last project was a failure, in my opinion, which I am not ashamed to say, so I want to get my thoughts together and then move on.
What drives you to create?
My memories and my recent experiences, which, I believe, resonate with many people. I posit that every artist incorporates their current thoughts and emotions into their work. What excites them the most, what they fall asleep and wake up with.
What does your workflow look like?
I don’t follow a distinct sequence of actions. Images and compositions spontaneously appear in my mind. Occasionally, an idea may strike me in the middle of the night, yet I don’t jot it down, choosing to rely on my memory instead. Much of my work is inspired by childhood memories. Not always, but often. If so, I begin by sketching on my old digitized photos, refine the sketch using Midjourney, and further enhance the result with Procreate. When no photos are involved, I create an original image using Midjourney and refine it using Procreate. These are essentially all the tools I utilize. The order of operations constantly changes until I achieve the desired outcome. A single task may occupy me for over a week, which seems exceedingly slow to those accustomed to AI. If I’m not satisfied with the result, I start from scratch. This is why I’m not particularly fond of deadlines.
What is your favourite prompt when creating art?
The only trick I can really share involves describing the subject with different words in the same prompt. For example, in the work below, I achieved a satisfying result only when I entered the words
How do you imagine AI (art) will be impacting society in the near future?
It seems to me that, in the near future, people will not stop being afraid of Artificial Intelligence, considering it a hoax for various reasons. People have always feared anything new, no matter which part of history we look at.
Any ideas on how that fear of AI could be mitigated?
Good question. I think the first step is education and the widespread dissemination of reliable information about the operation of AI and AI artists.
The second is resolving the issues concerning the legitimacy of using works to train AI models. As long as this issue remains unresolved, we witness significant negativity from artists. In my opinion, this arises because people fear job loss and potential redundancy, although the creative process of producing works with the assistance of AI hasn’t been abolished and probably never will. This seems to be a concept that people are reluctant to comprehend.
Lastly, familiarity plays a significant role. As people become more accustomed to AI, the fear will likely diminish.
Who is your favourite artist?
Among traditional artists, unsurprisingly, Sergei Serp, a representative of Necrorealism, whom I mentioned earlier. Additionally, Ida Applebroog, whose works I encountered at Queen Sofia Art Center in Madrid, profoundly moved me. After Sergei Serp, she is probably the second artist who could arouse a storm of contradictory emotions in me.
Anything else you would like to share?
Keep doing what you love, and don’t let other people influence your perception of your passion.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!