Hello there my fellow dreamers and welcome to issue #34 of AI Art Weekly! 👋
This issue is packed! Apart from compiling this week's announcements, I've been quite busy tinkering with how to create engaging Shorts for the news segment of this newsletter. I had one requirement: producing a video shouldn't take longer than 10 minutes. Luckily, today's AI models are perfectly capable of assisting with this task. So I wrote a quick Ruby script that turns the news segment below into a voiced Final Cut Pro project, so I can put my final touches on it. Something was missing though: Shorts and TikTok videos perform way better with captions. There are paid options like Descript that already do this very well, but I didn't want the overhead of clicking through yet another UI (remember, the 10-minute requirement). Two days and countless rounds of trial and error later, I finally found a solution. From now on, I plan to release short video summaries on YouTube and TikTok. Enjoy this week's newsletter ✌️
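To give you a taste of the captioning idea (this is an illustrative sketch, not my actual script — the segment data and method names are made up), here's how a few timed transcript segments, like the ones a speech-to-text model spits out, can be formatted into an SRT caption file in Ruby:

```ruby
# Illustrative sketch: turn timed transcript segments into SRT captions.
# Segment times are in seconds; the data below is made up.

def format_timestamp(seconds)
  total_ms = (seconds * 1000).round
  ms = total_ms % 1000
  total_s = total_ms / 1000
  # SRT timestamps look like HH:MM:SS,mmm
  format("%02d:%02d:%02d,%03d", total_s / 3600, (total_s % 3600) / 60, total_s % 60, ms)
end

def to_srt(segments)
  segments.each_with_index.map do |seg, i|
    "#{i + 1}\n#{format_timestamp(seg[:start])} --> #{format_timestamp(seg[:end])}\n#{seg[:text]}\n"
  end.join("\n")
end

segments = [
  { start: 0.0, end: 2.4, text: "DragGAN lets you drag image contents around." },
  { start: 2.4, end: 5.1, text: "NVIDIA announces PYoCo, a text-to-video model." }
]

puts to_srt(segments)
```

The resulting `.srt` file can then be burned into the video or imported into an editor alongside the voiced track.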
- DragGAN lets you modify images by dragging their contents around
- PYoCo is a text-to-video diffusion model by NVIDIA
- FastComposer – tuning-free multi-subject image generation with diffusion models
- Google releases SoundStorm – 100x faster audio generation compared to AudioLM
- LayersNet's 3D garment animations are bonkers
- Interview with artist Ren AI
- StabilityAI open-sources their DreamStudio WebUI
Cover Challenge 🎨
Reflection: News & Gems
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
As someone who has designed apps and websites for over a decade, I love seeing how generative AI is getting integrated into more user-friendly interfaces. DragGAN lets you manipulate the pose, shape, expression, and layout of an image's contents by simply “dragging” them around. With the ability to play around with a wide variety of categories, from cars to cats to humans, this tool is like a magic wand for image manipulation, turning your creative process into an interactive game of pixel puppetry. The code hasn't been open-sourced yet, but it will be in June.
PYoCo is yet another text-to-video diffusion model by NVIDIA, built on their eDiffi model (issue #7). PYoCo can generate compositional videos by describing, for example, a protagonist, an action, and a location. The model can also generate videos in a wide variety of styles, like photorealism or Chinese ink art. I haven't found an implementation for us normies yet, but I'll keep an eye out.
LDM3D and DepthFusion
Holodecks are getting closer, sort of. LDM3D (by @ScottieFoxTTV, whom I interviewed last year) can generate 2D RGBD images. The model is a specialized version of Stable Diffusion 1.4 that has been modified to fit both image and depth map data. This is where DepthFusion comes in: it takes these images and depth maps and spins them into interactive 360-degree-view experiences.
GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
GETMusic is a game-changer in the realm of symbolic music generation, making it a cakewalk to compose target instrumental tracks from scratch or based on your very own source tracks. I’m not a musician, but I’ve dabbled with some DAWs in the past. I can see myself using something like this to accompany me while I play around on a drum pad, generating additional instrumental tracks.
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Dreambooth and LoRA paved the way for training diffusion models to generate images of people who weren't in the original dataset. FastComposer eliminates the complex requirement of assembling a dataset and training a new model. Not only can you generate multiple novel subjects from just reference images, it's also 300x-2500x faster than other methods. Code is available on GitHub.
SoundStorm: Efficient Parallel Audio Generation
You all remember MusicLM by Google (which, by the way, you can sign up to beta test at AI Test Kitchen)? Well, they released another audio model called SoundStorm, which produces audio of the same quality with higher consistency in voice and acoustic conditions, all while being 100 times faster. To put that in perspective, it can generate 30 seconds of audio in 0.5 seconds on a TPU-v4 🤯
LayersNet: Towards multi-layered 3D garments animation
Taking a look at LayersNet is like watching magic happen in the world of 3D garment animation - this fresh and innovative data-driven method brings multilayered garments to life with particle-wise interactions in a micro physics system. To sweeten the deal, they’ve whipped up the D-LAYERS dataset, full to the brim with 700K frames showing the dynamics of 4,900 different multilayered garment combinations, with humans and even a bit of wind thrown into the mix for that extra sprinkle of realism.
FitMe: Deep Photorealistic 3D Morphable Model Avatars
Say cheese! FitMe is a model that turns your own or anyone else's selfies into high-quality, relightable 3D avatars that can be used directly in common gaming and rendering engines. The best part? It requires only one image and produces photorealistic results, making your 3D self look just as radiant in virtual reality as you do in, well, reality 😉 BUT, there is no code (yet) 😭
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Imagine if you could morph your boring walk down the street into a whimsical panda's frolic in the snow. That's exactly what Make-A-Protagonist lets you do. The method can change the protagonist, the background, or both, using a reference image or video. Coherence isn't great yet, but I can see this being useful.
More papers and gems
- BlendFields: Few-Shot Example-Driven Facial Modeling
- AutoRecon: Automated 3D Object Discovery and Reconstruction
- P3DL: Progressive Learning of 3D Reconstruction Network from 2D GAN Data
- QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation
- OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Imagination: Interview & Inspiration
In this week’s issue of AI Art Weekly, we talk to Italy-based artist, Ren. Ren has a background as a classical painter but recently pivoted to AI art to preserve his passion for art. His impressionistic artwork has been exhibited in NYC, Tokyo, Rome, and other cities around the world. I was a bit behind schedule in organizing an interview this week, so I’m grateful that a fellow AI whisperer found the time to answer my questions on such short notice. Let’s jump in!
What’s your background and how did you get into AI art?
I come from many years of traditional painting, working mostly in oil and on the subjects and style of the Renaissance masters. My passion was passed down to me by my family. My grandfather was a surreal painter; he created views of Venice with no water: red earth and stone-like humans. My father and my aunt helped him with the backgrounds and more tedious tasks, and later became painters themselves. So I can say it runs in the family!
I got into AI art almost as a cry for help, a last attempt to save my passion for art and painting. I wasn't able to sustain myself with painting sales, and my day job didn't allow me to experiment outside of boring commissions. AI gave me the nimbleness and burst of creativity to start anew, and I'm very grateful for this.
Do you have a specific project you’re currently working on? What is it?
I have two big projects ongoing, but sadly I can't talk about them at this time! What I can tell you is that I'm now entering my second year in NFTs, and my focus will be much more on creating art with an impact that will last for decades. I'm zeroing in on refining my message, my delivery, and my own unique aesthetic. The best is yet to come!
You talk about refining your personal aesthetic; do you have any tips for other artists on how to achieve this?
I'm currently working very hard on this. I think the best option I found was to ditch the search for cool visuals and instead focus on the message first. If you have a strong narrative you want to follow, you'll naturally associate an aesthetic with it. And since you are invested in that narrative, you'll be less likely to get sidetracked by other cool stuff!
What drives you to create?
I create as a way to let the thoughts, sentiments, and ideas that I normally wouldn't consider discussing with anyone but myself out of my system. It's a bit like going to therapy, but the canvas (or in this case the AI) is the active listener. I don't consider AI a mere tool for creating, but rather a multiplier of artistic and conceptual skills, and I'm here for it.
What does your workflow look like?
I'm a very creative person. I always look at things as they happen and create my own version of the story. A little trick of mine is stopping a movie mid-action, looking at a frame, and coming up with my personal twist based on whatever I'm fixated on at the moment.
When a topic or narrative twist piques my interest, I start researching compositions with hand sketches. Composition was so hard for me when I started working with AI, and it's so important for conveying a story. Fortunately, with ControlNet and img2img, this is much easier these days. I predominantly use Stable Diffusion (I like having fine-grained control over a plethora of little details), but I find myself drawn more and more to MidJourney. Their latest model is so much more coherent in creating the scene, so now the challenge is to bend the results to look like a Ren artwork!
What is your favourite prompt when creating art?
My favorite prompts are the ones I mistype, or when I have fun matching strange subjects! I always try so hard to coerce the AI into giving me the result I'm painting in my head that I sometimes forget to discuss with it, to let it steer my vision and contaminate my delivery. And by AI I mean the thousands of images created by other artists. In a way, I'm discussing with them. After all, art, as an ever-evolving human communication channel, is a collective effort!
How do you imagine AI (art) will be impacting society in the near future?
I like to separate language models from art models, but I'm not sure whether that's just a way for me to live with the choice I made to take advantage of this incredible technology. The reality is, I'm very positive that AI will dramatically change the very foundation of our society. Things like repetitive human tech labor will disappear. Professions will be conducted in very different ways, and overall growth will accelerate exponentially. BUT, humanity being humanity, I fear the day someone uses ChatGPT8 (or whatever it will be called) with malicious intent. We need to take a serious look at what we're creating and what the implications will be. I don't think we will, though. My stance is to make the most of it and help educate others as much as I can about the moral implications of its impact.
Who is your favourite artist?
I have huge respect for so many players in our web3 art space, but I owe my mindset and education to the old masters. I love Thomas Cole's sense of wonder, I love Gustave Courbet's intense depictions of human sentiment, I love Tiziano Vecellio's technical skill in depicting the themes of his time through clever allegories… the list goes on and on!
Anything else you would like to share?
I came into the crypto art space to find a new outlet to show my artwork and I found many new friends, a new paradigm for art and the vibes of a new Renaissance coming from the underdogs and the unchained. I hope that many trad artists can find the courage to take the leap and try for themselves. I have no doubt the future of art will stem from here.
Creation: Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!