AI Art Weekly #45
Hello there, my fellow dreamers, and welcome to issue #45 of AI Art Weekly! 👋
With SDXL out for a week, the community is currently a bit split on it. I haven't found the time to properly use it myself, but I'm currently doing a lot of research into building a new PC that can run and fine-tune Stable Diffusion as well as Large Language Models, as I'm tired of today's cumbersome and limited cloud solutions. Once I've finished the build, I'll share my specs here. I also started minting my Genesis artworks on Foundation, which are now open to collectors. Oh, btw… Droids are coming. Google has built a model that lets robots semantically interact with our world. But putting that aside for now, let's dive into this week's AI art related news. The highlights:
- LP-MusicCaps generates captions from music
- I-Paint assists you while painting
- DWPose is a new OpenPose contender
- HierVST is a zero-shot voice transfer system 🤯
- Interview with artist Pocobelli
- A comprehensive SDXL artist guide
- And more
Twitter recently shut down free API access, which puts our weekly cover challenges at risk. By becoming a supporter, you can help make AI Art Weekly and its community efforts more sustainable and support its development & growth! 31% of the goal reached so far 🙏
Cover Challenge 🎨
For next week's cover I'm looking for fire, water, earth and wind inspired artworks. The reward is $50. The rulebook can be found here and images can be submitted here. Come join our Discord to talk challenges. I'm looking forward to your submissions 🙏
News & Papers
LP-MusicCaps: LLM-Based Pseudo Music Captioning
LP-MusicCaps is a system that converts music to text – not just transcriptions of lyrics, but descriptions of what's actually being played. There is a HuggingFace demo you can try, but more interestingly, this makes it possible to generate images or videos directly from music by turning the music captions into visual descriptions. Check out the resource section below for more.
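To make that idea concrete, here's a minimal sketch of the second step (caption to image), assuming a music caption has already been produced with LP-MusicCaps; the model ID, prompt wording and file names are illustrative and not part of the paper's pipeline:

```python
# Rough sketch: turn an (assumed) LP-MusicCaps music caption into a visual prompt
# and render it with Stable Diffusion via diffusers. Everything below is illustrative.
import torch
from diffusers import StableDiffusionPipeline

# Caption you would get from the music captioning step (hypothetical example)
music_caption = "a mellow lo-fi beat with warm electric piano and soft vinyl crackle"
visual_prompt = f"an atmospheric illustration evoking {music_caption}"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(visual_prompt).images[0]
image.save("music_visual.png")
```

Swap the image model for SDXL or a text-to-video model and you get the music-to-video variant mentioned above.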
I-Paint: Interactive Neural Painting
Text-to-Image, Image-to-Image, Audio-to-Image, Brain-to-Image: all possible nowadays. But all of these approaches are almost entirely non-interactive. Imagine you're drawing a picture from a reference: I-Paint aims to assist that process by suggesting the next brush strokes to finish the artwork. It's obviously still a tech demo and only applicable to digital paintings right now, but imagine something like that with an Apple Vision Pro headset while drawing on a physical canvas 🤯
DWPose: Effective Whole-body Pose Estimation with Two-stages Distillation
There is a new OpenPose contender to make your ControlNet pose images even better. The new pose estimator is called DWPose and uses a two-stage distillation approach to improve the accuracy of the pose estimation. The comparisons with OpenPose look pretty great.
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Voice cloning is getting crazy good. HierVST is a zero-shot voice style transfer system that works without any text transcripts. That means it can transfer the voice style of a target speaker onto a source speaker without any training data from the target speaker.
More papers & gems
- Motion Mode: Computational Long Exposure Mobile Photography
- VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
@A_B_E_L_A_R_T put together an AI short movie called Sh*t! It’s the story of a man who wakes up one morning with an empty fridge but a mind full of memories. Beautifully executed. Abel also shared an insightful Tweet about how the short came together.
Reddit user u/corporalcadet put together a cool music video by creating multiple ControlNet images of a square and feeding them to Gen-2. Sweet idea.
@IXITimmyIXI showed us a raw Pika Labs output this week that is crazy good. You can try the Pika Labs Text-to-Video and Image-to-Video model yourself by joining their Discord.
Interviews
@annadart_artist and I are happy to bring you yet another AISurrealism artist interview, this week with @pocobelli. Enjoy!
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
@socalpathy put together a Stable Diffusion XL style study. Perfect to gather some visual prompt inspiration for your next latent explorations.
@camenduru put together a Google Colab based on the new improved AudioGen model Meta open-sourced this week. AudioGen is a model that is able to generate sound effects from text prompts.
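If you'd rather run AudioGen locally instead of via the Colab, the sketch below follows the general pattern of Meta's audiocraft library; treat the model ID and generation parameters as assumptions and check the repo for the current API:

```python
# Sketch of local AudioGen inference with Meta's audiocraft package (pip install audiocraft).
# Model ID and parameters below are assumptions based on the public repo.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # seconds of audio per prompt

prompts = ["dog barking in the distance", "rain hitting a tin roof"]
wavs = model.generate(prompts)  # one waveform tensor per prompt

for i, wav in enumerate(wavs):
    # Writes sfx_0.wav, sfx_1.wav, ... normalized for loudness
    audio_write(f"sfx_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```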
@fffiloni has been at it again. Based on the LP-MusicCaps paper above, he built a HuggingFace space that lets you generate a video based on the captions generated from the music.
Another @fffiloni HuggingFace space based on LP-MusicCaps. This one generates images from music captions.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa