Hello there, my fellow dreamers, and welcome to issue #45 of AI Art Weekly! 👋
With SDXL out for a week, the community is currently a bit split on it. I haven’t found the time to properly use it myself, but I’m currently doing a lot of research into building a new PC that supports running and fine-tuning Stable Diffusion as well as Large Language Models, as I’m tired of today’s cumbersome and limited cloud solutions. Once I’ve finished the build, I’ll share my specs here. I also started minting my Genesis artworks on Foundation, which are now open to collectors. Oh, btw… Droids are coming. Google has built a model that lets robots semantically interact with our world. But putting that aside for now, let’s dive into this week’s AI art related news. The highlights:
- LP-MusicCaps generates captions from music
- I-Paint assists you while painting
- DWPose is a new OpenPose contender
- HierVST is a zero-shot voice transfer system 🤯
- Interview with artist Pocobelli
- A comprehensive SDXL artist guide
- And more
Cover Challenge 🎨
News & Papers
LP-MusicCaps: LLM-Based Pseudo Music Captioning
LP-MusicCaps is a system that lets you convert music to text – not just transcriptions of lyrics, but actually what’s being played. There is a HuggingFace demo you can try, but more interestingly this makes it possible to generate images or videos directly from music by transforming the music captions into visual descriptions. Check out the resource section below for more.
I-Paint: Interactive Neural Painting
Text-to-Image, Image-to-Image, Audio-to-Image, Brain-to-Image: all possible nowadays. But all these approaches are almost entirely non-interactive. Imagine you’re drawing a picture from a reference. I-Paint aims to assist that process by suggesting the next brush strokes to finish the artwork. It’s obviously still a tech demo and only applicable to digital paintings right now, but imagine something like that with an Apple Vision Pro headset while drawing on a physical canvas 🤯
DWPose: Effective Whole-body Pose Estimation with Two-stages Distillation
There is a new OpenPose contender to make your ControlNet pose images even better. The new pose estimator is called DWPose and uses a two-stage distillation approach to improve the accuracy of the pose estimation. The comparisons with OpenPose look pretty great.
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Voice cloning is getting crazy good. HierVST is a zero-shot voice transfer system that works without any text transcripts. That means it is able to transfer the voice style of a target speaker to a source speaker without any training data from the target speaker.
More papers & gems
- Motion Mode: Computational Long Exposure Mobile Photography
- VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!