Hello there, my fellow dreamers, and welcome to issue #41 of AI Art Weekly! 👋
We’ve finally surpassed the 2,000-subscriber milestone 🥳. Thank you all for your support and for being part of this community. People say the bigger the audience gets, the easier it is to grow. I’ve found this to be absolutely untrue. Getting the last 250 of you on board has been the toughest challenge yet, so I’m glad you all made it here 🧡
Let’s jump into this week’s issue. Here are the highlights:
- Stable Diffusion XL 0.9 weights leaked and 1.0 release date revealed
- New Midjourney panning feature
- Artistic Cinemagraph: Animating images from text
- DragonDiffusion: DragGAN for diffusion models
- SketchMetaFace: Sketching 3D faces
- DreamIdentity: Efficient face-identity preserved image generation from a single image
- Voicebox: Meta’s new speech synthesis model
- RobustL2S: Lip-to-speech synthesis 🤯
- Interview with AI artists Polza and DEHISCENCE
Cover Challenge 🎨
News & Papers
Stable Diffusion XL 0.9 weights leaked, 1.0 release date and a new Midjourney panning feature
The Stable Diffusion XL 0.9 weights were leaked on HuggingFace this week, and they were removed as promptly as they were put up. Some folks were fast enough to snatch them though, so they are still available through a torrent. Emad advised against training on or relying too much on 0.9, as 1.0 will have additional RLHF fine-tuning compared to 0.9, which apparently will make a big difference. According to the man himself, 1.0 gets released on July 18th.
Midjourney meanwhile pushed a new panning feature which lets you pan upscales horizontally or vertically.
Artistic Cinemagraph: Synthesizing Artistic Cinemagraphs from Text
The recent AI advancements in image segmentation are enabling new possibilities. Artistic Cinemagraph, for instance, makes it possible to animate flowing clouds or water in images fully automatically from text descriptions. You can also describe which direction the water should flow, for instance. The best part is that it works on existing images and paintings too, so you can bring shots from your last holiday to life.
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
It’s been 7 weeks since DragGAN got announced, and one week since the official implementation got released. This week, we got DragonDiffusion: basically the DragGAN equivalent, but for diffusion models.
SketchMetaFace: A Learning-based Sketching Interface for High-fidelity 3D Character Face Modeling
Similar to ControlNet scribble for images, SketchMetaFace brings sketch guidance to the 3D realm and makes it possible to turn a sketch into a 3D face model. I’m pretty excited about progress like this, as it will bring controllability to 3D generations and make generating 3D content way more accessible.
DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation
Having to train a model on additional concepts and faces might soon be a thing of the past. Given only one facial image, DreamIdentity can efficiently generate countless identity-preserved and text-coherent images in different contexts without any test-time optimization.
Voicebox: Meta’s new speech synthesis model
Meta announced Voicebox a few weeks ago, an impressive speech synthesis model that can generate speech from text, transfer the style of an audio input, edit spoken text, or remove background noise from audio clips. This will probably never get released as an open-source model, but it’s still interesting to see what will be possible in the near future.
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis via Self-Supervised Learning
Ok, this is a crazy one. We’ve seen a lot of research around the same concepts in the past few weeks (image, video, 3D, speech and audio), but I get especially excited when I see something new that I wasn’t aware was possible. RobustL2S is one of these cases. It’s a lip-to-speech synthesis model, which means it can transform video footage of moving lips into audio 🤯 I can’t wait to try this on some text-to-video output if the code ever gets released.
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World (it’s over for the TikTok dancers)
- HNC-CAD: Hierarchical Neural Coding for Controllable CAD Model Generation
- ProxyCap: Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning
- DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!