Hello there, my fellow dreamers, and welcome to issue #50 of AI Art Weekly! 👋
Because I was relaxing on the sunny beaches of Turkey last week, I wasn’t able to send out an issue. But I’m back now, ready to catch up on all the exciting AI art news from the past two weeks. Let’s get started! We’ve got a ton of stuff to cover:
- Stability AI released Stable Audio
- Google presented Generative Image Dynamics
- VideoGen and Reuse and Diffuse are two new approaches for text-to-video generation
- Video outpainting with M3DDM
- TECA can create and edit 3D avatars from text
- InstaFlow is a new ultra-fast image generator
- PhotoVerse lets you transfer facial features with only one image
- SyncDreamer can generate 3D models from a single image
- XTTS – a new open-source text-to-speech model
- and more tutorials, tools and gems!
News & Papers
Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion
Stability AI introduced Stable Audio this week, a new model that can generate audio clips up to 90 seconds long. The model is only available through their web service for now, but they plan to release an open-source version that will let you train your own audio generation models.
Generative Image Dynamics
This is an interesting one. Google presented Generative Image Dynamics this week. The method can turn a single image into a seamless looping video or an interactive dynamic scene that lets you move an object within the scene by dragging and dropping it.
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
Guidable open-source image-to-video is on the horizon. VideoGen is able to generate high-definition videos with high frame fidelity and strong temporal consistency, using a reference image and a text prompt to guide video generation.
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Another video synthesis model that caught my eye this week is Reuse and Diffuse. This novel framework for text-to-video generation can produce additional frames from an initial video clip by reusing and iterating over the original latent features. Can’t wait to give this one a try.
M3DDM: Hierarchical Masked 3D Diffusion Model for Video Outpainting
Video outpainting wen? M3DDM is a diffusion model specifically designed for video outpainting, aiming to adequately complete missing areas at the edges of video frames. Some results look wacky, but overall this looks extremely promising. It’s not available yet, but apparently it’ll get released as part of a product.
TECA: Text-Guided Generation and Editing of Compositional 3D Avatars
Another paper, another method to generate 3D avatars from text. TECA is able to generate realistic avatars with hair, clothes, and accessories that can be edited and transferred between avatars. The method is also able to generate an animation-ready avatar by leveraging the SMPL-X model.
InstaFlow! One-Step Stable Diffusion with Rectified Flow
It looks like we’re one step closer to real-time image diffusion! InstaFlow is an ultra-fast, one-step image generator that achieves image quality close to Stable Diffusion 1.5 while significantly reducing the demand for computational resources.
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
There hasn’t really been a good option to “clone” faces from only one image so far. PhotoVerse seems to solve this, as it requires just a single reference image and doesn’t need any test-time tuning.
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
SyncDreamer generates multiview-consistent images from a single-view image and can thus produce 3D models from 2D designs and hand drawings. It wasn’t able to help me in my quest to turn my PFP into a 3D avatar, but someday I’ll get there!
More papers & gems
- StyleLM: Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
- DELTA: Learning Disentangled Avatars with Hybrid 3D Representations
- SDFlow: Semantic Latent Decomposition with Normalizing Flows for Face Editing
- DN2N: Text-driven Editing of 3D Scenes without Retraining
- TRAvatar: Towards Practical Capture of High-Fidelity Relightable Avatars
- AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!