AI Art Weekly #56
Hello there, my fellow dreamers, and welcome to issue #56 of AI Art Weekly! 👋
Two major developments from the generative AI art front this week. Apple has released research on Matryoshka Diffusion Models, their take on text-to-image generation. And Latent Consistency Models might be the next evolution of diffusion models, generating images much faster in just 1 to 4 steps. Let's jump in:
- Latent Consistency Models – faster text-to-image!
- Apple’s Matryoshka Diffusion Models
- FreeNoise can create text-to-video with 512 frames
- DreamCraft3D: High quality 3D generation
- Wonder3D is another image to 3D method
- Zero123++ can generate multi-view images from a single input image
- E4S is a new method for fine-grained face swapping
- and more tutorials, tools and gems!
Putting these weekly issues together takes me between 8 and 12 hours every Friday. If you like what I do, please consider buying me a coffee so I can stay awake 🙏
Cover Challenge 🎨
It’s spooky season! For next week’s cover I’m looking for “twilight zone” inspired covers. The reward is $50 plus the Challenge Winner role for the winner and the Challenge Finalist role for all finalists within our Discord community. These rare roles earn you the exclusive right to cast a vote in the selection of future winners. The rulebook can be found here and images can be submitted here. I’m looking forward to your submissions 🙏
News & Papers
Latent Consistency Models: Synthesizing High-Resolution Images with Few-step Inference
There is a new category of generative models emerging, called Latent Consistency Models (LCMs). These models can be distilled from pre-trained Stable Diffusion models and are able to generate high-quality 768x768 images in only one to four steps, significantly accelerating text-to-image generation. For comparison, traditional diffusion models require 20-50 steps. Early signs suggest that, with some further optimizations, this could bring image generation down to around 100ms on powerful GPUs.
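If you want to tinker with this yourself, here is a minimal sketch of what few-step LCM inference could look like with Hugging Face diffusers. It assumes a recent diffusers release that ships an LCM pipeline and uses the community checkpoint SimianLuo/LCM_Dreamshaper_v7 as an example; the exact model id, prompt and arguments are my own assumptions, so check the project page for the official setup.

```python
# Minimal sketch: few-step text-to-image with a Latent Consistency Model.
# Assumes a diffusers version that includes an LCM pipeline and the
# community checkpoint below (assumption, not the official release).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",  # hypothetical example checkpoint
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# The whole point of LCMs: 4 inference steps instead of the usual 20-50.
image = pipe(
    prompt="a cozy cabin in a snowy forest, golden hour, highly detailed",
    num_inference_steps=4,
    guidance_scale=8.0,
    height=768,
    width=768,
).images[0]

image.save("lcm_sample.png")
```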
Matryoshka Diffusion Models
Apple is getting into the generative AI game. Matryoshka Diffusion Models (MDM) are their latest research for high-quality text-to-image and text-to-video generation, using a multi-resolution diffusion model that can produce results at resolutions of up to 1024x1024 pixels. Compared to Stable Diffusion or Google’s Imagen, MDM doesn’t require a pre-trained VAE or any additional upscaling modules and can be trained much more efficiently. The code isn’t available yet, but will apparently be released soon.
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
FreeNoise is a new method that can generate longer videos with up to 512 frames from multiple text prompts. That’s about 21 seconds for a 24fps video. The method doesn’t require any additional fine-tuning on the video diffusion model and only takes about 20% more time compared to the original diffusion process.
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
3D generations are getting more sophisticated by the week. DreamCraft3D can create high-quality 3D objects from a single prompt. It uses a 2D reference image to guide the sculpting of the 3D object and then improves texture fidelity by running it through a fine-tuned Dreambooth model.
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Wonder3D is yet another image-to-3D method. This one is able to convert a single image into a high-fidelity 3D model, complete with textured meshes and color. The entire process takes only 2 to 3 minutes.
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Zero123++ is a new model that can generate multi-view images from a single input image. Which gave me another opportunity to test how my avatar might look from another angle. Still not impressed…
E4S: Fine-Grained Face Swapping via Regional GAN Inversion
E4S is a new method for fine-grained face swapping. It’s able to swap faces in images and videos, while preserving the source identity, texture, shape, and lighting of the original footage.
More papers & gems
- MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
- Relit-NeuLF: Efficient Novel View Synthesis with Neural 4D Light Field
- PERF: Panoramic Neural Radiance Field from a Single Panorama
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
Someone by the name of anime_is_real shows how careful editing can bring life to even ‘simple’ generations. Simply beautiful work.
@paultrillo created a trippy music video that fuses traditional VFX with a variety of boundary-pushing AI techniques. Fun to watch!
@OdinLovis crafted a beautiful audio-reactive 3D animation with Cinema 4D and finessed it with ComfyUI & AnimateDiff. Something I’ve got on my to-do list as well. Super cool.
My favourite feature of Gen-1 is the ability to convert mockup videos into high-quality renders. @A_B_E_L_A_R_T is doing exactly that, and much more.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
Riffusion got a facelift and an upgrade. The newest version is able to generate 12-second music clips from specific lyrics and a sound prompt. Check out one of mine.
If you want to give LCM a spin in your A1111 setup, this extension is for you.
@fofr put together a tutorial on how to run Latent Consistency Models on your M1 or M2 Macs to generate 512x512 images in one second.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa