AI Art Weekly #50

Hello there, my fellow dreamers, and welcome to issue #50 of AI Art Weekly! 👋

Because I was relaxing on the sunny beaches of Turkey last week, I wasn’t able to send out an issue. But I’m back now and ready to catch up on all the exciting AI art news from the past two weeks. Let’s get started, we’ve got a ton of stuff to cover:

  • Stability AI released Stable Audio
  • Google presented Generative Image Dynamics
  • VideoGen and Reuse and Diffuse are two new approaches for text-to-video generation
  • Video outpainting with M3DDM
  • TECA can create and edit 3D avatars from text
  • InstaFlow is a new ultra-fast image generator
  • PhotoVerse lets you transfer facial features with only one image
  • SyncDreamer can generate 3D models from a single image
  • XTTS – a new open-source text-to-speech model
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: unprompt
105 submissions by 57 artists
🏆 1st: @bellamisele
🥈 2nd: @dancevatar
🥈 2nd: @annadart_artist
🥉 3rd: @EternalSunrise7

News & Papers

Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion

Stability AI introduced Stable Audio this week, a new model that can generate audio clips up to 90 seconds long. For now, the model can only be used through their web service, but they plan to release an open-source version that will let you train your own audio generation models.

Go to the blog post link above to listen to generated audio samples.

Generative Image Dynamics

This is an interesting one. Google presented Generative Image Dynamics this week. The method can turn a single image into a seamlessly looping video or an interactive dynamic scene that lets you move objects within the scene by dragging and dropping them.

GID examples

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

Guidable open-source image-to-video is on the horizon. VideoGen generates high-definition videos with high frame fidelity and strong temporal consistency, using a reference image and a text prompt to guide the video generation.

VideoGen example

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

Another video synthesis model that caught my eye this week is Reuse and Diffuse. The framework extends text-to-video generation with the ability to produce more frames from an initial video clip by reusing and iteratively denoising its original latent features. Can’t wait to give this one a try.

A chihuahua in astronaut suit is floating in space.
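
The code isn’t out yet, but conceptually the frame-extension trick might look something like the rough sketch below: latents from the clip you already generated are partially re-noised and used as the starting point for the next batch of frames instead of denoising from pure noise. All names here (extend_clip, denoiser, etc.) are hypothetical placeholders, not the paper’s actual API.

```python
import torch

def extend_clip(denoiser, clip_latents, num_new_frames, noise_strength=0.6):
    # clip_latents: (batch, frames, channels, height, width) latents of the
    # clip generated so far. Everything below is a conceptual placeholder.
    template = clip_latents[:, -num_new_frames:]
    # Partially re-noise the template so the model can invent new content
    # while keeping the coarse structure (and temporal look) of the clip.
    noise = torch.randn_like(template)
    noisy = (1 - noise_strength) * template + noise_strength * noise
    # Iteratively denoise the new frames, conditioned on the original clip.
    new_latents = denoiser(noisy, context=clip_latents)
    return torch.cat([clip_latents, new_latents], dim=1)
```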

M3DDM: Hierarchical Masked 3D Diffusion Model for Video Outpainting

Video outpainting wen? M3DDM is a diffusion model specifically designed for video outpainting, filling in missing areas at the edges of video frames. Some results look wacky, but overall this looks extremely promising. It’s not available yet, but apparently it’ll be released as part of a product.

M3DDM example

TECA: Text-Guided Generation and Editing of Compositional 3D Avatars

Another paper, another method to generate 3D avatars from text. TECA is able to generate realistic avatars with hair, clothes, and accessories that can be edited and transferred between avatars. The method is also able to generate an animation-ready avatar by leveraging the SMPL-X model.

TECA examples

InstaFlow! One-Step Stable Diffusion with Rectified Flow

It looks like we’re one step closer to real-time image diffusion! InstaFlow is an ultra-fast, one-step image generator that achieves image quality close to Stable Diffusion 1.5 while significantly reducing the demand for computational resources.

InstaFlow! comparison with SD 1.5
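
To get a feel for why rectified flow enables one-step sampling: the model is trained so that its velocity field is nearly straight, which means a single Euler step across the whole time interval already lands close to a clean latent. The sketch below illustrates the idea with hypothetical velocity_net and vae_decode stand-ins, not the actual InstaFlow code.

```python
import torch

def one_step_sample(velocity_net, vae_decode, prompt_embedding,
                    latent_shape=(1, 4, 64, 64)):
    # Start from pure Gaussian noise in the latent space of the SD 1.5 VAE.
    z0 = torch.randn(latent_shape)
    # A rectified-flow model predicts a nearly straight velocity field, so a
    # single Euler step over the full interval [0, 1] is enough:
    #   z1 = z0 + 1.0 * v(z0, t=0)
    t = torch.zeros(latent_shape[0])
    v = velocity_net(z0, t, prompt_embedding)
    z1 = z0 + v
    # Decode the resulting latent into an image with the VAE decoder.
    return vae_decode(z1)
```

Compare that to the dozens of denoising steps a regular diffusion sampler needs per image, which is where the speedup comes from.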

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

There hasn’t really been a good option to “clone” faces with only one image so far. PhotoVerse seems to solve this as it only requires a single reference image and doesn’t need any test-time tuning.

PhotoVerse examples

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

SyncDreamer is able to generate multiview-consistent images from a single-view image and thus is able to generate 3D models from 2D designs and hand drawings. It wasn’t able to help me in my quest to turn my PFP into a 3D avatar, but someday I’ll get there!

SyncDreamer examples

More papers & gems

  • StyleLM: Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
  • DELTA: Learning Disentangled Avatars with Hybrid 3D Representations
  • SDFlow: Semantic Latent Decomposition with Normalizing Flows for Face Editing
  • DN2N: Text-driven Editing of 3D Scenes without Retraining
  • TRAvatar: Towards Practical Capture of High-Fidelity Relightable Avatars
  • AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

Blend of two images with the prompt autumnal forest, ancient greece castles, impressionist, in the style of Krzysztof Lubieniecki, intricate, detailed, todd hido --w 3 --style raw --c 34 --ar 3:2 by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa