AI Art Weekly #59

Hello there, my fellow dreamers, and welcome to issue #59 of AI Art Weekly! 👋

Things are picking up in the AI world and I’ve got another packed issue for you this week. I mean, just look at the list below. No wonder nobody can keep up anymore 😅

  • YouTube AI Music Tools: A sneak peek!
  • Four new 3D object generation methods
  • And 3D Paintbrush to edit them
  • D3GA to turn people into animatable 3D Gaussian Avatars
  • NVIDIA’s Adaptive Shells
  • MCVD to predict the past and the future
  • Meta’s Emu Video Generation model
  • Stable Diffusion 1.6
  • The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  • InterpAny-Clearer: Sharper frame interpolation
  • Draw A UI: Turn sketches into UI
  • Interview with AI animator and illustrator sleepysleephead
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: mythology
60 submissions by 37 artists
AI Art Weekly Cover Art Challenge mythology submission by bellamisele
🏆 1st: @bellamisele
AI Art Weekly Cover Art Challenge mythology submission by beholdthe84
🥈 2nd: @beholdthe84
AI Art Weekly Cover Art Challenge mythology submission by BonsaiFox1
🥉 3rd: @BonsaiFox1
AI Art Weekly Cover Art Challenge mythology submission by AlchemAIst
🧡 4th: @AlchemAIst

News & Papers

YouTube Dream Track and AI Music Tools

YouTube shared a sneak peek at their first set of AI-related music experiments built with Google DeepMind this week.

An experiment called Dream Track can generate 30-second music tracks for YouTube Shorts from text descriptions. It’s powered by 9 artists who have partnered with Google, and it’s only available to a small, select group of US-based creators for now.

So far so good, but the really interesting part for me is the demo of their AI Music Tools. This tech can apparently create entire tracks from nothing more than a simple hum. Imagine hanging out with your friends and creating high-fidelity music together by pitching different hums. Can’t wait until this becomes available!

YouTube’s AI Music Tools preview. Check the link above for sound.

Instant3D (1), One-2-3-45++, Instant3D (2) and DMV3D

Generative 3D is definitely on the rise. We didn’t just get one new 3D object generation method this week, but four! Two of them even share the same name 🤪 Let’s do a quick run through them:

  • Instant3D: Low quality Text-to-3D in less than one second.
  • One-2-3-45++: Image-to-3D within 20 seconds, refined version in 60 seconds.
  • Instant3D: High quality Text-to-3D in 20 seconds.
  • DMV3D: Text-to-3D and Image-to-3D in 30 seconds.

It seems crazy to me that we still haven’t cracked high-quality real-time image generation while generative 3D keeps making these kinds of speed and quality jumps.

Instant3D example

3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation

So what do you do with that many 3D objects? Exactly, edit them! 3D Paintbrush is a technique for automatically texturing local semantic regions on meshes via text descriptions. The method operates directly on meshes and produces texture maps that integrate seamlessly into standard graphics pipelines.

3D Paintbrush example

D3GA: Drivable 3D Gaussian Avatars

D3GA is the first 3D controllable model for human bodies rendered with Gaussian splats in real-time. With a multi-cam setup, it lets us turn ourselves or others into a Gaussian splat that can be animated, and the avatar can even be decomposed into its different clothing layers.

D3GA examples

Adaptive Shells for Efficient Neural Radiance Field Rendering

Speaking of splats, NVIDIA presented Adaptive Shells this week, a new method for efficiently rendering interactive NeRFs. Like Gaussian Splats, Adaptive Shells can render at real-time rates, even while physical simulation and animation are applied to them.

Adaptive Shells example

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

MCVD is a new video generation model. Compared to other models, it doesn’t focus on high-resolution outputs (yet), but it has another very interesting capability: it can predict the past and the future. Well, not literally. But depending on which frames it’s conditioned on, it can generate a clip from scratch, continue from given frames, or produce the frames leading up to them. Knowing both the past and the future also lets it interpolate between frames.
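To make the “past and future” idea concrete, here is a tiny conceptual sketch (my own illustration, not the authors’ code) of how a conditioning mask over a block of frames covers future prediction, past prediction, unconditional generation, and interpolation:

```python
import numpy as np

# Conceptual illustration only: a video is a block of T frames, and the mask
# marks which frames are given as conditioning (1) and which the diffusion
# model has to generate (0). Frame counts here are made up.
T = 8

def mask_for(task: str) -> np.ndarray:
    """Return a conditioning mask: 1 = observed frame, 0 = frame to generate."""
    mask = np.zeros(T, dtype=int)
    if task == "future_prediction":   # past frames given, future generated
        mask[:2] = 1
    elif task == "past_prediction":   # future frames given, past generated
        mask[-2:] = 1
    elif task == "interpolation":     # both ends given, middle generated
        mask[:2] = 1
        mask[-2:] = 1
    elif task == "unconditional":     # nothing given, whole clip generated
        pass
    return mask

for task in ["future_prediction", "past_prediction", "interpolation", "unconditional"]:
    print(f"{task:>18}: {mask_for(task)}")
```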

MCVD examples

Emu Video

Meta showcased a new text-to-video generation model called Emu Video which can create 4-second short videos at 512x512 resolution and 16fps. But it doesn’t transform text directly into video. It first generates an image from the text and then turns that image into a video, which gives better results. According to their study, Emu Video wins against current state-of-the-art models like Gen-2 and Pika Labs. There’s no real way to test this ourselves though.

A teddy bear painting a portrait generated with Emu Video

Stable Diffusion 1.6

Stable Diffusion 1.6 is here, partially! Compared to 1.5, 1.6 is designed to be more cost-effective. It now supports image dimensions from 320px to 1536px on either side and has been optimized to provide higher-quality 512px generations. The weights haven’t been released yet though, but you can play with the model in Stability’s developer platform sandbox.
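If you want to poke at it through the API, here’s a minimal sketch of a text-to-image request. I’m assuming the engine id "stable-diffusion-v1-6" and the standard v1 text-to-image parameters, so double-check the developer platform docs before relying on it:

```python
import requests

API_KEY = "YOUR_STABILITY_API_KEY"  # placeholder, get one from the developer platform

response = requests.post(
    # Engine id is an assumption; confirm it in Stability's docs.
    "https://api.stability.ai/v1/generation/stable-diffusion-v1-6/text-to-image",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [
            {"text": "A wolf in Yosemite National Park, chilly nature documentary film photography"}
        ],
        "width": 512,    # 1.6 accepts sides from 320px to 1536px
        "height": 512,
        "steps": 30,
        "cfg_scale": 7,
        "samples": 1,
    },
    timeout=120,
)
response.raise_for_status()

# Each artifact is a base64-encoded PNG.
images = response.json()["artifacts"]
print(f"Received {len(images)} image(s)")
```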

Comparison of Stable Diffusion 1.5 to 1.6 with the same settings and prompt: A wolf in Yosemite National Park, chilly nature documentary film photography

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

A crucial aspect of storytelling, asset design, advertising, and more is the ability to generate consistent characters, and so far this has been a pain for text-to-image models. The Chosen One is a new, fully automated method that aims to make this easier. It works across different character types and styles, can change a character’s age, and can be used for illustrating stories.

The Chosen One examples

InterpAny-Clearer: Clearer Frames, Anytime – Resolving Velocity Ambiguity in Video Frame Interpolation

InterpAny-Clearer is a new video frame interpolation method that generates clearer and sharper frames than existing methods. Additionally, it introduces the ability to manipulate the interpolation of objects in a video independently, which could be useful for video editing tasks.

InterpAny-Clearer example

More papers & gems

  • Music ControlNet: Multiple Time-varying Controls for Music Generation
  • Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
  • Human-SGD: Single-Image 3D Human Digitization with Shape-Guided Diffusion

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“What are you looking at? 👁️👁️” by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa