AI Art Weekly #59
Hello there, my fellow dreamers, and welcome to issue #59 of AI Art Weekly! 👋
Things are picking up in the AI world and I’ve got another packed issue for you this week. I mean, just look at the list below. No wonder nobody can keep up anymore 😅
- YouTube AI Music Tools: A sneak peek!
- Four new 3D object generation methods
- And 3D Paintbrush to edit them
- D3GA to turn people into animatable 3D Gaussian Avatars
- NVIDIA’s Adaptive Shells
- MCVD to predict the past and the future
- Meta’s Emu Video Generation model
- Stable Diffusion 1.6
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
- InterpAny-Clearer: Sharper frame interpolation
- Draw A UI: Turn sketches into UI
- Interview with AI animator and illustrator sleepysleephead
- and more tutorials, tools and gems!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for “pop art” submissions. The reward is $50, plus the Challenge Winner role for the winner and the Challenge Finalist role for all finalists within our Discord community. These rare roles earn you the exclusive right to cast a vote in the selection of future winners. The rulebook can be found here and images can be submitted here. I’m looking forward to your submissions 🙏
News & Papers
YouTube Dream Track and AI Music Tools
YouTube shared a sneak peek at their first set of AI-related music experiments built with Google DeepMind this week.
One experiment, called Dream Track, can generate 30-second music tracks for YouTube Shorts from text descriptions. It features nine artists who have partnered with Google and is available to a small, select group of US-based creators for now.
So far so good, but the really interesting part for me is the demo of their AI Music Tools. This tech can apparently create entire tracks from nothing more than a simple hum. Imagine hanging out with your friends and creating high-fidelity music together just by pitching different hums. Can’t wait until this becomes available!
Instant3D (1), One-2-3-45++, Instant3D (2) and DMV3D
Generative 3D is definitely on the rise. We didn’t just get one new 3D object generation method this week, we got four! Two of them even share the same name 🤪 Let’s do a quick run-through:
- Instant3D: Low quality Text-to-3D in less than one second.
- One-2-3-45++: Image-to-3D within 20 seconds, refined version in 60 seconds.
- Instant3D: High quality Text-to-3D in 20 seconds.
- DMV3D: Text-to-3D and Image-to-3D in 30 seconds.
It seems crazy to me that we still haven’t cracked high-quality real-time image generation while generative 3D keeps making these kinds of speed and quality jumps.
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
What to do with all those 3D objects? Exactly: edit them! 3D Paintbrush is a technique for automatically texturing local semantic regions on meshes via text descriptions. The method is designed to operate directly on meshes, producing texture maps that seamlessly integrate into standard graphics pipelines.
D3GA: Drivable 3D Gaussian Avatars
D3GA is the first 3D controllable model for human bodies rendered with Gaussian splats in real time. With a multi-cam setup, it lets us turn ourselves or others into a Gaussian splat avatar that can be animated and even decomposed into its different clothing layers.
Adaptive Shells for Efficient Neural Radiance Field Rendering
Speaking of splats, NVIDIA presented a new method for efficiently rendering interactive NeRFs this week, called Adaptive Shells. Like Gaussian Splats, Adaptive Shells render at real-time rates, even with physical simulation and animation applied.
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
MCVD is a new video generation model. Compared to other models, it doesn’t focus on high-resolution outputs (yet), but it has another very interesting capability: it can predict the past and the future. Well, not literally. Depending on which frames it’s conditioned on, it can generate a clip from nothing, continue forward from given frames, or generate the frames leading up to them. Knowing both past and future also lets it interpolate between frames.
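If you’re wondering how one model can cover all of these modes, here’s a rough conceptual sketch (not MCVD’s actual code, just the masking idea): a binary mask marks which frames in a window are given as context and which get generated from noise.

```python
# Toy sketch of masked frame conditioning, assuming a window of 8 frames.
# 1 = frame is given as context, 0 = frame gets denoised by the model.
import torch

def build_mask(num_frames: int, mode: str) -> torch.Tensor:
    mask = torch.zeros(num_frames)
    if mode == "future":         # predict the future from past frames
        mask[: num_frames // 2] = 1
    elif mode == "past":         # "predict the past" from future frames
        mask[num_frames // 2 :] = 1
    elif mode == "interpolate":  # fill in the frames between two endpoints
        mask[0] = mask[-1] = 1
    # mode == "unconditional": everything is generated from scratch
    return mask

frames = torch.randn(8, 3, 64, 64)      # a toy 8-frame clip
mask = build_mask(8, "interpolate")     # keep only the first and last frame
noise = torch.randn_like(frames)        # the remaining frames start as noise
model_input = torch.where(mask.view(-1, 1, 1, 1).bool(), frames, noise)
# A diffusion model would now denoise only the positions where mask == 0.
```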
Emu Video
Meta showcased a new text-to-video generation model called Emu Video, which can create 4-second videos at 512x512 resolution and 16fps. It doesn’t transform text directly into video, though: it first generates an image from the text and then turns that image into a video, which yields better results. According to their study, Emu Video wins against current state-of-the-art models like Gen-2 and Pika Labs. There’s no real way to test this yet, though.
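In other words, the problem gets factorized into two stages. Here’s a hedged Python sketch of that idea; generate_image and animate_image are hypothetical placeholders, not Meta’s actual API:

```python
# Sketch of a two-stage text-to-video pipeline as described above.
# The two helpers are hypothetical stand-ins for the underlying models.
from typing import List, Tuple

def generate_image(prompt: str, size: Tuple[int, int]):
    """Hypothetical text-to-image call that returns a single keyframe."""
    ...

def animate_image(keyframe, prompt: str, num_frames: int) -> List:
    """Hypothetical image-to-video call conditioned on that keyframe."""
    ...

def text_to_video(prompt: str, seconds: int = 4, fps: int = 16):
    keyframe = generate_image(prompt, size=(512, 512))       # stage 1: text -> image
    frames = animate_image(keyframe, prompt, seconds * fps)  # stage 2: image -> video
    return frames, fps
```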
Stable Diffusion 1.6
Stable Diffusion 1.6 is here, partially! Compared to 1.5, 1.6 is designed to be more cost-effective. It now supports aspect ratios with anywhere from 320px to 1536px on either side and has been optimized to deliver higher-quality 512px generations. The weights haven’t been released yet, but you can play with the model in Stability’s developer platform sandbox.
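If you want to poke at it programmatically, a request along these lines should work. Note that the engine name and exact request fields here are my assumptions based on Stability’s v1 REST API, so double-check the developer platform docs:

```python
# Hedged example request against Stability's developer platform.
# The engine id "stable-diffusion-v1-6" and field names are assumptions.
import os
import requests

response = requests.post(
    "https://api.stability.ai/v1/generation/stable-diffusion-v1-6/text-to-image",
    headers={
        "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a pop art portrait of a dreaming robot"}],
        "width": 832,    # non-square sizes between 320px and 1536px per side
        "height": 512,
        "steps": 30,
        "cfg_scale": 7,
    },
)
response.raise_for_status()
images = response.json()["artifacts"]  # base64-encoded images
```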
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
A crucial aspect of storytelling, asset design, advertising and more is the ability to generate consistent characters, which so far has been a pain for text-to-image models. The Chosen One is a new, fully automated method that aims to make this easier. It works across different character types and styles, can change a character’s age, and can be used to illustrate stories.
InterpAny-Clearer: Clearer Frames, Anytime – Resolving Velocity Ambiguity in Video Frame Interpolation
InterpAny-Clearer is a new video frame interpolation method that generates clearer and sharper frames than existing methods. It also introduces the ability to manipulate the interpolation of objects in a video independently, which could be useful for video editing tasks.
More papers & gems
- Music ControlNet: Multiple Time-varying Controls for Music Generation
- Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
- Human-SGD: Single-Image 3D Human Digitization with Shape-Guided Diffusion
I got access to Krea’s LCM interface last night and did some quick doodling. Behold my epic mouse drawing skills 😂🤌
@ai_s_a_m created an AI short in the style of video tape footage which gives off an eerily realistic vibe. It’s still far from perfect, but nevertheless, incredible.
@tldraw is currently going nuts with their own whiteboard drawing tool, combining it with GPT-4V to turn all kinds of mockups into working code snippets. Super dope!
@8bit_e has been experimenting with 3D models in Blender and used ComfyUI, Stable Diffusion and AnimateDiff to take them to the next level.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
Draw A UI is a prototype built on top of tldraw that lets you draw a UI, game, website and more, and then generates the code for it using GPT-4V. It’s still early and experimental, but the future of frontend dev is now!
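Under the hood the general pattern is surprisingly simple. Here’s a rough sketch (not the project’s actual code) of sending a drawing to GPT-4V via the OpenAI API and asking for markup; the prompt and file name are just placeholders:

```python
# Minimal sketch-to-code example with GPT-4V; not Draw A UI's actual code.
# Assumes openai>=1.0 and OPENAI_API_KEY set in the environment.
import base64
from openai import OpenAI

client = OpenAI()
with open("sketch.png", "rb") as f:  # a screenshot of your drawing
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Turn this UI sketch into a single HTML file using Tailwind CSS."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # the generated markup
```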
For those of you who haven’t gotten Krea AI access yet, there is an open-source alternative for Windows that lets you define a capture area on your screen and convert it with an LCM model.
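The magic ingredient in both is the Latent Consistency Model, which needs only a handful of denoising steps per image. Here’s a minimal diffusers sketch of why that’s fast enough for live screen capture; the checkpoint name is just one public LCM, and the actual tool runs img2img on your capture area rather than plain text-to-image:

```python
# Minimal LCM example with diffusers (assumes diffusers>=0.22 and a CUDA GPU).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor landscape, soft morning light",
    num_inference_steps=4,  # LCMs converge in ~4 steps instead of 20-50
    guidance_scale=8.0,
).images[0]
image.save("lcm_preview.png")
```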
EmotiVoice is an open-source text-to-speech engine that speaks English and Chinese with over 2,000 different voices. It supports emotional synthesis, allowing you to create speech with a wide range of emotions.
A free book about the state of AI and ML open source projects, focusing on Large Language Models.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa