AI Art Weekly #52
Hello there, my fellow dreamers, and welcome to issue #52 of AI Art Weekly! 👋
Another crazy week in AI lies behind us: ChatGPT goes multi-modal (more below), Tesla showed us a sneak peek of their autonomous humanoid robot Optimus, Meta announced their new AI-powered Ray-Ban smart glasses, and Lex Fridman had a conversation with Mark Zuckerberg in the Metaverse as photorealistic avatars 🤯
Meanwhile, I’m down with the flu, so before we get on with this week’s highlights, I let GitHub Copilot finish this intro for me: “I’m sick and tired of being sick and tired” 😅
Here are the highlights:
- GPT-4 goes multi-modal
- DreamGaussian: Efficient 3D asset generation with Generative Gaussian Splatting
- RealFill
- TempoTokens turns audio into video
- Show-1 is a new memory efficient text-to-video model
- AnimeInbet generates inbetween frames for cartoon line drawings
- and more tutorials, tools and gems!
Putting these weekly issues together takes me between 8–12 hours every Friday. With your contribution, you’ll be backing the evolution and expansion of AI Art Weekly for the price of a coffee each month 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for “The Stranger”. The reward is $50 and the Challenge Winner role within our Discord community. This rare role earns you the exclusive right to cast a vote in the selection of future winners. The rulebook can be found here and images can be submitted here. I’m looking forward to your submissions 🙏
News & Papers
GPT-4 goes multi-modal
Just last week OpenAI announced that DALL·E 3 was going to build on top of ChatGPT. This week they announced that they’re finally adding vision (and voice) capabilities. This means you’ll be able to give ChatGPT an image and interact with it. Just imagine being able to talk to your art; that’s going to be a reality within the next two weeks. I also wonder how the vision capabilities will affect image generation with DALL·E. If they nail editing images with natural language, this could be a true game changer.
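For the tinkerers among you: if vision eventually lands in the API too, a multimodal request would presumably bundle text and an image reference into one user message. Here's a minimal sketch of what such a payload could look like; the exact schema and the model name are my assumptions, since OpenAI hasn't published API details for this yet.

```python
import json

def build_vision_message(prompt: str, image_url: str) -> dict:
    """Bundle a text prompt and an image reference into one chat message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical request body — model name and field layout are assumptions.
payload = {
    "model": "gpt-4-vision",
    "messages": [
        build_vision_message(
            "What art style is this painting, and how could I push it further?",
            "https://example.com/my-artwork.jpg",
        )
    ],
}
print(json.dumps(payload, indent=2))
```

The interesting part is the `content` field becoming a list of typed parts instead of a plain string — that's the natural way to mix modalities in a single turn.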

I’m super stoked to see how people will use GPT-4’s vision capabilities 👀
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Generative 3D just got an upgrade. DreamGaussian is a new Gaussian Splatting method that can generate high-quality textured 3D meshes from text or a single image in just 2 minutes. That’s 10 times faster than NeRF-based methods.

DreamGaussian examples animated with Mixamo
RealFill: Reference-Driven Generation for Authentic Image Completion
Imagine you have a lot of similar photos of a memory, but none of them are perfect or show the whole picture. RealFill solves that. Similar to how diffusion inpainting works, RealFill can complete and extend an image based on similar reference images.

RealFill example
TempoTokens: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
While we’ve seen image- and video-to-audio, we haven’t seen much audio-to-video. TempoTokens is changing that: the method generates videos from an input sound. Quite impressive.

Check the video on the GitHub page for sound
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Show-1 is a new text-to-video diffusion model that produces high-quality videos with precise text-video alignment. Compared to pixel-only video diffusion models, Show-1 is much more efficient, requiring only 15 GB of GPU memory during inference instead of 72 GB.

A panda besides the waterfall is holding a sign that says "Show Lab"
AnimeInbet
AnimeInbet is a method that generates inbetween frames for cartoon line drawings. Seeing this, we’ll hopefully be blessed with higher-framerate anime in the near future.

AnimeInbet example
More papers & gems
- Decaf: Monocular Deformation Capture for Face and Hand Interactions
- LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
- VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning
- IDInvert: In-Domain GAN Inversion for Real Image Editing
- CCEdit: Creative and Controllable Video Editing via Diffusion Models
With the recent craze of optical-illusion AI imagery, @Morph_VGart has gathered a bunch of AI artists to create OMENS, a collection of scenes where death is implied to be imminent. I contributed with Sinkhole.
@DeveloperHarris is working on a game where all actions and dialogue for every character are fully simulated with GPT-4 in real-time.
@remi_molettee created the most impressive AnimateDiff animation I’ve seen to date. Sound on!
@KevinAFischer delivered a mind-blowing presentation on his latest work, which focuses on imbuing AI with the indescribable essence of humanity. Just imagine using this kind of tech to alter events while watching a movie. Try the demo here.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
If you want to create some of your own AnimateDiff prompt travel videos, this is the tool for it.
In case animatediff-cli-prompt-travel goes totally over your head, this guide may help you. It’s still very technical though, so beware.
Shopify published a Hugging Face space that lets you replace the background of an image using Stable Diffusion XL. There is also a Google Colab by @camenduru.
Although I’ve mentioned Pika Labs many times already, I noticed that I never actually linked them here. Seeing that they’ve been improving a lot and adding new features like embedding text and images into videos, it was about time!
If you still want to experiment with creating optical illusions, the Illusion Diffusion Hugging Face space by @angrypenguinPNG is a great place to start.

a stranger standing in an endless dimly lit curved tunnel, we are a generation of strangers with strange pictures, in the style of surreal imaginative photography, genre defining photography by James Welling --style raw --c 10
by me
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa