Hello there, my fellow dreamers, and welcome to issue #52 of AI Art Weekly! 👋

Another crazy week in AI lies behind us: ChatGPT goes multi-modal (more below), Tesla showed us a sneak peek of their autonomous humanoid robot Optimus, Meta announced their new AI-powered Ray-Ban smart glasses, and Lex Fridman had a conversation with Mark Zuckerberg in the Metaverse as photorealistic avatars 🤯

Meanwhile, I’m down with the flu, so before we get on with this week’s highlights, I let GitHub Copilot finish this intro for me: “I’m sick and tired of being sick and tired” 😅

Here are the highlights:

  • GPT-4 goes multi-modal
  • DreamGaussian: Efficient 3D asset generation with Generative Gaussian Splatting
  • RealFill
  • TempoTokens turns audio into video
  • Show-1 is a new memory-efficient text-to-video model
  • AnimeInbet generates inbetween frames for cartoon line drawings
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: subliminal
98 submissions by 60 artists
AI Art Weekly Cover Art Challenge subliminal submission by ItsBB7
🏆 1st: @ItsBB7
AI Art Weekly Cover Art Challenge subliminal submission by mind_wank
🥈 2nd: @mind_wank
AI Art Weekly Cover Art Challenge subliminal submission by onchainsherpa
🥉 3rd: @onchainsherpa
AI Art Weekly Cover Art Challenge subliminal submission by ManoelKhan
🥉 3rd: @ManoelKhan

News & Papers

GPT-4 goes multi-modal

Just last week, OpenAI announced that DALL·E 3 was going to build on top of ChatGPT. This week they announced that they’ll finally add vision (and voice) capabilities. This means you’ll be able to give ChatGPT an image and interact with it. Just imagine being able to talk to your art. Well, it’s going to be a reality in the next two weeks. I also wonder how the vision capabilities are going to affect image generation with DALL·E. If they nail the aspect of editing images with natural language, this might be a true game changer.

I’m super stoked to see how people will use GPT-4’s vision capabilities 👀

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Generative 3D just got an upgrade. DreamGaussian is a new Gaussian Splatting method that can generate high-quality textured 3D meshes from text or a single image in just 2 minutes. That’s about 10 times faster than NeRF-based approaches.

DreamGaussian examples animated with Mixamo

RealFill: Reference-Driven Generation for Authentic Image Completion

Imagine you have a lot of similar photos of a memory, but none of them are perfect or show the whole picture. RealFill is able to solve that. Similar to how diffusion inpainting works, RealFill can complete and extend an image based on similar reference images.

RealFill example

TempoTokens: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

While we’ve seen image- and video-to-audio, we haven’t seen much audio-to-video. TempoTokens is changing that. The method is able to generate videos based on an input sound. Quite impressive.

Check the video on the GitHub page for sound

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Show-1 is a new text-to-video diffusion model that is able to produce high-quality videos with precise text-video alignment. Compared to pixel-only video diffusion models, Show-1 is much more efficient and only requires 15 GB of GPU memory during inference instead of 72 GB.

A panda besides the waterfall is holding a sign that says "Show Lab"


AnimeInbet: Deep Geometrized Cartoon Line Inbetweening

AnimeInbet is a method that is able to generate inbetween frames for cartoon line drawings. Seeing this, we’ll hopefully be blessed with higher-framerate anime in the near future.

AnimeInbet example

More papers & gems

  • Decaf: Monocular Deformation Capture for Face and Hand Interactions
  • LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
  • VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning
  • IDInvert: In-Domain GAN Inversion for Real Image Editing
  • CCEdit: Creative and Controllable Video Editing via Diffusion Models

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

a stranger standing in an endless dimly lit curved tunnel, we are a generation of strangers with strange pictures, in the style of surreal imaginative photography, genre defining photography by James Welling --style raw --c 10 by me

