AI Art Weekly #83

Hello there, my fellow dreamers, and welcome to issue #83 of AI Art Weekly! 👋

While I was strolling through the Turia River park this week, OpenAI and Google were positioning ChatGPT and Gemini to become our personal AI assistants, tutors, and even companions for everyday life. Looking at the insane capabilities these models are gaining, it’s hard not to feel dread about the future of work.

But it’s also hard not to feel excited about the possibilities they bring. Surrounded by the beauty of the park, I imagined a utopian world where bullshit jobs are a thing of the past. In that moment, the dream felt real: I was creating art, doing what I love, and feeling truly happy. Although it still seems distant, this future may be closer than we think.

In this issue:

  • GPT-4o is more than a virtual voice assistant
  • 3D: CAT3D, Dual3D, Coin3D, Toon3D, LayGA
  • Image: T2V-NPR, Analogist, LogoMotion, BlobGEN, SOEDiff
  • and more!

Cover Challenge 🎨

Theme: red
192 submissions by 126 artists
🏆 1st: @daidatep
🥈 2nd: @coralinesaidso
🥈 2nd: @EternalSunrise7
🧡 4th: @KulovaMax

News & Papers


Besides becoming a virtual voice assistant replica of Samantha from the movie Her, GPT-4o also has some other remarkable capabilities that almost no one talks about. Let’s check them out!

Visual narratives

GPT-4o can generate visually coherent stories either from only text prompts or from a combination of text and images.

“Robot writer’s block” generated from text prompts only

Image creation and editing

It can also generate and edit design work, such as movie posters, from a combination of text and images.

Movie poster created from text and two head shots

Poetic typography

Its ability to render text within images is mind-blowing.

A neat handwritten illustrated poem with text that is big and legible. The handwriting writing is sparsely but elegantly decorated by small colorful surrealist doodles. The text is large, legible and clear.

Image stylization

And it can stylize existing images.

Stylized caricature portrait

3D reconstruction

It can generate multi-view images and turn them into 3D objects.

3D reconstruction from 6 generated images

Image inpainting

It can also inpaint logos onto images.

Inpainting the OpenAI logo onto a coaster

And this is just a small subset of the things it can do. You can check out more of these capabilities in the announcement blog post.


CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Besides Gemini, Google presented CAT3D this week. It can turn any number of images into a 3D scene. The resulting scenes can be rendered interactively, and the total processing time, including both view generation and 3D reconstruction, can be as little as one minute.

CAT3D examples

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Dual3D is yet another text-to-3D method that can generate high-quality 3D assets from text prompts in only 1 minute.

A compositional scene rendered with Blender with all visible 3D assets generated by Dual3D

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Coin3D can generate and edit 3D assets from a basic input shape. Similar to ControlNet, this enables precise part editing and responsive 3D object previewing within a few seconds.

Coin3D example

Toon3D: Seeing Cartoons from a New Perspective

Toon3D can generate 3D scenes from two or more cartoon drawings. It’s far from perfect, but still pretty cool!

Toon3D example

LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer

LayGA is a new Gaussian avatar representation that separates body and clothing into distinct layers from multiview videos and can transfer garments to different bodies.

LayGA example


T2V-NPR: Text-to-Vector Generation with Neural Path Representation

T2V-NPR can generate vector graphics from text or images. The method can also optimize the generated SVGs with adjustable levels of detail and different styles, as well as animate them based on a text prompt describing the desired motion.

T2V-NPR examples

Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

Instead of training a model for each specific task like deblurring or colorization, Analogist uses a flexible in-context learning approach with pre-trained diffusion models. This method only needs a few example pairs to handle various visual tasks, including denoising, low-light enhancement, image translation, style transfer, motion transfer, pose transfer, inpainting, and more.

Analogist examples

LogoMotion: Visually-Grounded Code Generation for Content Aware Animation

LogoMotion can turn logos from layered PDF files into content-aware HTML canvas animations. Very cool!

LogoMotion examples

BlobGEN: Compositional Text-to-Image Generation with Dense Blob Representations

BlobGEN is a new text-to-image model by NVIDIA that generates images based on blob representations. These blob representations can be automatically extracted from a scene and then used to guide the image generation process.

BlobGEN examples

SOEDiff: Efficient Distillation for Small Object Editing

Ever tried to inpaint smaller objects and details into an image? It can be a bit hit or miss. SOEDiff has been specifically trained to handle these cases and does a pretty good job at it.

SOEDiff comparisons

Also interesting

  • Text Scene Motion: Generating Human Motion in 3D Scenes from Text Descriptions
  • FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

“We Hope” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa