AI Art Weekly #54

Hello there, my fellow dreamers, and welcome to issue #54 of AI Art Weekly! 👋

Before we get into another week of cool AI advancements, I wanted to quickly thank Looking Glass for sponsoring another issue. The support helps tremendously 🙏

If you don’t own one of their hologram displays yet, check them out. They’re worth it and gave me the final push to dive into the world of 3D creation 🔥

With that said, let’s jump in! The highlights of the week are:

  • Adobe released new Firefly image and vector models
  • ScaleCrafter can generate 4k resolution images and 2k resolution videos
  • Uni-paint can inpaint subjects from images
  • HyperHuman generates hyper-realistic human images
  • 4D Gaussian Splatting turns videos into dynamic scenes
  • OmniControl lets you control individual joints for human motion generation
  • FLATTEN is yet another video editing method
  • Interview with philosopher and AI artist Pixlosopher
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: wonderland
101 submissions by 57 artists
AI Art Weekly Cover Art Challenge wonderland submission by DonaTimani
🏆 1st: @DonaTimani
AI Art Weekly Cover Art Challenge wonderland submission by EternalSunrise7
🥈 2nd: @EternalSunrise7
AI Art Weekly Cover Art Challenge wonderland submission by MezBreezeDesign
🥉 3rd: @MezBreezeDesign
AI Art Weekly Cover Art Challenge wonderland submission by OakOrobic
🥉 3rd: @OakOrobic

News & Papers

Firefly Image Model 2, Firefly Vector

Adobe held its annual MAX conference this week, where it announced the next generation of its Firefly AI models and a range of new AI-powered capabilities across its product line-up. Here’s a quick summary of all the new features:

  • Adobe Firefly Image 2 Model: Enhanced image generation with improved control and quality. I tested it, and while it’s unfortunately not as good at interpreting text as DALL·E 3, the quality has jumped quite a bit compared to the first version. You can try out the image model here.
  • Generative Match: The new image model is able to generate images with a consistent style by referencing a specific image, aiding in content scaling.
  • Text to Vector Graphic: The Firefly Vector model is able to generate editable vector graphics from text prompts in Adobe Illustrator.
  • Retype and Mockup: AI tools in Illustrator for font identification and real-life mockup previews.
  • Text-Based Editing: Filler word detection and removal in Adobe Premiere Pro.
  • AI-powered Roto Brush: Easier object isolation in Adobe After Effects.
  • AI-powered Lens Blur: Adds aesthetic blur effects in Adobe Lightroom.
  • Generative AI tools in Adobe Stock: Text to Image and Expand Image features for transforming text prompts and extending images.
  • Adobe Express enhancements: Generative Fill and Text to Template features powered by the new Firefly Design Model.

cinematic shot of a t-rex in the jungle made out of wool generated with the new Firefly Image 2 Model

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

High-resolution diffusion models are coming! ScaleCrafter is a new approach that can generate images at a resolution of 4096×4096 with pre-trained diffusion models, 16 times higher than their original training resolution. The method can also generate images with arbitrary aspect ratios, and it addresses the persistent problems of object repetition and unreasonable object structures that occur when generating outside of a model’s training resolution. The best part? It doesn’t require any training or optimization.

Generated video from the prompt A beautiful girl on a boat with a resolution of 2048 x 1152 🔥
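The “tuning-free” part boils down to convolution arithmetic: as I understand the paper, ScaleCrafter re-dilates the U-Net’s convolutions at inference time, which enlarges their receptive field without changing feature-map sizes, so no retraining is needed. Here’s a minimal sketch of that size arithmetic (toy numbers, not ScaleCrafter’s actual code):

```python
def conv_out_size(in_size, kernel=3, stride=1, dilation=1, padding=None):
    """Standard convolution output-size formula. With stride 1 and
    padding = dilation * (kernel - 1) // 2 ("same" padding for odd
    kernels), the spatial size is preserved for any dilation."""
    if padding is None:
        padding = dilation * (kernel - 1) // 2
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def receptive_field(kernel=3, dilation=1):
    """Effective footprint of a dilated kernel."""
    return dilation * (kernel - 1) + 1

# A regular 3x3 conv vs. a re-dilated one at dilation 4:
print(conv_out_size(512, dilation=1))   # 512 — size preserved
print(conv_out_size(512, dilation=4))   # 512 — still preserved
print(receptive_field(dilation=1))      # 3
print(receptive_field(dilation=4))      # 9 — 3x wider view per conv
```

Because only the dilation and padding change, the same pretrained weights can be reused as-is, which is what makes the approach training-free.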

Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model

A way to accurately inpaint subjects into other images is something I’ve been looking for for a while now, and Uni-paint seems to solve it. It’s a new method for multimodal inpainting that offers various modes of guidance (unconditional, text-driven, stroke-driven and exemplar-driven), as well as combinations of these. Like ScaleCrafter, Uni-paint is based on pretrained Stable Diffusion models and doesn’t require task-specific training on dedicated datasets, which lets it work well with custom images using just a few examples.

Uni-paint examples
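Whatever the guidance mode, diffusion-based inpainting ultimately resolves to a masked composite: generated content fills the masked hole while everything else keeps the original pixels. This is not Uni-paint’s actual code, just a toy sketch of that blending step with made-up pixel values:

```python
def masked_composite(original, generated, mask):
    """Blend per pixel: mask=1 keeps generated content (the hole),
    mask=0 keeps the original image. Values are floats in [0, 1]."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]

# Toy 2x2 "images": inpaint only the top-left pixel.
original  = [[0.2, 0.8], [0.5, 0.5]]
generated = [[0.9, 0.1], [0.1, 0.1]]
mask      = [[1.0, 0.0], [0.0, 0.0]]
print(masked_composite(original, generated, mask))
# [[0.9, 0.8], [0.5, 0.5]]
```

In latent diffusion inpainting this blend happens in latent space at every denoising step, which is why the guidance (text, stroke, exemplar) only needs to steer what lands inside the mask.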

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

HyperHuman is a new text-to-image model that focuses on generating hyper-realistic human images from text prompts and a pose image. The results are pretty impressive and the model is able to generate images in different styles and up to a resolution of 1024x1024.

HyperHuman example

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

After 3D Gaussian Splatting comes 4D Gaussian Splatting. 4D-GS is a method that can turn videos into dynamic scenes within 20 minutes of training and reaches rendering speeds of over 50 fps on an RTX 3080. That means real-time rendering of dynamic scenes at high image resolutions while maintaining high rendering quality.

4D-GS example

OmniControl: Control Any Joint at Any Time for Human Motion Generation

Haven’t seen much from the human motion department lately, but there has been progress! OmniControl can generate realistic human motions from a text prompt and flexible spatial control signals. The interesting part is that it can control any joint at any time with a single model. This means you can, for example, generate a motion where a person plays the violin with their left hand in the air and their right hand holding the bow.

a person plays a violin with their left hand in the air and their right hand holding the bow
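The spatial control signals are essentially sparse per-joint position targets over time. This is a hypothetical representation of such constraints (not OmniControl’s actual API), plus a small checker for whether a generated trajectory honors them:

```python
def satisfies_constraints(trajectory, constraints, tol=0.05):
    """trajectory: {frame: {joint: (x, y, z)}} — a generated motion.
    constraints: list of (frame, joint, (x, y, z)) sparse targets.
    True if every constrained joint is within `tol` meters of its target."""
    for frame, joint, target in constraints:
        pos = trajectory[frame][joint]
        dist = sum((p - t) ** 2 for p, t in zip(pos, target)) ** 0.5
        if dist > tol:
            return False
    return True

# Toy example: pin the left hand up in the air at frame 10.
constraints = [(10, "left_hand", (0.3, 1.8, 0.0))]
trajectory = {10: {"left_hand": (0.31, 1.79, 0.0), "right_hand": (0.4, 1.2, 0.1)}}
print(satisfies_constraints(trajectory, constraints))  # True
```

The point of OmniControl is that one model handles any such constraint set, instead of training a separate model per controlled joint.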

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

One of the most important aspects of generative AI is control. MotionDirector is a method that trains text-to-video diffusion models to generate videos with desired motions taken from a reference video. And from the looks of it, this works extremely well!

MotionDirector examples

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

FLATTEN is yet another method that aims to improve the visual consistency of text-to-video editing. It’s training-free and can be seamlessly integrated into any diffusion-based text-to-video editing method. The results look quite nice.

FLATTEN example

More papers & gems

  • Drivable Avatar Clothing: Faithful Full-Body Telepresence with Dynamic Clothing Driven by Sparse RGB-D Input
  • Mini-DALL•E 3: Interactive Text to Image by Prompting Large Language Models
  • AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation
  • Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes
  • SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing


Interview with Pixlosopher

This week I had the pleasure of interviewing Pixlosopher, a philosopher turned artist who blends his philosophical leanings with AI’s boundless potential, creating beautiful pixelated and non-pixelated artworks.

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

a series of chinese dolls are standing in a line, in the style of vray tracing, bubble goth, brushwork exploration, kanō school, frontal perspective, zigzags, detailed crowd scenes --ar 3:2 --c 15 --niji 5 --style expressive

The final render from my Blender explorations.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa