AI Art Weekly #53
Hello there, my fellow dreamers, and welcome to issue #53 of AI Art Weekly! 👋
I have another chock-full issue for you this week and an exciting surprise for this week’s cover challenge (more below). Let’s dive right into this week’s highlights:
- DALL·E 3 and GPT-4V available for free on Bing
- DREAM preprocesses your brainwaves into depth maps
- Ground-A-Video enables zero-shot video editing
- LLM-grounded Video Diffusion Models
- HumanNorm generates realistic 3D humans
- PIXART-α: Training a foundation Text-to-Image model for a fraction of the cost
- DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animations
- Image restoration with DA-CLIP
- Text-To-GIF model Hotshot XL
- Interview with “Strange History” curator Historic Crypto
- and more tutorials, tools and gems!
Putting these weekly issues together takes me between 8 and 12 hours every Friday. With your contribution, you’ll be backing the evolution and expansion of AI Art Weekly for the price of a coffee each month 🙏
Cover Challenge 🎨
Looking Glass is sponsoring this week’s challenge, and the winner will receive a Looking Glass Portrait on top of the usual $50. I bought one for myself last year and they’re super cool. If you want to get one regardless of the challenge, use my affiliate link to get 10% off.
For next week’s cover I’m looking for “wonderland” inspired images. The rulebook can be found here and images can be submitted here. I’m looking forward to your submissions 🙏
News & Papers
DALL·E 3 and GPT-4V available for free on Bing
Microsoft quietly rolled out DALL·E 3 and GPT-4V into Bing last week, and both are available for free (for now). DALL·E 3 can also be used separately through Bing Creator, so naturally I had to give it a try, and I’m positively surprised by its ability to understand natural language and generate readable text. While its image quality isn’t on par with Midjourney, its output has a more unrefined quality which I appreciate and, these days, prefer over Midjourney’s extremely clean results.

An image I made this week with DALL·E 3 through Bing Creator
DREAM: Visual Decoding from REversing HumAn Visual SysteM
We’re getting closer to visualizing our dreams. DREAM is an fMRI-to-image method for reconstructing viewed images from brain activity. It’s basically a preprocessor that converts your brainwaves into semantics, color, and depth maps for ControlNet.

DREAM examples
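For the curious: once a decoded depth map exists, the ControlNet half of that pipeline is only a few lines of diffusers. A minimal sketch — this isn’t DREAM’s own code, and the depth map filename and prompt are placeholders:

```python
# Sketch of the ControlNet side of the pipeline: once DREAM (or any other
# decoder) has produced a depth map, conditioning Stable Diffusion on it
# looks roughly like this. The depth map filename is a placeholder.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

depth_map = load_image("decoded_depth_map.png")  # hypothetical DREAM output
image = pipe("a photo of what the subject saw", image=depth_map).images[0]
image.save("reconstruction.png")
```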
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
One key feature that I feel is missing from open-source AI video is a good video-to-video option that enables video editing similar to Gen-1. Ground-A-Video is the latest addition to that family. The method lets you edit multiple attributes of a video via Stable Diffusion and spatially-continuous & -discrete conditions, without any training. Unfortunately, as with most methods in this category, there is no actual source code to use it 😒

Ground-A-Video examples
LLM-grounded Video Diffusion Models
LLM-grounded Video Diffusion Models (LVD) is a new method that improves text-to-video generation by using a large language model to generate dynamic scene layouts from text and then guiding video diffusion models with these layouts, achieving realistic video generation that aligns with complex input prompts. Unfortunately, there is no actual video demo yet, so we’ll have to wait and see what the final results look like.

LLM-grounded Video Diffusion Model frame comparisons
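The core idea is simple enough to sketch, though. Here’s an illustration of what a dynamic scene layout could look like — the field names and format are my own, not LVD’s actual schema:

```python
# Illustrative sketch of LVD's two-stage idea: an LLM turns the prompt into
# a dynamic scene layout (per-frame bounding boxes per object), which then
# guides the video diffusion model. Names/format are mine, not the paper's.
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # normalized [0, 1] top-left corner
    y: float
    w: float  # normalized width and height
    h: float

# "a bird flying from left to right": the LLM lays out the bird's box
# moving across 16 frames; each frame's layout then conditions denoising
# so the object stays inside its box at that frame.
num_frames = 16
layouts = [{"bird": Box(x=t / num_frames, y=0.4, w=0.2, h=0.15)}
           for t in range(num_frames)]
```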
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
HumanNorm is a novel approach for high-quality and realistic 3D human generation that leverages normal maps to enhance the 2D perception of 3D geometry. The results are quite impressive and comparable to PS3 games.

HumanNorm examples
PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
PIXART-α is a new text-to-image model that can generate images at resolutions up to 1024px, and it required only roughly 10% of Stable Diffusion 1.5’s training time (~675 vs ~6’250 A100 GPU days). That makes it much cheaper as well ($26k compared to $320k). The model also offers a high level of control and can be combined with DreamBooth to generate images of concepts that weren’t included in the original training.

A small cactus with a happy face in the Sahara desert.
by PIXART-α
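For those who want to try it locally, here’s a minimal inference sketch, assuming the diffusers integration — the pipeline class and checkpoint name are based on the HuggingFace release, so check the official repo for the exact entry points:

```python
# Minimal PIXART-α inference sketch via diffusers. Pipeline class and
# checkpoint name are assumptions based on the HuggingFace release.
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

image = pipe("A small cactus with a happy face in the Sahara desert.").images[0]
image.save("cactus.png")
```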
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
DiffPoseTalk is a new method for generating stylistic 3D facial animations and head poses driven by speech. The method is based on diffusion models and a style encoder that extracts style embeddings from short reference videos. The results look pretty good and the method outperforms existing ones like SadTalker. No code yet unfortunately.

DiffPoseTalk example
DA-CLIP: Controlling Vision-Language Models for Universal Image Restoration
DA-CLIP is a new method for restoring images. Apart from inpainting, it can restore images by dehazing, deblurring, denoising, deraining and desnowing them, as well as removing unwanted shadows and raindrops or enhancing lighting in low-light images.

DA-CLIP examples
More papers & gems
- TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields
- HGHOI: Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models
- LEAP: Liberate Sparse-view 3D Modeling from Camera Poses
- Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models
- SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D
- GETAvatar: Generative Textured Meshes for Animatable Human Avatars
“Strange History: Conquest” is a collection of AI videos depicting artistic interpretations of epic battles that may or may not have shaped the course of history.
@NathanBoey is working on a new series called ‘About Time’. If the quality is comparable to this first piece, then I can’t wait to see more!
Another mesmerizing AI video by @BengtTibert made with Midjourney and Pika Labs.
Interview
In this latest AI Art Weekly interview I’m talking to @Historic_Crypto, the founder behind “Strange History”, a phenomenon that started as a curated collection and turned into a collective that couldn’t be less bothered about the current market and is constantly churning out new creative ways to re-imagine the past with the help of AI. Enjoy!
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. It’s pretty fun to play around with, and you can give it a try for free at Hotshot AI. Code and model are available on GitHub and HuggingFace.
Moonvalley is yet another text-to-video model, creating cinematic and animated videos from simple text prompts. It’s available through Moonvalley’s Discord.
This video is a quick overview of adding IP-Adapters and LoRAs to your AnimateDiff CLI workflow.
@cfryant shared a step-by-step tutorial on how to create animations by generating sprite sheets with DALL·E 3.
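If you’d rather jump straight to code: once DALL·E 3 has generated the sheet, slicing it into frames and assembling a GIF takes just a few lines of Pillow. A minimal sketch, assuming a 4×4 grid of equally sized frames (grid size and filenames are placeholders):

```python
# Slice a sprite sheet into frames and assemble them into an animated GIF.
# Assumes a 4x4 grid of equally sized frames; adjust ROWS/COLS to your sheet.
from PIL import Image

ROWS, COLS = 4, 4
sheet = Image.open("sprite_sheet.png")  # placeholder filename
fw, fh = sheet.width // COLS, sheet.height // ROWS

frames = [sheet.crop((c * fw, r * fh, (c + 1) * fw, (r + 1) * fh))
          for r in range(ROWS) for c in range(COLS)]

frames[0].save("animation.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)  # ~10 fps, looping forever
```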

Another piece I created this week using DALL·E 3 through Bing Creator.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa