AI Art Weekly #42

Hello there, my fellow dreamers, and welcome to issue #42 of AI Art Weekly! 👋

Good news. We’ve reached 25% coverage for the Twitter API fees, and I’m confident that we’ll be able to cover the rest of the costs, maybe not right away, but hopefully soon. Thank you so much to everyone who has contributed so far. It takes some pressure off the financials, and I’m deeply grateful for your support 🙏

That being said, we’ve got another interesting week behind us. I personally spent a lot of time working on my Claire Silver “motion” contest submission, and there have been some interesting new papers & resources. Let’s jump in:

  • AnimateDiff: Text-to-Video with any Stable Diffusion model
  • CSD-Edit: Multi-modality editing for 4K images and video
  • HyperDreamBooth: Personalize a text-to-image diffusion model 25x faster than DreamBooth
  • Animate-A-Story: Storytelling with text-to-video generation
  • PGR: Facial reenactment through a personalized generator
  • VampNet: Audio to loops and variations
  • Interview with TRUE CAMELLIA
  • Stable Doodle released
  • and more

Cover Challenge 🎨

Theme: quantum
82 submissions by 46 artists
🏆 1st: @annadart_artist
🥈 2nd: @JenPanepinto
🥉 3rd: @DonaTimani
🧡 4th: @wander_hooligan

News & Papers

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

AnimateDiff is a new framework that brings video generation to the Stable Diffusion pipeline. This means you can generate videos with any existing Stable Diffusion model without having to fine-tune or train anything. Pretty amazing. @DigThatData put together a Google Colab notebook in case you want to give it a try.

AnimateDiff examples from different Stable Diffusion models

CSD-Edit: Collaborative Score Distillation for Consistent Visual Synthesis

CSD-Edit is a novel multi-modality editing approach. Compared to other methods, it works well on images larger than the traditional 512x512 limit and can edit 4K or large panorama images. It also improves temporal consistency across video frames, as well as view consistency when editing or generating 3D scenes.

CSD-Edit video editing comparison with other methods

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

The team behind DreamBooth is back with a new paper that introduces HyperDreamBooth. The new method tackles DreamBooth’s size and speed issues while preserving model integrity, editability, and subject fidelity. It can personalize a text-to-image diffusion model 25x faster than DreamBooth, using only a single input image.

HyperDreamBooth example

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

Animate-A-Story is a video storytelling approach that can synthesize high-quality, structured, and character-driven videos. Composition and scene transitions are still rough, but it’s interesting to see what an early text-to-story pipeline looks like.

Animate-A-Story example

PGR: Facial Reenactment Through a Personalized Generator

After seeing PGR this week, I think it’s safe to say that we can’t trust anything we see online anymore. From a creative perspective, though, tech like this could eventually be used to replace actors in movie projects. Basically, you and your friends play all the roles yourselves, and something like PGR reenacts them as whichever actor you want for each role.

PGR example

VampNet: Music Generation via Masked Acoustic Token Modeling

VampNet is a music generation model that can create loops and variations from short musical excerpts. It can also be fine-tuned with LoRA on custom audio datasets, such as playlists or specific albums.

More gems & papers

  • 3D VADER: AutoDecoding Latent 3D Diffusion Models
  • My3DGen: Building Lightweight Personalized 3D Generative Model
  • Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback
  • LSV: Efficient 3D articulated human generation with layered surface volumes
  • DATENCODER: Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
  • FreeDrag: Point Tracking is Not You Need for Interactive Point-based Image Editing

This week, @annadart_artist and I talked to AI surrealism artist @truecamellia.

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

Whispers of memories breathe through the minimal colorful modern landscapes of 'Silent Echoes,' a twilight analogue photograph, where stark contrasts of black and vibrant whites converge, evoking haunting expressionist undertones inspired by René Magritte and Daido Moriyama, amidst contemporary political tension in a totalitarian state --s 250 --v 5.2 --style raw --ar 3:2. Prompt by @moelucio and made in our Discord.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa