
AI Art Weekly #28

Hello there my fellow dreamers and welcome to issue #28 of AI Art Weekly! 👋

I’ve started developing a game using GPT-4 and GitHub Copilot this week and it’s been a lot of fun and admittedly, excuse my French, a bit of a mind fuck. So strange to type in code and see an autocomplete suggestion that matches the intent you had. Take a look at “Room of Wonders” below to see the current stage of the project. Anyhow, here are this week’s highlights:

  • VideoCrafter: A new open-source text-to-video model
  • Midjourney released Niji v5 and a new /describe feature
  • Interview with AI artist and deforum developer Huemin
  • Breadboard: A client-side AI image file browser
  • StyleGAN-T training code released

Cover Challenge 🎨

Theme: pope art
69 submissions by 43 artists
🏆 1st: @EternalSunrise7
🥈 2nd: @VikitoruFelipe
🥉 3rd: @CrazyPepeEth
🧡 4th: @CosmicCamera

Reflection: News & Gems

VideoCrafter: A Toolkit for Text-to-Video Generation and Editing

A new 1.2B-parameter text-to-video model called VideoCrafter was released this week. The new model comes with three different features and apparently achieves higher quality compared to the Modelscope model. Aside from the base text-to-video pipeline, the toolkit also supports LoRA fine-tuning for training the model on new videos, as well as a control feature similar to ControlNet. This feature allows you, for instance, to use depth maps from other videos to guide the video generation process. Check out the Google Colab or the HuggingFace space. The Automatic1111 text2video extension also already supports the model – although it has been reported to be a bit buggy at the moment.

VideoCrafter features

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

Similar to VideoCrafter’s control feature, Follow Your Pose introduces pose-guidance for what appears to be the Tune-A-Video model, allowing the generation of character-focused videos from text. Text-to-Video is far from perfect at this stage, but ControlNet was a huge leap forward for introducing more controllability to text-to-image generation – so I’m excited to see we’re able to do something similar with video. For now, one must animate the posed rigs themselves. However, I can see how this would be particularly useful when combined with a human motion diffusion model like ReMoDiffuse, which generates the pose-rigs.

“The Stormtroopers, on the beach” – Follow Your Pose example

Kandinsky 2.1

There is a new text-to-image model in town called Kandinsky. Kandinsky 2.1 adopts the most effective strategies from DALL-E 2 and Latent Diffusion, while also incorporating novel concepts. Its fuse method lets you combine images, similar to Midjourney’s /blend command or StableDiffusion’s ImageMixer. Check out the HuggingFace demo or the text2img or mixing Google Colab notebooks if you want to try it out.

Kandinsky 2.1 fuse example

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

We haven’t seen a lot of audio/music models since that big wave back in February. So AUDIT is a welcome change of pace. The AUDIT model offers instruction-guided editing: you only need to provide edit instructions rather than a full description of the target audio, which makes it more practical for real-world scenarios. The results are promising in various audio editing tasks, such as adding effects, replacing instruments, and repairing damaged audio. Check out the project page for examples.

AUDIT consists of a VAE, a T5 text encoder, and a diffusion network, and accepts the mel-spectrogram of the input audio and the edit instructions as conditional inputs and generates the edited audio as output.

GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models

I’m always intrigued by models that are generating things “out of thin air”. Text-to-image is one thing, but providing an image of, let’s say, a room, and then imagining what the other side could look like when turning the camera by 180° just gives me that certain kind of exploration feeling. GeNVS does that. It takes a single input image and can generate novel views from it, either by rotating around an object or by “walking down” the hallway of a room. Eerie.

GeNVS example

Midjourney Updates

Midjourney announced a few updates this week:

  • Niji v5 is now available. You can use it by prompting --niji 5 when generating an image. There will be an additional 3 styles available until the end of April.
  • The new /describe command lets you upload an image and get a description of it. @DrJimFan had an interesting take on how MJ could use this for reinforcement learning from human feedback (RLHF) to further improve their models.
  • A new --repeat feature and permutations syntax. For example /imagine cats --repeat 5 will create five 2x2 grids of cats. Or /imagine {cyberpunk, vaporwave} will queue two jobs (/imagine cyberpunk and /imagine vaporwave). @tristwolff shared some interesting use-cases on Twitter.
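
To get a feel for how the permutations syntax and --repeat multiply your queue, here is a minimal Python sketch. It is not Midjourney's implementation: the function name expand_prompt is hypothetical, --repeat is modelled as a plain repeat argument instead of being parsed out of the prompt, and I'm assuming that several {...} groups in one prompt combine as every possible pairing.

```python
import itertools
import re

# Matches one non-nested {option, option, ...} group.
BRACES = re.compile(r"\{([^{}]*)\}")


def expand_prompt(prompt: str, repeat: int = 1) -> list[str]:
    """Return one prompt per combination of brace options, each queued `repeat` times."""
    # With a capturing group, split() alternates literal text and group contents.
    parts = BRACES.split(prompt)
    literals = parts[0::2]
    option_lists = [[opt.strip() for opt in group.split(",")] for group in parts[1::2]]

    jobs = []
    # product() of zero lists yields a single empty combination,
    # so a prompt without braces passes through unchanged.
    for combo in itertools.product(*option_lists):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        jobs.extend(["".join(pieces)] * repeat)
    return jobs


if __name__ == "__main__":
    for job in expand_prompt("a {red, blue} {cat, dog}"):
        print("/imagine", job)
```

The takeaway is that job counts grow multiplicatively: two groups with two options each already queue four jobs, and a repeat of 5 on top of that would make it twenty.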

a group of shocked and surprised anime women with big open mouths and big eyes are reading aiartweekly --ar 3:2 --niji 5

More papers and gems

  • ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model.
  • DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model.
  • DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance.
  • DC2: Dual-Camera Defocus Control by Learning to Refocus.

Imagination: Interview & Inspiration

This week we talk to huemin, a prominent coder and digital AI artist who, among others, initiated the popular deforum notebook, which can be used to create animated videos. I’ve extensively used the notebook myself last year to create music videos, so I’m happy to have huemin on the newsletter. Let’s jump in!

[AI Art Weekly] Huemin, what’s your background and how did you get into AI art?

I have a background in applied physics with two college degrees and work as a consultant. I’ve always been interested in art and have explored various forms, but I’m particularly passionate about 2D still images and technology. In 2021, I discovered GANs while researching generative art, and I was captivated. I invested in a 3070 graphics card to train my own models, and later stumbled upon Google Colab and the VQGAN animation notebook. Since then, I’ve devoted my free time to learning about AI art generators, creating free tools, and minting my artwork.

“paper-scape 7” by huemin

[AI Art Weekly] Do you have a specific project you’re currently working on? What is it?

I am presently working on a live mint and voice2image interactive installation for Bright Moments Tokyo, taking place from May 5th to May 10th. I’m one of 11 AI artists participating in this gallery event in Tokyo, Japan. My project focuses on generative art within latent space.

[AI Art Weekly] Can you tell me a bit more about deforum?

Deforum was initiated by myself and other prominent AI art community members before the release of Stable Diffusion (check out the interview with ScottieFox in issue #4). Our objective was to collaboratively develop a feature-rich Google Colab notebook that would allow artists to create impressive stills and animations using Stable Diffusion.

“simulacrum 004” by huemin

[AI Art Weekly] Any advice for people who want to get started building with AI?

There are two great resources for people working with AI in art and development: community groups like the deforum Discord and large language models. Community groups help you learn, share knowledge, and gain ideas about how these systems function. Large language models can help you understand concepts and improve your programming skills.

[AI Art Weekly] What does your workflow look like?

My creative process starts with moments of insight and inspiration, and I’m always on the lookout for new ideas. When I have a feasible concept, I begin prototyping and generating many images. My aim is to create a unique pipeline for image generation, integrating new tools into deforum and adopting techniques from other repositories. For my Braindrops collection, Materia Mania, I generated over 150,000 images using Stable Diffusion and an aesthetics classifier, while incorporating generative art algorithms and fine-tuning models.

“Materia Mania 0090” by huemin

[AI Art Weekly] What is your favourite prompt when creating art?

Prompts are not significant or relevant to me. I have my own artistic preferences and use all available resources to achieve my desired results. But I think it’s an interesting exercise to have a few test prompts that you throw into every model to check its outputs. My default prompt for testing a model is untitled abstract futurism.

[AI Art Weekly] How do you imagine AI (art) will be impacting society in the near future?

As AI improves, we’ll be able to generate exactly what we want in the highest quality possible. When the skill barrier for art generation is eliminated and anyone can create any image, what is left and how do we as artists express ourselves?

[AI Art Weekly] Who is your favourite artist?

I believe @RiversHaveWings is an outstanding artist who deserves more recognition. RiversHaveWings has undoubtedly had a significant impact on the direction of AI art and empowered countless individuals.

“Stone Geese” by RiversHaveWings

[AI Art Weekly] Anything else you would like to share?

I’m grateful for the chance to share this information. I strive to stay informed about the work of others in the open-source community. Many individuals dedicate their time to creating incredible, free tools that generate immense value, and they deserve recognition for their efforts.


Creation: Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

A blend of different images from the prompt untitled abstract futurism created with Midjourney V1, V2, V3, V4 and V5 by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa