AI Art Weekly #100

Hello there, my fellow dreamers, and welcome to issue #100 of AI Art Weekly! 🎉🎉🎉

Hard to believe that we made it to 100 issues 🤯

What began as short updates about Midjourney and Stable Diffusion has grown into a major project. I now spend 8-12 hours every week reviewing papers (273 this week alone) and selecting the most interesting ones to share with you.

But with the sheer amount of research being published every week, it’s hard to keep track of all these interesting papers. So today, I’m launching AI Art Weekly Premium.

We’ve seen more and more papers announced with delayed code releases. Premium will make it possible to bookmark papers directly from the newsletter and get an alert once the code is released, without having to check each paper individually.

Thank you, everyone, for your continued support, and I hope you enjoy this week’s issue!


Cover Challenge 🎨

AI Art Weekly Cover 100 with 126 reimagined AI artist PFPs. Full grid here.


News & Papers

Highlights

Adobe MAX 2024

Adobe unveiled significant AI-powered updates to its software suite this week at Adobe MAX 2024. Some of the coolest:

  • A new Adobe Firefly video model which can generate videos and visual effects from text and images
  • Photoshop: Enhanced Distraction Removal and new Generate Similar feature (basically Midjourney variations)
  • Premiere Pro: Generative Extend for seamless frame addition using the new Adobe Firefly Video Model
  • Illustrator: Ability to rotate vectors in 3D space
  • Project Neo: Web-based 3D editor whose creations can be converted into 2D vectors or used for image-to-image transformations

Adobe Firefly video example

3D

GS^3: Efficient Relighting with Triple Gaussian Splatting

GS^3 can relight scenes in real-time using a triple Gaussian splatting process. It achieves high-quality lighting and view synthesis from multiple images, running at 90 fps on a single GPU.

GS^3 example

SceneCraft: Layout-Guided 3D Scene Generation

SceneCraft can generate detailed indoor 3D scenes from user layouts and text descriptions. It converts 3D layouts into 2D maps, producing complex spaces with diverse textures and realistic visuals.

SceneCraft example

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Long-LRM can reconstruct large 3D scenes from up to 32 input images at 960x540 resolution in just 1.3 seconds on a single A100 80G GPU.

Long-LRM example

ControlMM: Controllable Masked Motion Generation

ControlMM can generate high-quality motion in real-time by using spatial control signals in a motion model. It is 20 times faster than comparable methods and supports body part control, timeline control, and obstacle avoidance.

ControlMM examples

InterMask: 3D Human Interaction Generation via Collaborative Masked Modelling

InterMask can generate high-quality 3D human interactions from text descriptions. It captures the complex movements between two people and also supports reaction generation (producing one person’s motion in response to the other’s) without any changes to the model.

InterMask example

Image

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

HART is an autoregressive transformer model that can generate high-quality 1024x1024 images from text 3x faster than SD3-Medium.

HART example

EfficientViT: Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

EfficientViT can speed up high-resolution diffusion models by compressing data at a spatial ratio of up to 128 while keeping good image quality. Compared to other autoencoders, it achieves a 19.1x inference speedup and a 17.9x training speedup on ImageNet 512x512.
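
To make that 128x figure concrete, here is a minimal sketch of what such a spatial compression ratio means for latent sizes. The layer count follows from 2^7 = 128, but the channel widths are my own assumptions, not the paper’s actual architecture:

```python
# Illustrative only: seven stride-2 conv stages give 2**7 = 128x spatial
# downsampling, turning a 1024x1024 RGB image into an 8x8 latent grid.
import torch
import torch.nn as nn

stages = []
ch = 3
for out_ch in (32, 64, 128, 256, 512, 512, 512):  # widths are assumptions
    stages += [nn.Conv2d(ch, out_ch, kernel_size=3, stride=2, padding=1), nn.SiLU()]
    ch = out_ch
encoder = nn.Sequential(*stages)

x = torch.randn(1, 3, 1024, 1024)
z = encoder(x)
print(z.shape)  # torch.Size([1, 512, 8, 8])
```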

EfficientViT example

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

CtrLoRA can adapt a base ControlNet for image generation with just 1,000 data pairs in under one hour of training on a single GPU. It reduces learnable parameters by 90%, making it much easier to create new guidance conditions.
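
The parameter savings come from LoRA-style low-rank adapters. Here is a minimal, self-contained sketch of that idea; the rank, dimensions, and names are my assumptions, not CtrLoRA’s actual code:

```python
# Sketch of the LoRA idea: freeze the big base weights and train only a
# low-rank update W + (alpha / r) * B @ A, which is where the large drop
# in learnable parameters comes from.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base layer stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(320, 320))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.1f}%)")
```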

CtrLoRA example

MambaPainter: Neural Stroke-Based Rendering in a Single Step

MambaPainter can render images in an oil painting style by predicting over 100 brush strokes in a single step.

MambaPainter example

SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing

SGEdit can add, remove, replace, and adjust objects in images while keeping the quality of the image consistent.

SGEdit example

UniCon Diffusion: A Simple Approach to Unifying Diffusion-based Conditional Generation

UniCon can handle different image generation tasks using a single framework. It adapts a pretrained image diffusion model with only about 15% extra parameters and supports most base ControlNet transformations.

UniCon example

FlexGen: Flexible Multi-View Generation from Text and Image Inputs

FlexGen can generate high-quality, multi-view images from a single-view image or text prompt. It lets users change unseen areas and adjust material properties like metallic and roughness, improving control over the final image.

FlexGen examples

SAR: Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling

Set Autoregressive Modeling (SAR) is an autoregressive modeling technique that supports inpainting and outpainting and can generate photorealistic images at any resolution.

SAR inpainting example

Video

Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models

Tex4D can generate 4D textures for untextured mesh sequences from a text prompt. It combines 3D geometry with video diffusion models to ensure the textures are consistent across different views and frames.

Tex4D example

Depth Any Video

Depth Any Video can generate high-resolution depth maps for videos. It uses a large dataset of 40,000 annotated clips to improve accuracy and includes a method for better depth inference across sequences of up to 150 frames.

Depth Any Video example

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

Hallo2 can create long, high-resolution (4K) animations of portrait images driven by audio. It allows users to adjust facial expressions with text labels, improving control and reducing issues like appearance drift and temporal artifacts.

Hallo2 example

GAGAvatar: Generalizable and Animatable Gaussian Head Avatar

GAGAvatar can create 3D head avatars from a single image and enable real-time facial expression reenactment.

GAGAvatar example

DifFRelight: Diffusion-Based Facial Performance Relighting

DifFRelight can transform flat-lit facial captures into high-quality images and dynamic sequences with complex lighting. It uses a diffusion-based model for precise lighting control, accurately reproducing effects like eye reflections and skin texture.

DifFRelight example

Progressive Autoregressive Video Diffusion Models

PA-VDM can generate high-quality videos up to 1 minute long at 24 frames per second.

PA-VDM example. Check out the project page for 60-second examples.

Audio

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

F5-TTS can generate natural-sounding speech with a fast text-to-speech system based on flow matching. It supports multiple languages, can switch between languages smoothly, and is trained on 100,000 hours of data.
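
For context, here is a minimal sketch of the flow matching training objective the title refers to: the network learns to predict the constant velocity along the straight path from noise to data. The tiny MLP and dimensions are stand-ins I chose, not the actual F5-TTS model:

```python
# Flow matching in a nutshell: sample a point x_t on the straight line between
# noise x0 and data x1, and regress the path's velocity (x1 - x0).
import torch
import torch.nn as nn

dim = 80  # e.g. mel-spectrogram bins; illustrative
net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

for step in range(100):
    x1 = torch.randn(16, dim)      # stand-in for real speech features
    x0 = torch.randn(16, dim)      # noise sample
    t = torch.rand(16, 1)          # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1     # point on the straight-line path
    v_pred = net(torch.cat([xt, t], dim=-1))
    loss = ((v_pred - (x1 - x0)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```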

F5-TTS architecture

Also interesting

“COSMIC SPHERES OF BEING HUMAN” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying a print of my art from my art shop. You can request any of my artworks to be printed, just reply to this email.

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa