AI Art Weekly #100
Hello there, my fellow dreamers, and welcome to issue #100 of AI Art Weekly! 🎉🎉🎉
Hard to believe that we made it to 100 issues 🤯
What began as minor updates about Midjourney and Stable Diffusion has grown into a major project. I now spend 8-12 hours every week reviewing papers (273 this week alone) and pick the most interesting ones to share with you.
But with the sheer amount of research being published every week, it’s hard to keep track of all these interesting papers. So today, I’m launching AI Art Weekly Premium.
We’ve seen more and more papers announced with delayed code releases. Premium makes it possible to bookmark papers directly from the newsletter and get an alert once the code is released, without having to check each paper individually.
Thank you, everyone, for your continued support, and I hope you enjoy this week’s issue!
Unlock the full potential of AI-generated art with my curated collection of Midjourney SREF codes and prompts.
Cover Challenge 🎨
For the next cover I’m looking for submissions about everything artificial! Reward is a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
Highlights
Adobe MAX 2024
Adobe unveiled significant AI-powered updates to their software suite this week at Adobe MAX 2024. Some of the coolest are:
- A new Adobe Firefly video model which can generate videos and visual effects from text and images
- Photoshop: Enhanced Distraction Removal and new Generate Similar feature (basically Midjourney variations)
- Premiere Pro: Generative Extend for seamless frame addition using the new Adobe Firefly Video Model
- Illustrator: Ability to rotate vectors in 3D space
- Project Neo: Web-based 3D editor which can be converted into 2D vectors or used for image-to-image transformations
3D
GS^3: Efficient Relighting with Triple Gaussian Splatting
GS^3 can relight scenes in real-time using a triple Gaussian splatting process. It achieves high-quality lighting and view synthesis from multiple images, running at 90 fps on a single GPU.
SceneCraft: Layout-Guided 3D Scene Generation
SceneCraft can generate detailed indoor 3D scenes from user layouts and text descriptions. It is able to turn 3D layouts into 2D maps, producing complex spaces with diverse textures and realistic visuals.
Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats
Long-LRM can reconstruct large 3D scenes from up to 32 input images at 960x540 resolution in just 1.3 seconds on a single A100 80G GPU.
ControlMM: Controllable Masked Motion Generation
ControlMM can generate high-quality motion in real-time by using spatial control signals in a motion model. It is 20 times faster than other methods and can control body parts, timelines, and avoid obstacles.
InterMask: 3D Human Interaction Generation via Collaborative Masked Modelling
InterMask can generate high-quality 3D human interactions from text descriptions. It captures complex movements between two people while also allowing for reaction generation without changing the model.
Image
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
HART is an autoregressive transformer model that can generate high-quality 1024x1024 images from text 3x faster than SD3-Medium.
EfficientViT: Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
EfficientViT can speed up high-resolution diffusion models by compressing data with a ratio of up to 128 while keeping good image quality. It achieves a 19.1x speed increase for inference and a 17.9x speed increase for training on ImageNet 512x512 compared to other autoencoders.
CtrLoRA can adapt a base ControlNet for image generation with just 1,000 data pairs in under one hour of training on a single GPU. It reduces learnable parameters by 90%, making it much easier to create new guidance conditions.
MambaPainter can turn images into an oil painting style by predicting over 100 brush strokes in one step.
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
SGEdit can add, remove, replace, and adjust objects in images while keeping the quality of the image consistent.
UniCon Diffusion: A Simple Approach to Unifying Diffusion-based Conditional Generation
UniCon can handle different image generation tasks using a single framework. It adapts a pretrained image diffusion model with only about 15% extra parameters and supports most base ControlNet transformations.
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
FlexGen can generate high-quality, multi-view images from a single-view image or text prompt. It lets users change unseen areas and adjust material properties like metallic and roughness, improving control over the final image.
Set AutoRegressive Modeling: Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling
Set AutoRegressive Modeling is an autoregressive modelling technique that supports inpainting and outpainting and can generate photo-realistic images at any resolution.
Video
Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models
Tex4D can generate 4D textures for untextured mesh sequences from a text prompt. It combines 3D geometry with video diffusion models to ensure the textures are consistent across different views and frames.
Depth Any Video
Depth Any Video can generate high-resolution depth maps for videos. It uses a large dataset of 40,000 annotated clips to improve accuracy and includes a method for better depth inference across sequences of up to 150 frames.
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Hallo2 can create long, high-resolution (4K) animations of portrait images driven by audio. It allows users to adjust facial expressions with text labels, improving control and reducing issues like appearance drift and temporal artifacts.
GAGAvatar: Generalizable and Animatable Gaussian Head Avatar
GAGAvatar can create 3D head avatars from a single image and enable real-time facial expression reenactment.
DifFRelight: Diffusion-Based Facial Performance Relighting
DifFRelight can change flat-lit facial captures into high-quality images and dynamic sequences with complex lighting. It uses a diffusion-based model for precise lighting control, accurately showing effects like eye reflections and skin texture.
Progressive Autoregressive Video Diffusion Models
PA-VDM can generate high-quality videos up to 1 minute long at 24 frames per second.
Audio
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
F5-TTS can generate natural-sounding speech using a fast text-to-speech system. It supports multiple languages, can switch between languages smoothly, and is trained on a large dataset of 100,000 hours of speech.
Also interesting
This HuggingFace space can replace the background in a video with another video, color, or an image.
@MartinNebelong is probably the GOAT of transforming custom-built 3D scenes with Generative AI. Just imagine what’s possible when we can put a real-time diffusion filter onto EVERYTHING.
@Starhand_io made a pretty song with extremely messed up visuals. Enjoy.
@vibeke_udart made a music clip with weird hand faces singing. I like weird.
@doganuraldesign created a stunning Sci-Fi trailer using Midjourney and Hailuo AI.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying my Midjourney prompt collection on PROMPTCACHE 🚀
- Buying a print of my art from my art shop. You can request any of my artworks to be printed, just reply to this email.
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa