AI Art Weekly #76
Hello there, my fellow dreamers, and welcome to issue #76 of AI Art Weekly! 👋
Just this morning I skimmed through a whopping 230+ papers and projects, and let me tell you, we aren’t prepared for the exponentials that are about to hit us. Robots, AI superchips, LLM-powered operating systems, an explosion in 3D content: things are ramping up hard.
This issue is the biggest one yet. It’s about 4 times as big as usual. So if you can, please consider supporting this newsletter by buying me a coffee or becoming a monthly supporter.
Due to the sheer amount of content this week, I’ve split the news section into categories. In this issue we cover:
- 3D generation and texturing: 13 different methods for text and image-to-3D, InTeX, TexDreamer, GaussianFlow
- Image generation: SD3 Turbo, LightIt, OMG, YOSO, FouriScale, Desigen
- Image editing: StyleSketch, Wear-Any-Way, DiffCriticEdit, Magic Fixup, DesignEdit, ReNoise
- Video generation: AnimateDiff-Lightning, StyleCineGAN, Time Reversal, Mora, VSTAR
- Video editing: FRESCO, AnyV2V, MOTIA
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for Ostara submissions! The reward is again $50 and a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D generation and texturing
Text-to-3D and Image-to-3D
As I said in the intro, 3D content is about to explode. Just this week we had 13 papers on text and image-to-3D object reconstruction alone. As they’re all somewhat similar, I’m not going to dissect them all. Instead, I’ll just list them here:
- SV3D: Stability AI released a new model for high-resolution, image-to-3D reconstruction.
- LATTE3D: NVIDIA’s new text-to-3D method robustly generates high-quality textured meshes from text in just 400ms.
- Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding.
- MVControl: Text-to-3D with ControlNet like conditioning (canny, depth, scribble, etc.).
- Make-Your-3D: Image-to-3D with the ability to control generation with a text prompt.
- MVEdit: Supports text-to-3D, image-to-3D, and 3D-to-3D with texture generation.
- VFusion3D: Image-to-3D from Video Diffusion Models.
- GVGEN: Text-to-3D Generation with Volumetric Representation.
- GRM: High-quality, efficient text-to-3D and image-to-3D in 100ms.
- FDGaussian: Image-to-3D with Gaussian Splatting.
- Ultraman: Image-to-3D with a focus on human avatars.
- Sculpt3D: More text-to-3D.
- ComboVerse: More image-to-3D.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/sv3d.gif)
SV3D takes an image as input and generates novel multi-view images and 3D models.
InTeX: Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting
Now that we have a gazillion options to generate 3D objects, we might want to have more control over the textures. InTeX helps with that by generating and inpainting textures from text.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/intex.gif)
InTeX inpainting example
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
And another one! TexDreamer is a high-fidelity 3D human texture generation model that supports both text and image inputs.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/texdreamer.gif)
Animated TexDreamer examples
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Image-to-3D is cool. But what about video-to-4D? GaussianFlow can generate 4D Gaussian Splatting fields from monocular videos (like Sora).
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/gaussianflow-examples.gif)
GaussianFlow examples
Image generation
Stable Diffusion 3 Turbo
Stable Diffusion 3 hasn’t even been released yet, and Stability has already announced its Turbo version. This is SD3 but faster: think SDXL quality in 4 steps.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/sd3-turbo.jpg)
SD3 Turbo example: A close-up of a woman’s face, lit by the soft glow of a neon sign in a dimly lit, retro diner, hinting at a narrative of longing and nostalgia.
LightIt: Illumination Modeling and Control for Diffusion Models
Now, let’s talk image generation. LightIt is a method for explicit illumination control for image generation. It’s the first method that enables the generation of images with controllable, consistent lighting, and it performs on par with state-of-the-art specialized relighting methods.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/lightit-short.gif)
LightIt illumination control
OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models
OMG is a framework for multi-concept image generation, supporting character and style LoRAs. As an alternative to LoRAs, it also supports InstantID for generating multiple identities.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/omg.jpg)
OMG multi-concept generation
YOSO: You Only Sample Once
Image models are becoming faster, bigger, better. YOSO is a new method that can finetune pretrained diffusion models to generate high-fidelity images in one step.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/yoso.jpg)
Images generated with the YOSO-PixArt-α-1024 model
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
FouriScale can generate high-quality, high-resolution images of arbitrary size and various aspect ratios from pre-trained diffusion models, without any additional training.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/fouriscale.jpg)
Different FouriScale images of varying aspect ratios
Desigen: A Pipeline for Controllable Design Template Generation
Unlimited design templates unlocked. Desigen is a pipeline for automatic template creation which generates background images as well as harmonious layout elements over the background. This could be used to generate design templates for websites, presentations, social media posts and more.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/desigen.jpg)
Desigen examples
Image editing
StyleSketch: Stylized Face Sketch Extraction via Generative Prior with Limited Data
StyleSketch is a method for extracting high-resolution stylized sketches from a face image. Pretty cool!
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/stylesketch.jpg)
StyleSketch examples
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Wear-Any-Way is a new framework for virtual try-on that lets users precisely manipulate the wearing style of garments. The method enables users to drag sleeves to roll them up, open coats, and control the style of tucks, among other things.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/waw-7.gif)
Wear-Any-Way multi-garment try-on
Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
DiffCriticEdit enables 3D manipulations on images, such as object rotation and translation.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/diffcriticedit.gif)
Rotating a chair with DiffCriticEdit
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
Adobe’s Magic Fixup lets you edit images with a cut-and-paste approach that fixes edits automatically. I can see this being super useful for generating animation frames for tools like AnimateDiff. But it’s not clear yet if or when this hits Photoshop.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/magic-fixup.gif)
Magic Fixup examples
DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing
DesignEdit is another image editing method, but from Microsoft. It can remove objects, edit typography, swap, relocate, resize, add and flip multiple objects, pan and zoom images, remove decorations from images, and edit posters.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/design-edit.jpg)
DesignEdit examples
ReNoise: Real Image Inversion Through Iterative Noising
ReNoise is an inversion method that can be used to reconstruct a real input image, which can then be edited using text prompts.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/renoise.gif)
ReNoise turning a monkey into a poodle
Video generation
AnimateDiff-Lightning
After SDXL Lightning, ByteDance has now released AnimateDiff-Lightning, a text-to-video model that can generate videos more than ten times faster than the original AnimateDiff.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/animatediff-1.gif)
AnimateDiff-Lightning examples
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
StyleCineGAN is a method that can generate high-resolution looping cinemagraphs automatically from a still landscape image using a pre-trained StyleGAN.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/stylecinegan.gif)
StyleCineGAN examples
Time Reversal: Explorative Inbetweening of Time and Space
Time Reversal makes it possible to generate in-between frames between two input images. In particular, this enables the generation of looping cinemagraphs as well as camera and subject motion videos.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/time-reversal.gif)
Cinemagraph example
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Mora is an open-source attempt at replicating OpenAI’s Sora video model capabilities in various tasks such as text-to-video generation, image-to-video generation, extending generated videos, video-to-video editing, connecting videos, and simulating digital worlds. Results are far away from Sora, but it’s a start!
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/mora.gif)
12-second-long Mora text-to-video example at 1024×576 resolution
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, without any finetuning.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/vstar.gif)
VSTAR examples
Video editing
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
FRESCO combines ControlNet with Ebsynth for zero-shot video translation that focuses on preserving the spatial and temporal consistency of the input frames.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/fresco.gif)
FRESCO examples
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks
AnyV2V can edit a source video along with additional control (such as text prompts, subjects, or styles). Looks like one of the best Gen-1 alternatives yet.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/anyv2v.gif)
AnyV2V example with the prompt “make it snowing”
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
MOTIA is a high-quality flexible video outpainting method. But no code yet 😭
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/byo.gif)
MOTIA outpainting example
Also interesting
- SceneScript: an AI model and method to understand and describe 3D spaces
- Arc2Face: A Foundation Model of Human Faces
- ScoreHMR: Score-Guided Diffusion for 3D Human Recovery
- Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
- Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
- SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
A one-step image-to-image method based on SD-turbo that enables sketch2image, day2night, and more. Hugging Face demo.
StreamMultiDiffusion is a real-time, interactive multiple-text-to-image generation method that works from user-assigned regional text prompts.
A free tool for generating textures via Automatic1111 Stable Diffusion. Supports preserving UV maps, blending layers by brush, 3D inpainting and more.
The code and model weights for interpolating between two images with DynamiCrafter have been released.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/76/footer.jpg)
“Silencio” by me
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa