AI Art Weekly #76

Hello there, my fellow dreamers, and welcome to issue #76 of AI Art Weekly! 👋

Just this morning I skimmed through a whopping 230+ papers and projects, and let me tell you, we aren’t prepared for the exponentials that are about to hit us. Robots, AI superchips, LLM-powered operating systems, an explosion in 3D content: things are ramping up hard.

This issue is the biggest one yet. It’s about 4 times as big as usual. So if you can, please consider supporting this newsletter by buying me a coffee or becoming a monthly supporter.

Due to the sheer amount of content this week, I’ve split the news section into categories. In this issue we cover:

  • 3D generation and texturing: 13 different methods for text and image-to-3D, InTeX, TexDreamer, GaussianFlow
  • Image generation: SD3 Turbo, LightIt, OMG, YOSO, FouriScale, Desigen
  • Image editing: StyleSketch, Wear-Any-Way, DiffCriticEdit, Magic Fixup, DesignEdit, ReNoise
  • Video generation: AnimateDiff-Lightning, StyleCineGAN, Time Reversal, Mora, VSTAR
  • Video editing: FRESCO, AnyV2V, MOTIA
  • and more!

Cover Challenge 🎨

Theme: polygons
58 submissions by 32 artists
🏆 1st: @onchainsherpa
🥈 2nd: @Berezin12345
🥉 3rd: @joan38104108
🥉 3rd: @CurlyP139

News & Papers

3D generation and texturing

Text-to-3D and Image-to-3D

As I said in the intro, 3D content is about to explode. Just this week we had 13 papers on text and image-to-3D object reconstruction alone. As they’re all somewhat similar, I’m not going to dissect them all. Instead, I’ll just list them here:

  • SV3D: Stability AI released a new model for high-resolution, image-to-3D reconstruction.
  • LATTE3D: NVIDIA’s new text-to-3D method robustly generates high-quality textured meshes from text in just 400ms.
  • Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding.
  • MVControl: Text-to-3D with ControlNet-like conditioning (canny, depth, scribble, etc.).
  • Make-Your-3D: Image-to-3D with the ability to control generation with a text prompt.
  • MVEdit: Supports text-to-3D, image-to-3D, and 3D-to-3D with texture generation.
  • VFusion3D: Image-to-3D from Video Diffusion Models.
  • GVGEN: Text-to-3D Generation with Volumetric Representation.
  • GRM: High-quality, efficient text-to-3D and image-to-3D in 100ms.
  • FDGaussian: Image-to-3D with Gaussian Splatting.
  • Ultraman: Image-to-3D with a focus on human avatars.
  • Sculpt3D: More text-to-3D.
  • ComboVerse: More image-to-3D.

SV3D takes an image as input and generates novel multi-view images and 3D models.

InTeX: Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting

Now that we have a gazillion options to generate 3D objects, we might want to have more control over the textures. InTeX helps with that by generating and inpainting textures from text.

InTeX inpainting example

TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation

And another one! TexDreamer is a high-fidelity 3D human texture generation model that supports both text and image inputs.

Animated TexDreamer examples

GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

Image-to-3D is cool. But what about video-to-4D? GaussianFlow can generate 4D Gaussian Splatting fields from monocular videos (such as clips generated by Sora).

GaussianFlow examples

Image generation

Stable Diffusion 3 Turbo

Stable Diffusion 3 hasn’t even been released yet, and Stability has already announced its Turbo version. This is SD3 but faster: think SDXL quality in 4 steps.

SD3 Turbo example: A close-up of a woman’s face, lit by the soft glow of a neon sign in a dimly lit, retro diner, hinting at a narrative of longing and nostalgia.

LightIt: Illumination Modeling and Control for Diffusion Models

Now, let’s talk image generation. LightIt is a method for explicit illumination control in image generation. It’s the first method that enables the generation of images with controllable, consistent lighting, and it performs on par with state-of-the-art specialized relighting methods.

LightIt illumination control

OMG: Occlusion-friendly Personalized Multi-concept Generation In Diffusion Models

OMG is a framework for multi-concept image generation, supporting character and style LoRAs. As an alternative to LoRAs, it can also use InstantID for multi-ID generation.

OMG multi-concept generation

YOSO: You Only Sample Once

Image models are becoming faster, bigger, better. YOSO is a new method that can finetune pretrained diffusion models to generate high-fidelity images in one step.

Images generated with the YOSO-PixArt-α-1024 model

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

FouriScale can generate high-resolution images at various aspect ratios from pre-trained diffusion models, with no additional training, enabling arbitrary-size, high-quality generation.

Different FouriScale images of varying aspect ratios

Desigen: A Pipeline for Controllable Design Template Generation

Unlimited design templates unlocked. Desigen is a pipeline for automatic template creation which generates background images as well as harmonious layout elements over the background. This could be used to generate design templates for websites, presentations, social media posts and more.

Desigen examples

Image editing

StyleSketch: Stylized Face Sketch Extraction via Generative Prior with Limited Data

StyleSketch is a method for extracting high-resolution stylized sketches from a face image. Pretty cool!

StyleSketch examples

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

Wear-Any-Way is a new framework for virtual try-on that lets users precisely manipulate the wearing style of garments. The method enables users to drag sleeves to roll them up, open coats, and control the style of tucks, among other things.

Wear-Any-Way multi-garment try-on

Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors

DiffCriticEdit enables 3D manipulations on images, such as object rotation and translation.

Rotating a chair with DiffCriticEdit

Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos

Adobe’s Magic Fixup lets you edit images with a cut-and-paste approach that fixes edits automatically. I can see this being super useful for generating animation frames for tools like AnimateDiff. But it’s not clear yet if or when this hits Photoshop.

Magic Fixup examples

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

DesignEdit is another image editing method, but from Microsoft. It can remove objects, edit typography, swap, relocate, resize, add and flip multiple objects, pan and zoom images, remove decorations from images, and edit posters.

DesignEdit examples

ReNoise: Real Image Inversion Through Iterative Noising

ReNoise inverts a real input image so that it can be faithfully reconstructed and then edited using text prompts.

ReNoise turning a monkey into a poodle

Video generation

AnimateDiff-Lightning: Cross-Model Diffusion Distillation

After SDXL-Lightning, ByteDance has now released AnimateDiff-Lightning, a text-to-video model that can generate videos more than ten times faster than the original AnimateDiff.

AnimateDiff-Lightning examples

StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN

StyleCineGAN is a method that can generate high-resolution looping cinemagraphs automatically from a still landscape image using a pre-trained StyleGAN.

StyleCineGAN examples

Time Reversal: Explorative Inbetweening of Time and Space

Time Reversal makes it possible to generate the in-between frames between two input images. In particular, this enables the generation of looping cinemagraphs as well as camera and subject motion videos.

Cinemagraph example

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Mora is an open-source attempt at replicating the capabilities of OpenAI’s Sora video model across various tasks such as text-to-video generation, image-to-video generation, extending generated videos, video-to-video editing, connecting videos, and simulating digital worlds. The results are still far from Sora’s, but it’s a start!

12-second-long Mora text-to-video example at 1024×576 resolution

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

VSTAR is a method that enables text-to-video models to generate longer videos with dynamic visual evolution in a single pass, with no finetuning needed.

VSTAR examples

Video editing

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

FRESCO combines ControlNet with Ebsynth for zero-shot video translation that focuses on preserving the spatial and temporal consistency of the input frames.

FRESCO examples

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

AnyV2V can edit a source video using additional controls (such as text prompts, subjects, or styles). Looks like one of the best Gen-1 alternatives yet.

AnyV2V example: “make it snowing”

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

MOTIA is a high-quality, flexible video outpainting method. But no code yet 😭

MOTIA outpainting example

Also interesting

  • SceneScript: an AI model and method to understand and describe 3D spaces
  • Arc2Face: A Foundation Model of Human Faces
  • ScoreHMR: Score-Guided Diffusion for 3D Human Recovery
  • Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
  • Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
  • SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior
  • GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

“Silencio” by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
