AI Art Weekly #72

Hello there, my fellow dreamers, and welcome to issue #72 of AI Art Weekly! 👋

Been out cold from the flu this week, so I'm keeping it short today. Let's jump right into this week's highlights:

  • Stable Diffusion 3 goes into early preview
  • SDXL got a Lightning upgrade
  • FiT is a new transformer architecture for unrestricted image aspect ratios
  • Snap Video is a new video model by Snapchat
  • Binary Opacity Grids renders high-quality meshes in real-time
  • Argus3D generates 3D meshes from images and text prompts
  • FlashTex is a new method for fast mesh texturing
  • Visual Style Prompting is a new SOTA for style transfer
  • SCG can help you compose and improvise new piano pieces
  • and more!

Cover Challenge 🎨

Theme: alternate history
45 submissions by 32 artists
🏆 1st: @m_i_a_s_box
🥈 2nd: @DocT___
🥉 3rd: @Historic_Crypto
🧡 4th: @PapaBeardedNFTs

News & Papers

Stable Diffusion 3

Stable Diffusion 3 went into the early preview stage this week. The model is not yet available, but the waitlist for an early preview is open. Stable Diffusion 3 is said to have greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

ByteDance (the company behind TikTok) found a way to generate high-quality 1024px images in only a few steps, a method they call SDXL-Lightning. There are also demos on HuggingFace and fastsdxl.ai.

Images generated with SDXL-Lightning

FiT: Flexible Vision Transformer for Diffusion Model

State-of-the-art diffusion models are trained on square images. FiT is a new transformer architecture specifically designed for generating images with unrestricted resolutions and aspect ratios (similar to what Sora does). It treats images as sequences of dynamically-sized tokens, which enables a flexible training strategy that adapts to diverse aspect ratios during both training and inference, promoting resolution generalization and eliminating the biases induced by image cropping.

Image generated with FiT
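The core idea FiT builds on can be sketched in a few lines: instead of resizing everything to a fixed square, an image of any aspect ratio is split into a variable-length sequence of fixed-size patch tokens. Here's a minimal sketch of generic ViT-style patchification (an illustration of the concept, not FiT's actual code):

```python
import numpy as np

def patchify(img: np.ndarray, p: int = 16) -> np.ndarray:
    """Split an H x W x C image into an (H//p * W//p, p*p*C) token sequence."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "pad the image to a multiple of the patch size"
    # Carve the image into p x p tiles, then flatten each tile into one token.
    return img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2).reshape(-1, p * p * C)

# Different aspect ratios simply yield different sequence lengths:
landscape = patchify(np.zeros((256, 384, 3)))  # 16 * 24 = 384 tokens
square = patchify(np.zeros((512, 512, 3)))     # 32 * 32 = 1024 tokens
```

Since a transformer handles variable-length sequences natively, no cropping to a square is needed; FiT pairs this with a training strategy and positional embeddings suited to varying resolutions.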

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

It's hard to follow up with video models after Sora, but Snap Video is an interesting one! Snapchat's model addresses pixel redundancy in video generation, which leads to videos with substantially higher quality, temporal consistency, and motion complexity than other methods. Like FiT, it also uses a new transformer architecture, which trains 3.31x faster and runs inference ~4.5x faster than U-Nets.

Three hamster runs on a wheel, exercising in its cage.

Binary Opacity Grids

Binary Opacity Grids is a new method for mesh-based view synthesis that is able to capture fine geometric detail. The resulting meshes can be rendered in real-time on mobile devices and achieve significantly higher quality compared to existing approaches.

Binary Opacity Grids example

Argus3D: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

Argus3D is another model that can generate 3D meshes from images and text prompts, as well as unique textures for its generated shapes. Just imagine composing a 3D scene and filling it with objects by pointing at a space and using natural language to describe what you want to place there.

Generated meshes of tables

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Roblox published FlashTex this week. The method can texture an input 3D mesh given a user-provided text prompt. These generated textures can also be relit properly in different lighting environments.

FlashTex examples

Visual Style Prompting with Swapping Self-Attention

Visual Style Prompting can generate images with a specific style from a reference image. Compared to other methods like IP-Adapter and LoRAs, Visual Style Prompting is better at retaining the style of the reference image while avoiding style leakage from text prompts.

Visual Style Prompting comparison with other methods

SCG: Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

SCG can be used by musicians to compose and improvise new piano pieces. It allows musicians to guide music generation by using rules like following a simple I-V chord progression in C major. Pretty cool.

SCG generated loop playing on a Disklavier piano
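To make the chord-rule idea concrete, here's what a "I-V in C major" rule actually pins down, note-wise. This is plain music theory in Python, not SCG's API:

```python
# The twelve pitch classes, one semitone apart.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad(root: str, quality: str = "major") -> list[str]:
    """Return the three notes of a triad built on `root`."""
    i = NOTE_NAMES.index(root)
    third = 4 if quality == "major" else 3  # major third = 4 semitones, minor = 3
    return [NOTE_NAMES[i], NOTE_NAMES[(i + third) % 12], NOTE_NAMES[(i + 7) % 12]]

# In C major, the I chord is built on C and the V chord on G:
progression = [triad("C"), triad("G")]  # [['C', 'E', 'G'], ['G', 'B', 'D']]
```

A symbolic rule like this is non-differentiable, which is exactly why SCG's rule-guided sampling is notable: it steers the diffusion process with such rules without needing gradients through them.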

Also interesting

  • UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
  • MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single to Sparse-view 3D Object Reconstruction
  • Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
  • GaussianPro: 3D Gaussian Splatting with Progressive Propagation

โ€œSeeking Quietudeโ€ by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
