AI Art Weekly #74

Hello there, my fellow dreamers, and welcome to issue #74 of AI Art Weekly! 👋

Lots of cool stuff got published this week, so let’s dive right in!

  • Stable Diffusion 3 Research Paper
  • TripoSR fast image-to-3D
  • MagicClay can do mesh editing
  • PixArt-Σ supports native 4K text-to-image generation
  • ResAdapter enables better multi-resolution support
  • PeRFlow speeds up diffusion models
  • RealCustom for real-time text-to-image customization
  • ViewDiff generates multi-view consistent images
  • UniCtrl improves text-to-video models
  • Pix2Gif generates GIFs from a single image
  • Interview with Chissweetart
  • and more!

Cover Challenge 🎨

Theme: invisible
93 submissions by 57 artists
🏆 1st: @WhiteSolitude22
🥈 2nd: @ChanduStun
🥉 3rd: @Boxio3
🧡 4th: @Ethereal_Gwirl

News & Papers

Stable Diffusion 3 Research Paper

Stability released the Stable Diffusion 3 research paper this week with some additional image output examples. The prompt coherence is pretty cool.

Prompt: Beautiful pixel art of a Wizard with hovering text 'Achievement unlocked: Diffusion models can spell now'

TripoSR: Fast 3D Object Reconstruction from a Single Image

Stability (together with Tripo AI) also released TripoSR this week, a 3D reconstruction model that can generate a 3D mesh from a single image in under 0.5 seconds.

TripoSR examples

MagicClay: Sculpting Meshes With Generative Neural Fields

While TripoSR can generate meshes from an image, MagicClay can edit them. It’s an artist-friendly tool that allows you to sculpt regions of a mesh with text prompts while keeping other regions untouched.

MagicClay example with prompt: an astronaut riding a horse

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

The PixArt model family got a new addition with PixArt-Σ. The model is capable of directly generating images at 4K resolution. Compared to its predecessor, PixArt-α, it offers images of higher fidelity and improved alignment with text prompts.

PixArt-Σ example

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Remember the old days when it was a PITA to generate images at any resolution other than 512x512? ResAdapter fixes that. It’s a domain-consistent adapter for diffusion models that generates images at unrestricted resolutions and aspect ratios, enabling efficient multi-resolution inference without repeated denoising steps or complex post-processing.

ResAdapter example

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

ByteDance published a new few-step acceleration method called PeRFlow, which speeds up diffusion models like Stable Diffusion so they generate images faster. PeRFlow is compatible with various fine-tuned stylized SD models as well as SD-based generation and editing pipelines such as ControlNet, Wonder3D, and more.

Fast text-to-image generation with PeRFlow

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

RealCustom is yet another image personalization method. This one generates realistic images in real time that consistently adhere both to the given text and to a subject taken from a single reference image.

RealCustom examples

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

ViewDiff is a method that can generate high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from a single text prompt or a single posed image.

ViewDiff donut examples

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

In the video department we had UniCtrl this week, a method that improves the semantic consistency and motion quality of videos generated by text-to-video models without additional training. The method is universally applicable and can enhance a wide range of text-to-video models.

Left without UniCtrl. Right with UniCtrl.

Pix2Gif: Motion-Guided Diffusion for GIF Generation

And last but not least, Microsoft published Pix2Gif this week, an image-to-video model that generates GIFs from a single image and a text prompt. They claim the model is able to understand motion, though we’re not talking Sora levels here. Still, it’s certainly a step up motion-wise compared to the slow-motion videos we’re used to.

The horse is walking

Also interesting

  • Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
  • DATTT: Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Insight 👁️

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa