AI Art Weekly #102

Happy Halloween, my fellow dreamers, and welcome to issue #102 of AI Art Weekly! 👋🎃

It seems like we’re going through a bit of a paper drought. I found only 31 papers this week, which, compared to the usual 200, feels like a desert. But I’m not complaining; it means I can take it a bit slower this week 😉

It’s been a while since we had a sponsor, so a big thank you to GRAYDIENT for supporting this issue! Give them a try if you’re looking for an easy way to generate AI art with FLUX and Stable Diffusion 3.5.


Cover Challenge 🎨

Theme: halloween
91 submissions by 62 artists
🏆 1st: @AvisMelodieux
🥈 2nd: @sudv05
🥉 3rd: @VirginiaLori
🧡 4th: @roll4d4

News & Papers

Highlights

Recraft V3 aka Red Panda

Recraft released their new V3 model, which claimed the top spot on Hugging Face’s Text-to-Image Benchmark, surpassing current top performers like FLUX and Midjourney. Take that with a grain of salt though, as early user reports say the benchmark results are heavily cherry-picked. Anyhow, according to Recraft, the model supports:

  • Improved style variability without the need for LoRA training
  • Advanced text generation with the ability to handle long texts
  • Precise text positioning and size control
  • Improved anatomical accuracy and prompt understanding
  • Vector image generation capabilities

I’m personally most excited about the SVG generation capabilities. Both the image and SVG model are available on Replicate.

Recraft V3 is able to generate images with long texts, something we haven’t seen from other models yet.
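If you want to give the API a spin, here’s a minimal sketch using Replicate’s Python client. The model slugs and input fields are assumptions on my part, so double-check the model pages on Replicate for the exact names and options.

```python
# Minimal sketch: generating a raster image and an SVG with Recraft V3 via Replicate.
# Assumes the `replicate` client is installed and REPLICATE_API_TOKEN is set.
# The model slugs and input fields are my best guess -- verify them on Replicate.
import replicate

# Raster image
image_output = replicate.run(
    "recraft-ai/recraft-v3",  # assumed slug
    input={"prompt": "a vintage halloween poster with the headline 'AI Art Weekly #102'"},
)
print(image_output)

# Vector (SVG) output
svg_output = replicate.run(
    "recraft-ai/recraft-v3-svg",  # assumed slug
    input={"prompt": "a flat, minimal jack-o'-lantern logo"},
)
print(svg_output)
```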

Oasis: A Universe in a Transformer

Decart and Etched released Oasis this week, an AI model that generates an interactive Minecraft video game world in real-time. I tried it this morning and it’s kind of a surreal experience that feels more like a dream. The entire game is completely simulated; there is no game code running in the background, just input coming in, the model generating tokens, and frames coming out. Oasis supports:

  • Real-time gameplay generation at 20 frames per second
  • Interactive physics, game rules, and graphics generation
  • Support for complex game mechanics (building, lighting, inventory management)
  • Dynamic environments with diverse settings and locations
  • Temporal stability through innovative dynamic noising technique

Unlike other text-to-video models, which take 10-20 seconds to generate one second of video, Oasis produces a new frame every 0.04 seconds, making it up to 100x faster than current alternatives.

The demo is available here. Code can be found on GitHub and the weights are available on Hugging Face.

Oasis gameplay examples
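To make the “input in, frames out” idea a bit more concrete, here’s a rough sketch of what such an interactive world-model loop looks like. The WorldModel interface below is hypothetical and not Oasis’s actual API (see the GitHub repo for that); the point is just that every frame is sampled by the model from recent frames plus the player’s latest input, inside a ~50 ms budget at 20 fps.

```python
# Hypothetical sketch of an interactive, frame-by-frame world-model loop.
# This is NOT the actual Oasis API -- see the official GitHub repo for that.
import time


class WorldModel:
    """Stand-in for an autoregressive video world model."""

    def next_frame(self, frame_history, action):
        # Real version: encode recent frames + the action, run the transformer,
        # decode the predicted tokens back into an image.
        raise NotImplementedError


def play(model, read_input, render, fps=20, context=32):
    frame_budget = 1.0 / fps  # ~0.05 s per frame at 20 fps
    history = []
    while True:
        start = time.perf_counter()
        action = read_input()  # current keyboard/mouse state
        frame = model.next_frame(history[-context:], action)
        render(frame)
        history.append(frame)
        # Sleep away whatever is left of the frame budget, if anything.
        time.sleep(max(0.0, frame_budget - (time.perf_counter() - start)))
```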

Stable Diffusion 3.5 Medium

After last week’s release of Stable Diffusion 3.5 Large, the Medium model with 2.5B parameters was released this week as well. Stability claims that this model runs “out of the box” on consumer hardware, even on a toaster. Well, I haven’t found a toaster with 9.9GB of VRAM yet, but I’m a GPU-poor pleb anyway… 😭

Weights can be found on Hugging Face and inference code is available on GitHub.

Stable Diffusion 3.5 Medium example
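If you do have the VRAM, a minimal sketch with diffusers looks roughly like this. I’m assuming the weights live under stabilityai/stable-diffusion-3.5-medium and that you’ve accepted the license on Hugging Face; adjust dtype and offloading to whatever your hardware can handle.

```python
# Minimal sketch: running Stable Diffusion 3.5 Medium with diffusers.
# Assumes a recent diffusers install (plus accelerate for offloading) and that
# the repo id below is correct -- double-check it on the Hugging Face model page.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",  # assumed repo id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # helps if your "toaster" is short on VRAM

image = pipe(
    "a cozy pumpkin-lit cabin in a misty forest, halloween night",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("sd35_medium_halloween.png")
```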

3D

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

MoGe can turn images and videos into 3D point maps.

MoGe example

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

PF3plat can generate photorealistic novel views and estimate accurate camera poses from uncalibrated image collections.

PF3plat example

Image

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

FreCaS can generate high-resolution images quickly using a method that breaks the process into stages with increasing detail. It is about 2.86× to 6.07× faster than comparable methods at creating 2048×2048 images and significantly improves image quality.

FreCaS example

Factor Graph Diffusion: Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis

Factor Graph Diffusion can generate high-quality images with better prompt adherence. The method allows for controllable image creation using inputs like segmentation masks and depth maps.

Factor Graph Diffusion examples

Audio

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

OmniSep can isolate clean soundtracks from mixed audio using text, image, and audio queries.

Check out the project page for OmniSep audio examples.

Also interesting

“eXecrate” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying a print of my art from my art shop. You can request any of my artworks to be printed, just reply to this email.

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
