AI Art Weekly #102
Happy Halloween, my fellow dreamers, and welcome to issue #102 of AI Art Weekly! 👋🎃
It seems like we’re going through a bit of a paper drought. I found only 31 papers this week, which compared to the usual 200 feels like a desert. But I’m not complaining; it means I can take it a bit slower this week 😉
It’s been a while since we had a sponsor, so a big thank you to GRAYDIENT for supporting this issue! Give them a try if you’re looking for an easy way to generate AI art with FLUX and Stable Diffusion 3.5.
Tired of Midjourney’s restrictions? Graydient gives you access to FLUX and SD3.5 with a ton of fine-tunes, unlimited generations, and LoRA training at a fraction of the cost. Use code aiartweekly to get 50% off your first month, or 6 months free on any annual plan.
Cover Challenge 🎨
For the next cover I’m looking for submissions about oxymorons! The reward is a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Highlights
Recraft V3 aka Red Panda
Recraft released their new V3 model, which claimed the top spot on Hugging Face’s Text-to-Image Benchmark, surpassing current top performers like FLUX and Midjourney. Take that with a grain of salt though, as early user reports say the benchmark results are heavily cherry-picked. Anyhow, according to Recraft, the model supports:
- Improved style variability without the need for LoRA training
- Advanced text generation with the ability to handle long texts
- Precise text positioning and size control
- Improved anatomical accuracy and prompt understanding
- Vector image generation capabilities
I’m personally most excited about the SVG generation capabilities. Both the image and SVG models are available on Replicate.
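If you’d rather poke at it from code than from the Replicate UI, here’s a minimal sketch using Replicate’s Python client. The model slug and input fields are my assumptions, so double-check the model page for the exact schema:

```python
# Minimal sketch: generating an image with Recraft V3 via Replicate's Python client.
# The model slug and input field names are assumptions based on Replicate's usual
# conventions; check the model page for the exact schema before relying on this.
# Requires a REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    "recraft-ai/recraft-v3",  # assumed slug for the raster image model
    input={
        "prompt": "a playful jack-o'-lantern in a flat geometric illustration style",
        "size": "1024x1024",  # assumed parameter name
    },
)
print(output)  # typically a URL (or list of URLs) pointing at the generated image
```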
Oasis: A Universe in a Transformer
Decart and Etched released Oasis this week, an AI model that generates an interactive Minecraft game world in real time. I tried it this morning and it’s kind of a surreal experience that feels more like a dream. The entire game is completely simulated, there is no game code running in the background: just input coming in, the model generating tokens, and frames coming out. Oasis supports:
- Real-time gameplay generation at 20 frames per second
- Interactive physics, game rules, and graphics generation
- Support for complex game mechanics (building, lighting, inventory management)
- Dynamic environments with diverse settings and locations
- Temporal stability through innovative dynamic noising technique
Unlike other text-to-video models that take 10-20 seconds to generate one second of video, Oasis produces frames every 0.04 seconds, making it 100x faster than current alternatives.
The demo is available here. Code can be found on GitHub and the weights are available on Hugging Face.
Stable Diffusion 3.5 Medium
After last week’s release of Stable Diffusion 3.5 Large, the Medium model with 2.5B parameters was released this week as well. Stability claims that this model runs “out of the box” on consumer hardware, even on a toaster. Well, I haven’t found a toaster with 9.9GB of VRAM yet, but I’m a GPU-poor pleb anyway… 😭
Weights can be found on HuggingFace and inference code is available on GitHub.
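If you do have the VRAM, here’s a minimal sketch of running the Medium model with diffusers. It assumes the stabilityai/stable-diffusion-3.5-medium checkpoint loads with StableDiffusion3Pipeline and that you’ve accepted the license on Hugging Face; the step count and guidance scale are just illustrative values:

```python
# Minimal sketch: text-to-image with SD3.5 Medium via Hugging Face diffusers.
# Assumes the stabilityai/stable-diffusion-3.5-medium checkpoint works with
# StableDiffusion3Pipeline and that you've accepted the model license on the Hub.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")  # needs roughly 10GB of VRAM, so sadly no toasters

image = pipe(
    "a haunted victorian mansion under a full moon, oil painting",
    num_inference_steps=28,  # illustrative settings, tune to taste
    guidance_scale=4.5,
).images[0]
image.save("halloween.png")
```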
3D
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
MoGe can turn images and videos into 3D point maps.
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting
PF3plat can generate photorealistic images and accurate camera positions from uncalibrated image collections.
Image
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
FreCaS can generate high-resolution images quickly using a method that breaks the process into stages with increasing detail. It is about 2.86× to 6.07× faster than other tools for creating 2048×2048 images and improves image quality significantly.
Factor Graph Diffusion: Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
Factor Graph Diffusion can generate high-quality images with better prompt adherence. The method allows for controllable image creation using tools like segmentation and depth maps.
Audio
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
OmniSep can isolate clean soundtracks from mixed audio using text, images, and audio queries.
Also interesting
--sref 845332339, aka “Color Drift”, is a vibrant and playful illustration style characterized by colorful geometric shapes and whimsical designs, perfect for creating eye-catching, flat images in Midjourney.
@BLVCKLIGHTai released his first feature film, with a runtime of 1 hour and 42 minutes. I’m not the biggest fan of long videos, but I skipped through some of the scenes and there is some solid work in there. Check it out!
@elevenlabsio shipped a new Voice Design feature and, to showcase it, built an open-source app that generates a unique voice from your X/Twitter profile. Pretty cool. Here is my made-up voice.
@AllarHaltsonen shared his new Midjourney Retexture Workflow. He first creates the image in Flux, then retextures details with Midjourney and lastly upscales using Leonardo.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying my Midjourney prompt collection on PROMPTCACHE 🚀
- Buying a print of my art from my art shop. You can request any of my artworks to be printed, just reply to this email.
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa