AI Art Weekly #105
Hello there, my fellow dreamers, and welcome to issue #105 of AI Art Weekly! 👋
First snow has fallen in Switzerland and I’m cozying up with a cup of hot tea to bring you the latest AI art news and papers - which there are plenty of this week! 🌨️
One of the biggest is the release of FLUX.1 Tools, which brings inpainting, outpainting, retexturing and image variation to the FLUX.1 ecosystem.
From the paper side, research is still going strong: I skimmed through 155 papers this week. Some come with code, some without; remember to bookmark the ones you're interested in, and with Premium I'll notify you once their code is released.
Unlock the full potential of AI-generated art with my curated collection of Midjourney SREF codes and prompts.
Cover Challenge 🎨
For the next cover I'm looking for phenomena-inspired submissions! The reward is a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Highlights
FLUX.1 Tools
Black Forest Labs released FLUX.1 Tools, a comprehensive suite of AI models designed to enhance their base text-to-image model FLUX.1. This release introduces four major features available in both open-access [dev] and professional [pro] versions through the BFL API:
- FLUX.1 Fill: Advanced inpainting and outpainting capabilities, outperforming competitors like Ideogram 2.0
- FLUX.1 Depth: Structural guidance using depth maps for precise image transformations
- FLUX.1 Canny: Edge-based structural guidance for maintaining image composition
- FLUX.1 Redux: Image variation and restyling adapter supporting high-quality 4-megapixel outputs
The tools are available through multiple platforms including fal.ai, Replicate, Together.ai, Freepik, and krea.ai. Benchmark tests show FLUX.1 Fill [pro] achieving state-of-the-art performance in inpainting, while FLUX.1 Depth outperforms Midjourney ReTexture in structural conditioning.
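If you'd rather run the open [dev] weights locally, recent diffusers builds ship a FluxFillPipeline. Here's a minimal sketch; the input file names and resolution are placeholders, and the white-masked region is what gets repainted:

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

# Load the open-access FLUX.1 Fill [dev] weights from Hugging Face
pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")       # placeholder: image to edit
mask = load_image("input_mask.png")   # placeholder: white = area to inpaint

result = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    guidance_scale=30,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
result.save("flux_fill_output.png")
```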
LTX-Video
Lightricks released a new video generation model called LTX-Video. It is capable of generating 5-second 720p videos at 24 FPS in just 4 seconds on an Nvidia H100!
Yeah, that’s faster than real time 🤯
The inference code is available on GitHub, the model on HuggingFace, and a demo on fal.ai.
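The official repo's inference script is the primary path, but assuming a diffusers build with LTXPipeline support, local text-to-video might look like this minimal sketch (prompt and output path are placeholders):

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Load LTX-Video weights from the Lightricks Hugging Face repo
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

video = pipe(
    prompt="a red fox trotting through fresh snow, cinematic lighting",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery",
    width=704,
    height=480,
    num_frames=121,       # ~5 seconds at 24 FPS
    num_inference_steps=50,
).frames[0]
export_to_video(video, "fox.mp4", fps=24)
```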
Suno V4
Suno released v4 of their music model, bringing major improvements to audio quality and song creation including:
- A Remaster tool for upgrading old tracks
- A new AI lyrics assistant called ReMi
- Enhanced Cover Art generation
- Improved song covers and style consistency
- Cleaner audio and sharper lyrics
I made a prediction a few months ago that 90% of music we listen to will be AI-generated in the next 5 years, and listening to these new Suno v4 tracks, I’m more convinced than ever.
The Matrix: AI-Powered Infinite World Generator
Alibaba Group teased a new AI system, “The Matrix”, which can generate endless interactive virtual worlds in real time, setting new standards for AI world simulation. The system supports:
- Real-time generation at 16 FPS
- AAA-game quality visuals (720p)
- Frame-level control precision
- Infinite video generation
- Virtual-to-real world adaptation
- First and third-person views
The Matrix surpasses other solutions like GameNGen (issue 96) and Oasis (issue 102) in video length and control precision. Unfortunately we’re talking about Alibaba, so it’s unlikely to be available for public use anytime soon.
3D
Find Any Part in 3D
Find3D can segment parts of 3D objects based on text queries.
GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views
GPS-Gaussian+ can render high-resolution 3D scenes from 2 or more input images in real-time.
Portrait Diffusion: Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion
Portrait Diffusion can generate detailed 3D portraits from a single image. It improves texture and shape accuracy by using a Multi-View Noise Resampling Strategy, which helps reduce blurred textures.
Image
Stylecodes: Encoding Stylistic Information For Image Generation
StyleCodes can encode the style of an image into a 20-symbol base64 code for easy sharing and use in image generation. It allows users to create style-reference codes (srefs) from their own images, helping to control styles in diffusion models with high quality.
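StyleCodes' actual encoder is learned end-to-end, so the snippet below is purely a toy stand-in to illustrate the format: it derives a 20-symbol base64 string from an image's bytes, which is not the real method, just a way to see what such a compact sref looks like:

```python
import base64
import hashlib

def toy_style_code(image_bytes: bytes) -> str:
    """Toy stand-in: StyleCodes learns its encoder; this merely hashes
    the image to show the shape of a 20-symbol base64 code."""
    digest = hashlib.sha256(image_bytes).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:20]

with open("style_reference.png", "rb") as f:  # placeholder file name
    print(toy_style_code(f.read()))
```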
From Text to Pose to Image
From Text to Pose to Image can generate high-quality images from text prompts by first creating poses and then using them to guide image generation. This method improves control over human poses and enhances image fidelity in diffusion models.
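This isn't the paper's code, but the second, pose-conditioned stage can be approximated with an off-the-shelf OpenPose ControlNet in diffusers; the skeleton image and prompt below are placeholders:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# An OpenPose ControlNet steers the image model to follow a given pose
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose = load_image("pose_skeleton.png")  # placeholder: rendered OpenPose skeleton
image = pipe("a ballet dancer mid-leap, studio lighting", image=pose).images[0]
image.save("posed_output.png")
```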
Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method
Oscillation Inversion can restore and upscale images as well as videos. The method even allows for low-level editing tasks like adjusting lighting and changing colors.
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
FlipSketch can generate sketch animations from static drawings by allowing users to describe the desired motion. It uses motion priors from text-to-video diffusion models to create smooth animations while keeping the original sketch’s look.
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
FitDiT can generate realistic virtual try-on images that show how clothes fit on different body types. It keeps garment textures clear and works quickly, taking only 4.57 seconds for a single image.
Video
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
SAMURAI combines SAM 2's state-of-the-art visual video tracking with motion-aware memory for zero-shot tracking.
StableV2V: Stabilizing Shape Consistency in Video-to-Video Editing
StableV2V can stabilize shape consistency in video-to-video editing by breaking down the editing process into steps that match user prompts. It handles text-based, image-based, and video inpainting.
JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation
JoyVASA can generate high-quality lip-sync videos of human and animal faces from a single image and speech clip.
AnimateAnything: Consistent and Controllable Animation for Video Generation
AnimateAnything can generate smooth and controllable animations from images. It reduces flickering with a stabilization module and uses multi-scale control to create clear frame-by-frame motion.
Also interesting
My favorite project of the week is by @juliewdesign_. She designed and printed a children's book for her son: 13 high-quality illustrations condensed from 3,698 images, all created with one Midjourney SREF code: --sref 1803718622
@prkeshari built his dream world radio app with AI. 7,000+ radio stations, ⌘+K for easy search and actions, save your favorites and recent stations, filter by mood and genre, use a sleep timer, and more. I know what I'm listening to while chilling tonight.
@paultrillo, @makeitrad1 and @hokutokonishi collaborated to create… extremely dope looking clouds. This is just a teaser. Full thing dropping soon.
@thedorbrothers are on a roll lately with their political satire music videos. And they delivered again with this one.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying my Midjourney prompt collection on PROMPTCACHE 🚀
- Buying access to AI Art Weekly Premium 👑
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa