AI Art Weekly #105

Hello there, my fellow dreamers, and welcome to issue #105 of AI Art Weekly! 👋

First snow has fallen in Switzerland and I’m cozying up with a cup of hot tea to bring you the latest AI art news and papers, of which there are plenty this week! 🌨️

One of the biggest is the release of FLUX.1 Tools, which brings inpainting, outpainting, retexturing and image variation to the FLUX.1 ecosystem.

From the paper side, research is still going strong: I’ve skimmed through 155 papers this week. Some come with code, some don’t. Remember to bookmark the ones you’re interested in; with Premium, I’ll notify you once their code is released.


Cover Challenge 🎨

Theme: threshold
30 submissions by 20 artists
AI Art Weekly Cover Art Challenge threshold submission by LuoErik8lrl
🏆 1st: @LuoErik8lrl
AI Art Weekly Cover Art Challenge threshold submission by xdcp07
🥈 2nd: @xdcp07
AI Art Weekly Cover Art Challenge threshold submission by fhrlich_sergej
🥈 2nd: @fhrlich_sergej
AI Art Weekly Cover Art Challenge threshold submission by VirginiaLori
🥉 3rd: @VirginiaLori

News & Papers

Highlights

FLUX.1 Tools

Black Forest Labs released FLUX.1 Tools, a comprehensive suite of AI models designed to enhance their base text-to-image model FLUX.1. This release introduces four major features available in both open-access [dev] and professional [pro] versions through the BFL API:

  • FLUX.1 Fill: Advanced inpainting and outpainting capabilities, outperforming competitors like Ideogram 2.0
  • FLUX.1 Depth: Structural guidance using depth maps for precise image transformations
  • FLUX.1 Canny: Edge-based structural guidance for maintaining image composition
  • FLUX.1 Redux: Image variation and restyling adapter supporting high-quality 4-megapixel outputs

The tools are available through multiple platforms including fal.ai, Replicate, Together.ai, Freepik, and krea.ai. Benchmark tests show FLUX.1 Fill [pro] achieving state-of-the-art performance in inpainting, while FLUX.1 Depth outperforms Midjourney ReTexture in structural conditioning.

“FLUX.1 Fill inpainting examples”
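
If you want to try FLUX.1 Fill from code, here’s a minimal sketch using Replicate’s Python client. The model slug and input field names are my assumptions (check the model’s Replicate page for the exact ones); only the overall call pattern is how Replicate models are normally invoked:

```python
# Hedged sketch: inpainting with FLUX.1 Fill via Replicate's Python client.
# The model slug and input keys below are assumptions; verify them on the
# model page before running.
import replicate

output = replicate.run(
    "black-forest-labs/flux-fill-pro",     # assumed model slug
    input={
        "prompt": "a steaming cup of tea on a wooden table",
        "image": open("scene.png", "rb"),  # source image
        "mask": open("mask.png", "rb"),    # white pixels = region to repaint
    },
)
print(output)  # typically a URL or file handle for the generated image
```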

LTX-Video

Lightricks released a new video generation model called LTX-Video. It can generate 5-second 720p videos at 24 FPS in just 4 seconds on an Nvidia H100!

Yeah, that’s faster than real time 🤯

The inference code is available on GitHub, the model on Hugging Face, and a demo on fal.ai.

A close-up of a man's face in a dimly lit setting, with a blurred figure in the foreground
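
For the curious, here’s a rough sketch of what running LTX-Video locally could look like, assuming the diffusers LTXPipeline integration; the official Lightricks inference repo may use different entry points and defaults:

```python
# Rough sketch: text-to-video with LTX-Video via the (assumed) diffusers
# LTXPipeline integration. Resolution, frame count and step count are
# illustrative, not official defaults.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="a close-up of a man's face in a dimly lit room, cinematic lighting",
    width=704,
    height=480,
    num_frames=121,          # roughly 5 seconds at 24 FPS
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "ltx_video.mp4", fps=24)
```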

Suno V4

Suno released v4 of their music model, bringing major improvements to audio quality and song creation, including:

  • A Remaster tool for upgrading old tracks
  • A new AI lyrics assistant called ReMi
  • Enhanced Cover Art generation
  • Improved song covers and style consistency
  • Cleaner audio and sharper lyrics

I made a prediction a few months ago that 90% of music we listen to will be AI-generated in the next 5 years, and listening to these new Suno v4 tracks, I’m more convinced than ever.

“DNA - ali” cover created with Suno V4

The Matrix: AI-Powered Infinite World Generator

Alibaba Group teased a new AI system called “The Matrix”, which can generate endless interactive virtual worlds in real time, setting new standards for AI world simulation. The system supports:

  • Real-time generation at 16 FPS
  • AAA-game quality visuals (720p)
  • Frame-level control precision
  • Infinite video generation
  • Virtual-to-real world adaptation
  • First and third-person views

The Matrix surpasses other solutions like GameNGen (issue 96) and Oasis (issue 102) in video length and control precision. Unfortunately, we’re talking about Alibaba here, so it’s unlikely to become publicly available anytime soon.

The Matrix example

3D

Find Any Part in 3D

Find3D can segment parts of 3D objects based on text queries.

Find Any Part in 3D example

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

GPS-Gaussian+ can render high-resolution 3D scenes from 2 or more input images in real-time.

GPS-Gaussian+ example

Portrait Diffusion: Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion

Portrait Diffusion can generate detailed 3D portraits from a single image. It improves texture and shape accuracy with a Multi-View Noise Resampling Strategy, which helps reduce blurred textures.

Portrait Diffusion examples

Image

Stylecodes: Encoding Stylistic Information For Image Generation

StyleCodes can encode the style of an image into a 20-symbol base64 code for easy sharing and use in image generation. It lets users create style-reference codes (srefs) from their own images, making it easy to control the style of diffusion model outputs.

Stylecodes examples

From Text to Pose to Image: Improving Diffusion Model Control and Quality

From Text to Pose to Image can generate high-quality images from text prompts by first creating poses and then using them to guide image generation. This method improves control over human poses and enhances image fidelity in diffusion models.

From Text to Pose to Image example
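
This isn’t the paper’s code, but the general two-stage pattern (generate or pick a pose first, then condition image generation on it) can be sketched with an off-the-shelf OpenPose ControlNet in diffusers; here the text-to-pose stage is replaced by a precomputed pose image:

```python
# Illustrative sketch of pose-conditioned generation with a standard OpenPose
# ControlNet in diffusers. This is NOT the paper's released code; it only shows
# the "pose first, then image" pattern with a precomputed pose skeleton.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")

pose = load_image("pose_skeleton.png")  # stand-in for the paper's text-to-pose stage
image = pipe(
    "a dancer mid-leap on a rooftop at sunset",
    image=pose,                # pose conditioning image
    num_inference_steps=30,
).images[0]
image.save("pose_conditioned.png")
```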

Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method

Oscillation Inversion can restore and upscale images as well as videos. The method even allows for low-level editing tasks like adjusting lighting and changing colors.

Oscillation Inversion example

FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

FlipSketch can generate sketch animations from static drawings by allowing users to describe the desired motion. It uses motion priors from text-to-video diffusion models to create smooth animations while keeping the original sketch’s look.

FlipSketch example

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

FitDiT can generate realistic virtual try-on images that show how clothes fit on different body types. It keeps garment textures clear and works quickly, taking only 4.57 seconds for a single image.

FitDiT examples

Video

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

SAMURAI combines SAM 2’s state-of-the-art visual tracking with motion-aware memory for zero-shot object tracking in videos.

SAMURAI example

StableV2V: Stablizing Shape Consistency in Video-to-Video Editing

StableV2V can stabilize shape consistency in video-to-video editing by breaking down the editing process into steps that match user prompts. It handles text-based, image-based, and video inpainting.

StableV2V example

JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation

JoyVASA can generate high-quality lip-sync videos of human and animal faces from a single image and speech clip.

JoyVASA example

AnimateAnything: Consistent and Controllable Animation for Video Generation

AnimateAnything can generate smooth and controllable animations from images. It reduces flickering with a stabilization module and uses multi-scale control to create clear frame-by-frame motion.

AnimateAnything example

Also interesting

“The Blood Moon Council commences” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying access to AI Art Weekly Premium 👑

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa