AI Art Weekly #93

Hello there, my fellow dreamers, and welcome to issue #93 of AI Art Weekly! 👋

I’ve been busy shipping PROMPTCACHE this week. I spend so much time using generative AI tools every week that I figured: why not build a hub where I can easily share them with everyone? I’ve started with my favorite Midjourney SREF codes and prompts, and I’m planning to add more tools and resources in the future. Right now you can get lifetime access for $10, but I’ll raise the price once the library gets bigger. If you have any suggestions or feedback, please let me know! 🙏

In this issue:

  • Highlights: FLUX.1 model suite, Midjourney v6.1, Stable Fast 3D, SAM 2
  • New 3D research: ExAvatar, Cycle3D, Perm, ObjectCarver, XHand, ClickDiff, NIS-SLAM, Bridging the Gap
  • New image research: Matting by Generation, ORG
  • New video research: Tora, FreeLong
  • and more!

Cover Challenge 🎨

Theme: distortions
85 submissions by 52 artists
🏆 1st: @amorvobiscum
🥈 2nd: @ThunderMonique
🥉 3rd: @NomadsVagabonds
🧡 4th: @absurd_o

News & Papers

Highlights

Black Forest Labs launches FLUX.1 model suite

Black Forest Labs, a new generative AI company backed by Andreessen Horowitz, has announced the release of its FLUX.1 text-to-image model suite. Key points include:

  • FLUX.1 uses a hybrid architecture of multimodal and parallel diffusion transformer blocks
  • Models are scaled to 12B parameters
  • Incorporates flow matching, rotary positional embeddings, and parallel attention layers
  • Three variants released: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell]
  • Supports diverse aspect ratios and resolutions between 0.1 and 2.0 megapixels

BFL claims FLUX.1 sets new benchmarks in image synthesis, particularly in visual quality, prompt adherence, and output diversity. The FLUX.1 [dev] and [schnell] variants are open-weight models available for non-commercial and personal use respectively, while FLUX.1 [pro] is accessible for commercial applications via API on Replicate, fal.ai, or BFL’s own API.

detailed cinematic dof render of an old dusty detailed CRT monitor on a wooden desk in a dim room with items around, messy dirty room. On the screen are the letters “AI ART WEEKLY” glowing softly. High detail hard surface render
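
If you want to try the open-weight FLUX.1 [schnell] variant locally, here’s a minimal sketch using Hugging Face diffusers. It assumes a recent diffusers release that ships FluxPipeline and that the weights are published under the repo id black-forest-labs/FLUX.1-schnell, so double-check both before running:

```python
# Minimal sketch: generating an image with the open-weight FLUX.1 [schnell] model.
# Assumes a diffusers version that includes FluxPipeline and the Hugging Face
# repo id "black-forest-labs/FLUX.1-schnell" (both assumptions, verify first).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offloads the 12B model to fit smaller GPUs

image = pipe(
    "an old dusty CRT monitor on a wooden desk, the letters 'AI ART WEEKLY' glowing on screen",
    num_inference_steps=4,  # [schnell] is distilled for few-step sampling
    guidance_scale=0.0,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_schnell_sample.png")
```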

Midjourney v6.1 released

Midjourney has released version 6.1 of their image generation model. Improvements include:

  • Enhanced image coherence, particularly for anatomical features and organic subjects
  • Improved image quality with reduced artifacts and enhanced textures
  • Increased precision for small image details
  • New upscalers with improved texture quality (they’re really good 👌)
  • Approximately 25% faster processing for standard image jobs
  • Enhanced text rendering accuracy when using “quotations” in prompts
  • Updated personalization model with improved nuance and accuracy
  • Introduction of personalization code versioning
  • New --q 2 mode offering increased texture at the cost of coherence

asian model, white tunic, santorini, harsh sunlight, ultra wide angle, photorealistic, dark art style, cinematic, hyper realistic

Stability AI releases Stable Fast 3D

Stability AI has introduced Stable Fast 3D, a new model for 3D asset generation. Features include:

  • Generates 3D assets from a single input image in 0.5 seconds
  • Produces UV unwrapped mesh, material parameters, and albedo colors
  • Optional quad or triangle remeshing (adds 100-200ms to processing time)
  • Runs on GPUs with 7GB VRAM or via Stability AI API

The model dramatically reduces inference time compared to Stability’s previous SV3D model, from 10 minutes down to 0.5 seconds, while maintaining output quality. It’s designed for rapid prototyping in gaming, virtual reality, retail, architecture, and design. Weights and code are available on Hugging Face and GitHub, and there’s also a Hugging Face demo.

Stable Fast 3D animation
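
Here’s a minimal sketch of what calling Stable Fast 3D through the Stability AI API could look like. The endpoint path, the texture_resolution parameter, and the binary GLB response are assumptions based on Stability’s v2beta REST conventions, so check the official API reference before relying on it:

```python
# Minimal sketch: image-to-3D with Stable Fast 3D via the Stability AI REST API.
# Endpoint path, parameters, and response format are assumptions - verify them
# against the official API docs.
import requests

API_KEY = "sk-..."  # your Stability AI API key
ENDPOINT = "https://api.stability.ai/v2beta/3d/stable-fast-3d"  # assumed path

with open("chair.png", "rb") as image_file:
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": image_file},
        data={"texture_resolution": 1024},  # hypothetical optional parameter
    )

response.raise_for_status()
# Assumed: the API returns the generated asset as a binary GLB mesh.
with open("chair.glb", "wb") as out:
    out.write(response.content)
```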

Meta introduces SAM 2: advanced video and image segmentation model

Meta released SAM 2 (Segment Anything Model 2), a new model for segmenting objects in both videos and images. Capabilities include:

  • Unified segmentation for videos and images
  • Interactive object selection and tracking across video frames
  • Real-time processing and streaming inference
  • Robust zero-shot performance on unfamiliar content

SAM 2 outperforms existing models in object segmentation tasks, especially for tracking object parts. It requires less interaction time compared to other interactive video segmentation methods. The model’s architecture includes a per-session memory module, allowing it to track objects even when they temporarily disappear from view.

The weights and code have been open-sourced and they also released an interactive web demo to try out the model.
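
For a sense of the workflow, here’s a minimal sketch of click-prompted image segmentation with the open-sourced code. The sam2 package layout, class names, and checkpoint/config filenames follow Meta’s segment-anything-2 repo at release time and may have changed, so treat them as assumptions:

```python
# Minimal sketch: single-click image segmentation with SAM 2.
# Package, class, and checkpoint/config names are taken from Meta's
# segment-anything-2 repo at release time (assumptions - verify locally).
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # downloaded via the repo script
model_cfg = "sam2_hiera_l.yaml"

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)

# One positive click (label 1) on the object of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[480, 320]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with confidence scores
)
best_mask = masks[np.argmax(scores)]  # pick the highest-scoring mask
```

The video predictor in the same repo follows a similar prompting flow, using the per-session memory module mentioned above to propagate masks across frames.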

New 3D research

Expressive Whole-Body 3D Gaussian Avatar

ExAvatar can animate expressive whole-body 3D human avatars from a short monocular video. It captures facial expressions, hand motions, and body poses in the process.

ExAvatar example

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Cycle3D can generate high-quality and consistent 3D content from a single unposed image. This approach enhances texture consistency and multi-view coherence, significantly improving the quality of the final 3D reconstruction.

Cycle3D example

Perm: A Parametric Representation for Multi-Style 3D Hair Modeling

Perm can generate and manipulate 3D hairstyles. It enables applications such as 3D hair parameterization, hairstyle interpolation, single-view hair reconstruction, and hair-conditioned image generation.

Perm examples

ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

ObjectCarver can segment, reconstruct, and separate 3D objects from a single view using just user-input clicks, eliminating the need for segmentation masks.

ObjectCarver example

XHand: Real-time Expressive Hand Avatar

XHand can generate high-fidelity hand shapes and textures in real-time, enabling expressive hand avatars for virtual environments.

XHand example

ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models

ClickDiff can generate controllable grasps for 3D objects. It employs a Dual Generation Framework to produce realistic grasps based on user-specified or algorithmically predicted contact maps.

ClickDiff examples

NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding

NIS-SLAM can reconstruct high-fidelity surfaces and geometry from RGB-D frames. It also learns 3D consistent semantic representations during this process.

NIS-SLAM example

Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture

Bridging the Gap can generate studio-like illuminated texture maps from short monocular phone captures. The method enables the creation of photorealistic, uniformly lit avatars while enhancing facial details and addressing common issues found in traditional phone scans, such as missing regions and baked-in lighting.

Bridging the Gap examples

New image research

Matting by Generation

Matting by Generation can produce high-resolution and photorealistic alpha mattes using diffusion models. This method effectively redefines image matting as a generative modeling challenge, demonstrating superior performance across multiple benchmark datasets.

Matting by Generation example

Floating No More: Object-Ground Reconstruction from a Single Image

ORG can reconstruct 3D object geometry from a single image while accurately modeling the relationship between the object, ground, and camera. This method significantly improves shadow rendering and object pose manipulation, addressing common issues like floating or tilted objects in 3D-aware image editing applications.

ORG examples

New video research

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Tora can generate high-quality videos with precise control over motion trajectories by integrating textual, visual, and trajectory conditions. It achieves high motion fidelity and allows for diverse video durations, aspect ratios, and resolutions, making it a versatile tool for video generation.

Tora example

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

FreeLong can generate 128-frame videos from short video diffusion models trained on 16-frame videos, without requiring additional training. It’s not SOTA, but has just the right amount of cursedness 👌

FreeLong examples

“You better run!” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
