AI Art Weekly #93
Hello there, my fellow dreamers, and welcome to issue #93 of AI Art Weekly! 👋
I’ve been busy shipping PROMPTCACHE this week. I spend so much time using generative AI tools every week that I figured: why not build a hub where I can easily share them with everyone? I’ve started adding my favorite Midjourney SREF codes and prompts, and I’m planning to add more tools and resources in the future. Right now you can get lifetime access for $10, but I’ll raise prices once the library gets bigger. If you have any suggestions or feedback, please let me know! 🙏
In this issue:
- Highlights: FLUX.1 model suite, Midjourney v6.1, Stable Fast 3D, SAM 2
- New 3D research: ExAvatar, Cycle3D, Perm, ObjectCarver, XHand, ClickDiff, NIS-SLAM, Bridging the Gap
- New image research: Matting by Generation, ORG
- New video research: Tora, FreeLong
- and more!
Unlock the full potential of AI-generated art with my curated collection of Midjourney SREF codes and prompts.
Cover Challenge 🎨
For the next cover I’m looking for submissions with hidden meanings! Reward is again fame & glory and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
Highlights
Black Forest Labs launches FLUX.1 model suite
Black Forest Labs, a new company in the generative AI sector funded by Andreessen Horowitz, announced the release of their FLUX.1 text-to-image model suite. Key points include:
- FLUX.1 uses a hybrid architecture of multimodal and parallel diffusion transformer blocks
- Models are scaled to 12B parameters
- Incorporates flow matching, rotary positional embeddings, and parallel attention layers
- Three variants released: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell]
- Supports diverse aspect ratios and resolutions between 0.1 and 2.0 megapixels
BFL claims FLUX.1 sets new benchmarks in image synthesis, particularly in visual quality, prompt adherence, and output diversity. The FLUX.1 [dev] and [schnell] variants are open-weight models available for non-commercial and personal use respectively, while FLUX.1 [pro] is accessible for commercial applications via Replicate, fal.ai, or Black Forest Labs’ own API.
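If you want to try the open-weight [schnell] variant locally, here’s a minimal sketch using the FluxPipeline from Hugging Face’s diffusers library. The model ID and sampler settings follow the model card as I understand it, not an official BFL snippet, so double-check before relying on them:

```python
# Minimal sketch: running FLUX.1 [schnell] locally via Hugging Face diffusers.
# Assumes the diffusers FluxPipeline and the "black-forest-labs/FLUX.1-schnell"
# model ID; check the model card for the current recommended settings.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

image = pipe(
    "a cat holding a sign that says hello world",
    guidance_scale=0.0,       # schnell is guidance-distilled, so CFG is off
    num_inference_steps=4,    # the timestep-distilled variant needs very few steps
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-schnell.png")
```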
Midjourney v6.1 released
Midjourney has released version 6.1 of their image generation model. Improvements include:
- Enhanced image coherence, particularly for anatomical features and organic subjects
- Improved image quality with reduced artifacts and enhanced textures
- Increased precision for small image details
- New upscalers with improved texture quality (they’re really good 👌)
- Approximately 25% faster processing for standard image jobs
- Enhanced text rendering accuracy when using “quotations” in prompts
- Updated personalization model with improved nuance and accuracy
- Introduction of personalization code versioning
- New --q 2 mode offering increased texture at the cost of coherence
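To illustrate (this is my own made-up prompt, not one from Midjourney’s release notes), the new quality flag and the quoted-text rendering slot into a prompt like any other parameter:

```
a weathered lighthouse keeper reading a letter, sign reads "one more storm" --v 6.1 --q 2
```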
Stability AI releases Stable Fast 3D
Stability AI has introduced Stable Fast 3D, a new model for 3D asset generation. Features include:
- Generates 3D assets from a single input image in 0.5 seconds
- Produces UV unwrapped mesh, material parameters, and albedo colors
- Optional quad or triangle remeshing (adds 100-200ms to processing time)
- Runs on GPUs with 7GB VRAM or via Stability AI API
The model outperforms previous versions, reducing inference time from 10 minutes (SV3D) to 0.5 seconds while maintaining output quality. It’s designed for rapid prototyping in gaming, virtual reality, retail, architecture, and design. Weights and code can be found on Hugging Face and GitHub, and there is also a Hugging Face demo.
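If you’d rather hit the hosted endpoint than run the weights yourself, a request looks roughly like the sketch below. The endpoint path and form fields are my assumptions based on Stability’s v2beta API conventions, so verify them against the official docs:

```python
# Rough sketch of calling Stable Fast 3D through the Stability AI API.
# The endpoint path and form fields are assumptions based on the v2beta
# API conventions -- verify against the official documentation.
import requests

resp = requests.post(
    "https://api.stability.ai/v2beta/3d/stable-fast-3d",
    headers={"authorization": "Bearer YOUR_API_KEY"},
    files={"image": open("chair.png", "rb")},  # single input image
)
resp.raise_for_status()

# The response body is the generated asset as a binary glTF (.glb) mesh.
with open("chair.glb", "wb") as f:
    f.write(resp.content)
```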
Meta Introduces SAM 2: Advanced Video and Image Segmentation Model
Meta released SAM 2 (Segment Anything Model 2), a new model for segmenting objects in both videos and images. Capabilities include:
- Unified segmentation for videos and images
- Interactive object selection and tracking across video frames
- Real-time processing and streaming inference
- Robust zero-shot performance on unfamiliar content
SAM 2 outperforms existing models in object segmentation tasks, especially for tracking object parts. It requires less interaction time compared to other interactive video segmentation methods. The model’s architecture includes a per-session memory module, allowing it to track objects even when they temporarily disappear from view.
The weights and code have been open-sourced and they also released an interactive web demo to try out the model.
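For the curious, here’s a minimal sketch of click-prompted image segmentation using the predictor classes from the open-sourced repo. The config and checkpoint filenames are the large-model defaults from the release as I recall them; swap in whichever checkpoint you downloaded:

```python
# Sketch: click-prompted image segmentation with the open-sourced SAM 2 repo.
# Config/checkpoint names below are assumed to match the released large model;
# adjust paths to whatever you downloaded.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

model = build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)

# One positive click (x, y) on the object you want segmented.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = foreground, 0 = background
    multimask_output=True,       # return several candidate masks with scores
)
best_mask = masks[scores.argmax()]
```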
New 3D research
Expressive Whole-Body 3D Gaussian Avatar
ExAvatar can animate expressive whole-body 3D human avatars from a short monocular video. It captures facial expressions, hand motions, and body poses in the process.
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Cycle3D can generate high-quality and consistent 3D content from a single unposed image. This approach enhances texture consistency and multi-view coherence, significantly improving the quality of the final 3D reconstruction.
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Perm can generate and manipulate 3D hairstyles. It enables applications such as 3D hair parameterization, hairstyle interpolation, single-view hair reconstruction, and hair-conditioned image generation.
ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects
ObjectCarver can segment, reconstruct, and separate 3D objects from a single view using just user-input clicks, eliminating the need for segmentation masks.
XHand: Real-time Expressive Hand Avatar
XHand can generate high-fidelity hand shapes and textures in real-time, enabling expressive hand avatars for virtual environments.
ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models
ClickDiff can generate controllable grasps for 3D objects. It employs a Dual Generation Framework to produce realistic grasps based on user-specified or algorithmically predicted contact maps.
NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding
NIS-SLAM can reconstruct high-fidelity surfaces and geometry from RGB-D frames. It also learns 3D consistent semantic representations during this process.
Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
Bridging the Gap can generate studio-like illuminated texture maps from short monocular phone captures. The method enables the creation of photorealistic, uniformly lit avatars while enhancing facial details and addressing common issues found in traditional phone scans, such as missing regions and baked-in lighting.
New image research
Matting by Generation
Matting by Generation can produce high-resolution and photorealistic alpha mattes using diffusion models. This method effectively redefines image matting as a generative modeling challenge, demonstrating superior performance across multiple benchmark datasets.
Floating No More: Object-Ground Reconstruction from a Single Image
ORG can reconstruct 3D object geometry from a single image while accurately modeling the relationship between the object, ground, and camera. This method significantly improves shadow rendering and object pose manipulation, addressing common issues like floating or tilted objects in 3D-aware image editing applications.
New video research
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Tora can generate high-quality videos with precise control over motion trajectories by integrating textual, visual, and trajectory conditions. It achieves high motion fidelity and allows for diverse video durations, aspect ratios, and resolutions, making it a versatile tool for video generation.
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
FreeLong can generate 128-frame videos from short video diffusion models trained on 16-frame videos, without requiring additional training. It’s not SOTA, but has just the right amount of cursedness 👌
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa