AI Art Weekly #88
Hello there, my fellow dreamers, and welcome to issue #88 of AI Art Weekly!
Another week, another 180+ papers skimmed through and curated for you.
I'm currently thinking about how I can add more value to these weekly issues and would love to hear your thoughts! So far I've been focusing on major product updates and computer vision research that could be used or misused for creative output.
If you have any feedback, simply reply to this email and let me know what you think.
In this issue:
- 3D: GaussianDreamerPro, HOIFH, ClotheDreamer, Portrait3D, MIRReS, YouDream, LiveScene, GIC, BRDF-Uncertainty
- Image: AnyControl, ResMaster
- Video: MultiDiff, Text-Animator, MotionBooth, Director3D, MoMo, FreeTraj, MVOC, Conditional Image Leakage, Image Conductor
- Audio: GenAu
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For the next cover I'm looking for boredom submissions! Reward is again $50 and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
3D
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality
GaussianDreamerPro can generate 3D Gaussian assets from text that can be seamlessly integrated into downstream manipulation pipelines, such as animation, composition, and simulation.
Human-Object Interaction from Human-Level Instructions
HOIFH generates synchronized object motion, full-body human motion, and detailed finger motion. It is designed for manipulating large objects within contextual environments, guided by human-level instructions.
ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians
ClotheDreamer can generate high-fidelity 3D garments from text prompts. The resulting assets can be used for virtual try-on and support physically accurate animation.
Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image
Portrait3D can generate high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.
MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling
MIRReS can reconstruct and optimize the explicit geometry, material, and lighting of objects from multi-view images. The resulting 3D models can be edited and relit in modern graphics engines or CAD software.
YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
YouDream can generate high-quality 3D animals from a single image and a text prompt. The method is able to preserve anatomic consistency and is capable of generating and combining commonly found animals.
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
LiveScene can identify and control multiple objects in complex scenes. It is able to locate individual objects in different states and enables control of them using natural language.
Gaussian-Informed Continuum for Physical Property Identification and Simulation
GIC can recover 3D objects from Gaussian point sets and simulate their physical properties.
Fast and Uncertainty-Aware SVBRDF Recovery from Multi-View Capture using Frequency Domain Analysis
BRDF-Uncertainty can estimate the properties of the materials on an object's surface in seconds given its geometry and a lighting environment.
Image
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
AnyControl is a new text-to-image guidance method that can generate images from diverse control signals, such as color, shape, texture, and layout.
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
ResMaster is a training-free method that enables diffusion models to generate high-quality 4K images with improved structural coherence and more details.
Video
MultiDiff: Consistent Novel View Synthesis from a Single Image
Given a single RGB image and a camera trajectory of choice, MultiDiff can generate new 3D-consistent views along that trajectory.
Text-Animator: Controllable Visual Text Video Generation
Text-Animator can depict the structures of visual text in generated videos. It supports camera control and text refinement to improve the stability of the generated visual text.
MotionBooth: Motion-Aware Customized Text-to-Video Generation
MotionBooth can generate videos of customized subjects from a few images and a text prompt with precise control over both object and camera movements.
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
Director3D can generate real-world 3D scenes and adaptive camera trajectories from text prompts. The method is able to generate pixel-aligned 3D Gaussians as an immediate 3D scene representation for consistent denoising.
Disentangled Motion Modeling for Video Frame Interpolation
MoMo is a new video frame interpolation method that is able to generate intermediate frames with high visual quality and reduced computational demands.
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
FreeTraj is a tuning-free approach that enables trajectory control in video diffusion models by modifying noise sampling and attention mechanisms.
MVOC: a training-free multiple video object composition method with diffusion models
MVOC is a training-free method that uses diffusion models to composite multiple video objects into a single video while maintaining motion and identity consistency.
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
Conditional Image Leakage identifies and addresses a leakage issue in image-to-video diffusion models, which can be used to generate videos with more dynamic and natural motion from image prompts.
Image Conductor: Precision Control for Interactive Video Synthesis
Image Conductor can generate video assets from a single image with precise control over camera transitions and object movements.
Audio
Taming Data and Transformers for Audio Generation
GenAu is a new scalable transformer-based audio generation architecture by Snapchat that is able to generate high-quality ambient sounds and effects.
Also interesting
AI Surrealism is back with Open Editions from 20 AI artists on Zora, including my piece "Amber Man". There is a small chance we could get exhibited on Times Square on July 24. So if you have a moment, one vote would go a long way.
Toys "R" Us has released the first OpenAI Sora generated brand commercial.
@fabianstelzer built a fully automated Wojak meme generator. Claude 3.5 block generates the meme as JSON. ComfyUI block uses a Wojak Lora to generate a fitting image. JSON extractor + Canvas Block ties it all together.
@CoffeeVectors made this dialog demo with Claude 3.5 in Python using animations from Hedra Labs, voice and sfx from ElevenLabs, and music from Udio.
@emmacatnip has been evoking the nostalgia of a summer we never knew. Created with AnimateDiff and music conjured from Suno.
Gen-3 Alpha isn't here yet, but @em_golden teased us a bit more with this approaching tornado. Definitely appropriate for the current weather in Switzerland.
@NathanBoey shared a short video showcasing Luma AI's new frame-to-frame video interpolation feature.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
- dreamingtulpa