AI Art Weekly #88

Hello there, my fellow dreamers, and welcome to issue #88 of AI Art Weekly! 👋

Another week, another 180+ papers skimmed through and curated for you.

I’m currently thinking about how I can add more value to these weekly issues and would love to hear your thoughts! So far I’ve been focusing on major product updates and computer vision research that could be used or misused for creative output.

If you have any feedback, simply reply to this email and let me know what you think 🙏

In this issue:

  • 3D: GaussianDreamerPro, HOIFH, ClotheDreamer, Portrait3D, MIRReS, YouDream, LiveScene, GIC, BRDF-Uncertainty
  • Image: AnyControl, ResMaster
  • Video: MultiDiff, Text-Animator, MotionBooth, Director3D, MoMo, FreeTraj, MVOC, Conditional Image Leakage, Image Conductor
  • Audio: GenAu
  • and more!

Cover Challenge 🎨

Theme: seven deadly sins
38 submissions by 26 artists
AI Art Weekly Cover Art Challenge seven deadly sins submission by daidatep
🏆 1st: @daidatep
AI Art Weekly Cover Art Challenge seven deadly sins submission by AleRVG
🥈 2nd: @AleRVG
AI Art Weekly Cover Art Challenge seven deadly sins submission by TymothyLongoria
🥉 3rd: @TymothyLongoria
AI Art Weekly Cover Art Challenge seven deadly sins submission by webstark
🧡 4th: @webstark

News & Papers

3D

GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

GaussianDreamerPro can generate 3D Gaussian assets from text that can be seamlessly integrated into downstream manipulation pipelines, such as animation, composition, and simulation.

GaussianDreamerPro examples

Human-Object Interaction from Human-Level Instructions

HOIFH generates synchronized object motion, full-body human motion, and detailed finger motion. It is designed for manipulating large objects within contextual environments, guided by human-level instructions.

HOIFH setting up a workspace

ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

ClotheDreamer can generate high-fidelity 3D garments from text prompts. The resulting assets can be used for virtual try-on and support physically accurate animation.

ClotheDreamer examples

Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

Portrait3D can generate high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.

Portrait3D examples

MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

MIRReS can reconstruct and optimize the explicit geometry, material, and lighting of objects from multi-view images. The resulting 3D models can be edited and relit in modern graphics engines or CAD software.

MIRReS examples

YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals

YouDream can generate high-quality 3D animals from a single image and a text prompt. The method is able to preserve anatomic consistency and is capable of generating and combining commonly found animals.

YouDream examples

LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control

LiveScene can identify and control multiple objects in complex scenes. It is able to locate individual objects in different states and enables control of them using natural language.

LiveScene example

Gaussian-Informed Continuum for Physical Property Identification and Simulation

GIC can recover 3D objects from Gaussian point sets and simulate their physical properties.

GIC examples

Fast and Uncertainty-Aware SVBRDF Recovery from Multi-View Capture using Frequency Domain Analysis

BRDF-Uncertainty can estimate the properties of the materials on an object’s surface in seconds given its geometry and a lighting environment.

BRDF-Uncertainty example

Image

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

AnyControl is a new text-to-image guidance method that can generate images from diverse control signals, such as color, shape, texture, and layout.

AnyControl examples

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

ResMaster is a training-free method that enables diffusion models to generate high-quality 4K images with improved structural coherence and more details.

a girl astronaut exploring the cosmos, floating among planets and stars, high quality detail, anime screencap, studio ghibli style, illustration, high contrast, masterpiece, best quality.

Video

MultiDiff: Consistent Novel View Synthesis from a Single Image

Given a single RGB image and a camera trajectory of choice, MultiDiff can generate new 3D-consistent views of the scene along that trajectory.

MultiDiff example

Text-Animator: Controllable Visual Text Video Generation

Text-Animator can depict the structures of visual text in generated videos. It supports camera control and text refinement to improve the stability of the generated visual text.

Text-Animator example

MotionBooth: Motion-Aware Customized Text-to-Video Generation

MotionBooth can generate videos of customized subjects from a few images and a text prompt with precise control over both object and camera movements.

MotionBooth examples

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

Director3D can generate real-world 3D scenes and adaptive camera trajectories from text prompts. The method is able to generate pixel-aligned 3D Gaussians as an immediate 3D scene representation for consistent denoising.

Director3D example

Disentangled Motion Modeling for Video Frame Interpolation

MoMo is a new video frame interpolation method that is able to generate intermediate frames with high visual quality and reduced computational demands.

Top row: input frames, bottom row: interpolated frames

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

FreeTraj is a tuning-free approach that enables trajectory control in video diffusion models by modifying noise sampling and attention mechanisms.

FreeTraj example

MVOC: a training-free multiple video object composition method with diffusion models

MVOC is a training-free method that uses diffusion models to composite multiple video objects into a single video while maintaining motion and identity consistency.

MVOC comparison with other methods

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

This method identifies and addresses conditional image leakage in image-to-video diffusion models, which makes it possible to generate videos with more dynamic and natural motion from image prompts.

Conditional Image Leakage example

Image Conductor: Precision Control for Interactive Video Synthesis

Image Conductor can generate video assets from a single image with precise control over camera transitions and object movements.

Image Conductor example

Audio

Taming Data and Transformers for Audio Generation

GenAu is a new scalable transformer-based audio generation architecture by Snapchat that is able to generate high-quality ambient sounds and effects.

The GenAu model

Also interesting

“Blink twice” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
