AI Art Weekly #92

Hello there, my fellow dreamers, and welcome to issue #92 of AI Art Weekly! 👋

I’ve been getting my hands on GPT-4o mini this week and I’m positively excited about it. The strength of Claude 3.5 Sonnet combined with GPT-4o mini’s cheap pricing and 128k context window enables a lot of new possibilities. I’m currently working on some new ideas because of it and will share once there is something to show. OpenAI also announced SearchGPT yesterday, basically a new GPT-powered search engine that connects to real-time data sources. But it’s OpenAI, so, you know, this could release in 2025 or 2030, who knows 🤷‍♂️

In this issue:

  • 3D: SV4D, Temporal Residual Jacobians, DreamDissector, DreamCar, HoloDreamer, SGIA, 3D Gaussian Parametric Head Model, SparseCraft, TRG
  • Image: ViPer, Artist, PartGLEE, OutfitAnyone, Text2Place, Stable-Hair
  • Video: Cinemo, HumanVid, MovieDreamer
  • Audio: Stable Audio Open, MusiConGen
  • and more!

Cover Challenge 🎨

Theme: golden hour
44 submissions by 27 artists
AI Art Weekly Cover Art Challenge golden hour submission by onchainsherpa
🏆 1st: @onchainsherpa
AI Art Weekly Cover Art Challenge golden hour submission by moon__theater
🥈 2nd: @moon__theater
AI Art Weekly Cover Art Challenge golden hour submission by EdmundBoissier
🥉 3rd: @EdmundBoissier
AI Art Weekly Cover Art Challenge golden hour submission by weird_momma_x
🧡 4th: @weird_momma_x

News & Papers

3D

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Stability released SV4D, a new model that can generate novel-view videos and 4D objects from a single reference video.

SV4D examples

Temporal Residual Jacobians For Rig-free Motion Transfer

Temporal Residual Jacobians can transfer motion from one 3D mesh to another without requiring rigging or intermediate shape keyframes. This method utilizes two coupled neural networks to predict local geometric and temporal changes, enabling realistic motion transfer across diverse and unseen body shapes.

Temporal Residual Jacobians examples

DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

DreamDissector can generate multiple independent textured meshes with plausible interactions from a multi-object text-to-3D NeRF input. This enables applications like text-guided texturing and geometry editing.

DreamDissector examples

DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

DreamCar can reconstruct 3D car models from just a few images or single-image inputs. It uses Score Distillation Sampling and pose optimization to enhance texture alignment and overall model quality, significantly outperforming existing methods.

DreamCar example

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

HoloDreamer can generate enclosed 3D scenes from text descriptions. It does so by first creating a high-quality equirectangular panorama and then rapidly reconstructing the 3D scene using 3D Gaussian Splatting.

HoloDreamer example

Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

SGIA can efficiently reconstruct relightable dynamic clothed human avatars from a monocular video. The method is able to accurately model PBR (physically based rendering) properties for realistic lighting and pose manipulation.

SGIA example

3D Gaussian Parametric Head Model

3D Gaussian Parametric Head Model can generate high-fidelity 3D human head avatars with precise control over identity and expression. It achieves photo-realistic rendering with real-time efficiency and allows for seamless face portrait interpolation and reconstruction from a single image.

3D Gaussian Parametric Head Model examples

SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

SparseCraft can efficiently reconstruct 3D shapes and view-dependent appearances from as few as three colored images. It achieves state-of-the-art performance in novel view synthesis and reconstruction from sparse views, requiring less than 10 minutes for training without any pretrained priors.

SparseCraft example

6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry

TRG can estimate 6DoF head translations and rotations by leveraging the synergy between facial geometry and head pose.

TRG example

Image

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

ViPer can personalize image generation by capturing individual user preferences through a one-time commenting process on a selection of images. It utilizes these preferences to guide a text-to-image model, resulting in generated images that align closely with users’ visual tastes.

ViPer examples

Artist: Aesthetically Controllable Text-Driven Stylization without Training

Artist stylizes images based on text prompts, preserving the original content while producing results of high aesthetic quality. No fine-tuning, no ControlNets; it just works with your pretrained Stable Diffusion model.

Artist examples

PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

PartGLEE can locate and identify objects and their parts in images. The method uses a unified framework that enables detection, segmentation, and grounding at any granularity.

PartGLEE examples

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

OutfitAnyone can generate ultra-high quality virtual try-on images for any clothing and any person. It effectively handles garment deformation and maintains detail consistency across diverse body shapes and styles, making it suitable for both anime and real-world images.

OutfitAnyone example

Text2Place: Affordance-aware Text Guided Human Placement

Text2Place can place any human or object realistically into diverse backgrounds. This enables scene hallucination (generating scenes compatible with a given human pose), text-based editing of the subject, and placing multiple people into a scene.

Text2Place example

Stable-Hair: Real-World Hair Transfer via Diffusion Model

Stable-Hair can robustly transfer a diverse range of real-world hairstyles onto user-provided faces for virtual hair try-on. It employs a two-stage pipeline that includes a Bald Converter for hair removal and specialized modules for high-fidelity hairstyle transfer.

Stable-Hair examples

Video

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Cinemo can generate consistent and controllable image animations from static images. It achieves enhanced temporal consistency and smoothness through strategies like learning motion residuals and employing noise refinement techniques, allowing for precise user control over motion intensity.

Cinemo example

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

HumanVid can generate videos from a character photo while allowing users to control both human and camera motions. It introduces a large-scale dataset that combines high-quality real-world and synthetic data, achieving state-of-the-art performance in camera-controllable human image animation.

HumanVid example

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

MovieDreamer can generate long-duration videos with complex narratives and high visual fidelity. It effectively preserves character identity across scenes and significantly extends the duration of generated content beyond current capabilities.

MovieDreamer example

Audio

Stable Audio Open

Stability open-sourced Stable Audio Open. The model can generate variable-length stereo audio from text, up to 47 seconds at 44.1 kHz.

Stable Audio Open paper

MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

MusiConGen can generate music tracks with precise control over rhythm and chords. It allows users to define musical features through symbolic chord sequences, BPM, and text prompts.

A laid-back blues shuffle with a relaxed tempo, warm guitar tones, and a comfortable groove, perfect for a slow dance or a night in. Instruments: electric guitar, bass, drums.

Also interesting

“Ambrosius I” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa