AI Art Weekly #92
Hello there, my fellow dreamers, and welcome to issue #92 of AI Art Weekly! 👋
I’ve been getting my hands on GPT-4o mini this week and I’m positively excited about it. The strength of Claude 3.5 Sonnet combined with GPT-4o mini’s cheap pricing and 128k context window enables a lot of new possibilities. I’m currently working on some new ideas because of it and will share once there’s something to show. OpenAI also announced SearchGPT yesterday, basically a new GPT-powered search engine that connects to real-time data sources. But it’s OpenAI, so, you know, this could release in 2025 or 2030, who knows 🤷‍♂️
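If you want to poke at GPT-4o mini yourself, here’s a minimal sketch using the official OpenAI Python SDK. The `long_document` variable is just a placeholder for whatever you want to stuff into that 128k context window, and the prompt is only an example:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

long_document = "..."  # placeholder: paste up to ~128k tokens of context here

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Summarize the document in three bullet points."},
        {"role": "user", "content": long_document},
    ],
)

print(response.choices[0].message.content)
```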
In this issue:
- 3D: SV4D, Temporal Residual Jacobians, DreamDissector, DreamCar, HoloDreamer, SGIA, 3D Gaussian Parametric Head Model, SparseCraft, TRG
- Image: ViPer, Artist, PartGLEE, OutfitAnyone, Text2Place, Stable-Hair
- Video: Cinemo, HumanVid, MovieDreamer
- Audio: Stable Audio Open, MusiConGen
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For the next cover I’m looking for distortion submissions! The reward is again fame & glory and a rare role in our Discord community, which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Stability released SV4D, a new model that can generate novel-view videos and 4D objects from a single reference video.
Temporal Residual Jacobians For Rig-free Motion Transfer
Temporal Residual Jacobians can transfer motion from one 3D mesh to another without requiring rigging or intermediate shape keyframes. This method utilizes two coupled neural networks to predict local geometric and temporal changes, enabling realistic motion transfer across diverse and unseen body shapes.
DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
DreamDissector can generate multiple independent textured meshes with plausible interactions from a multi-object text-to-3D NeRF input. This enables applications like text-guided texturing and geometry editing.
DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction
DreamCar can reconstruct 3D car models from just a few images or single-image inputs. It uses Score Distillation Sampling and pose optimization to enhance texture alignment and overall model quality, significantly outperforming existing methods.
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
HoloDreamer can generate enclosed 3D scenes from text descriptions. It does so by first creating a high-quality equirectangular panorama and then rapidly reconstructing the 3D scene using 3D Gaussian Splatting.
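As a rough mental model of that two-stage pipeline (not the authors’ actual code; both function names below are made up purely for illustration):

```python
# Hypothetical sketch of the two-stage HoloDreamer pipeline described above.
# Neither function is a real API; they only name the two stages.

def holodreamer(prompt: str):
    # Stage 1: text -> 360° equirectangular panorama, a single image that
    # fully encloses the scene with a consistent style.
    panorama = generate_equirectangular_panorama(prompt)        # hypothetical

    # Stage 2: panorama -> enclosed 3D scene, reconstructed quickly by
    # optimizing a 3D Gaussian Splatting representation against it.
    scene_gaussians = reconstruct_with_gaussian_splatting(panorama)  # hypothetical

    return scene_gaussians  # renderable from novel viewpoints inside the scene
```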
Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video
SGIA can efficiently reconstruct relightable, dynamic, clothed human avatars from a monocular video. The method accurately models PBR (physically based rendering) properties, enabling realistic lighting and pose manipulation.
3D Gaussian Parametric Head Model
3D Gaussian Parametric Head Model can generate high-fidelity 3D human head avatars with precise control over identity and expression. It achieves photo-realistic rendering with real-time efficiency and allows for seamless face portrait interpolation and reconstruction from a single image.
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
SparseCraft can efficiently reconstruct 3D shapes and view-dependent appearances from as few as three colored images. It achieves state-of-the-art performance in novel view synthesis and reconstruction from sparse views, requiring less than 10 minutes for training without any pretrained priors.
6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
TRG can estimate 6DoF head translations and rotations by leveraging the synergy between facial geometry and head pose.
Image
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
ViPer can personalize image generation by capturing individual user preferences through a one-time commenting process on a selection of images. It utilizes these preferences to guide a text-to-image model, resulting in generated images that align closely with users’ visual tastes.
Artist: Aesthetically Controllable Text-Driven Stylization without Training
Artist stylizes images based on text prompts, preserving the original content while producing results with high aesthetic quality. No fine-tuning, no ControlNets; it just works with your pretrained Stable Diffusion model.
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
PartGLEE can locate and identify objects and their parts in images. The method uses a unified framework that enables detection, segmentation, and grounding at any granularity.
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
OutfitAnyone can generate ultra-high quality virtual try-on images for any clothing and any person. It effectively handles garment deformation and maintains detail consistency across diverse body shapes and styles, making it suitable for both anime and real-world images.
Text2Place: Affordance-aware Text Guided Human Placement
Text2Place can place any human or object realistically into diverse backgrounds. This enables scene hallucination (generating scenes compatible with a given human pose), text-based editing of the human, and placing multiple people into a scene.
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Stable-Hair can robustly transfer a diverse range of real-world hairstyles onto user-provided faces for virtual hair try-on. It employs a two-stage pipeline that includes a Bald Converter for hair removal and specialized modules for high-fidelity hairstyle transfer.
Video
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Cinemo can generate consistent and controllable image animations from static images. It achieves enhanced temporal consistency and smoothness through strategies like learning motion residuals and employing noise refinement techniques, allowing for precise user control over motion intensity.
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
HumanVid can generate videos from a character photo while allowing users to control both human and camera motions. It introduces a large-scale dataset that combines high-quality real-world and synthetic data, achieving state-of-the-art performance in camera-controllable human image animation.
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
MovieDreamer can generate long-duration videos with complex narratives and high visual fidelity. It effectively preserves character identity across scenes and significantly extends the duration of generated content beyond current capabilities.
Audio
Stable Audio Open
Stability open-sourced Stable Audio Open. The model can generate variable-length stereo audio of up to 47 seconds at 44.1 kHz from text prompts.
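If you want to run it locally, the weights ship with Stability’s stable-audio-tools package. The snippet below is a sketch adapted from memory of the model card example, so treat the exact parameters as assumptions and check the repo for the current API:

```python
# Rough sketch of generating a clip with Stable Audio Open via stable-audio-tools;
# parameter values are illustrative, see the official model card for the canonical snippet.
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)

# Text prompt plus timing conditioning (how many seconds of audio to generate)
conditioning = [{"prompt": "ambient synth pad with soft rain", "seconds_start": 0, "seconds_total": 30}]

output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=model_config["sample_size"],
    device=device,
)

# Collapse the batch dimension, peak-normalize, and save as a 16-bit stereo WAV
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32).div(output.abs().max()).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, model_config["sample_rate"])
```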
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
MusiConGen can generate music tracks with precise control over rhythm and chords. It allows users to define musical features through symbolic chord sequences, BPM, and text prompts.
Also interesting
@OnwardsProject shared this neat little workflow idea that lets you animate still product images by feeding visually altered start and end frames into Luma’s Dream Machine.
@CoffeeVectors shared this incredibly creative workflow to lip sync existing video footage with music. In short: split the vocals from the music, feed the vocals into Hedra Labs with a portrait image, then feed the animated singing head as a motion driver to LivePortrait.
Efficient Audio Captioning can generate text captions for audio files.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa