AI Art Weekly #78
Hello there, my fellow dreamers, and welcome to issue #78 of AI Art Weekly! 👋
This week we had AR contact lenses, flirting with AI, and Microsoft building a Stargate. AI isn’t slowing down my friends, and neither am I. This week I’ve gone through another round of 160+ papers for you so you and I can stay ahead of the curve.
In this issue:
- Audio: Stable Audio 2.0, SunoAI V3
- 3D: FlexiDreamer, StructLDM, Design2Cloth, MaGRITTe, CityGaussian, Feature Splatting, Freditor, GeneAvatar, GenN2N, ProbTalk
- Image: CosmicMan, ID2Reflectance, EdgeDepth, HairFastGAN, SPRIGHT-T2I, LCM-Lookahead, InstantStyle, DreamWalk
- Video: CameraCtrl, VIDIM, Motion Inversion, DSTA, EDTalk
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week's cover I’m looking for eclipse submissions! The reward is again $50 and a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Audio
Stable Audio 2.0
Stability AI released Stable Audio 2.0 this week. It can generate high-quality full tracks with coherent musical structure, up to three minutes in length at 44.1kHz stereo, from a single natural language prompt. The new model also introduces audio-to-audio generation, allowing you to transform audio samples using text prompts. Pretty cool stuff.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/2tob9emMhJw.gif)
Check YouTube for a full three-minute example
SunoAI V3
Similar to Stable Audio, Suno V3 lets you create two-minute tracks from a single text prompt, but it also supports vocals. I tried it this week and was blown away. Everybody can create their own theme song now. TÚLPA TÚLPA OOOOH OOOOH 🙌.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/suno.gif)
Check the announcement for audio examples
3D
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
FlexiDreamer is yet another single-image-to-3D generation framework. Generation takes approximately one minute on a single NVIDIA A100 GPU.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/pepe.gif)
It’s pepe time 🐸
StructLDM: Structured Latent Diffusion for 3D Human Generation
StructLDM can generate animatable compositional humans, supporting the blending of different body parts, identity swapping, local clothing editing, 3D virtual try-on, and more. AI girlfriends/boyfriends are definitely gonna be a thing.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/structldm.gif)
Compositional StructLDM example
Design2Cloth: 3D Cloth Generation from 2D Masks
Design2Cloth, on the other hand, is a high-fidelity 3D generative model that can generate diverse and highly detailed clothes simply from a drawn 2D cloth mask. It even supports interpolation.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/design2cloth.jpg)
Design2Cloth examples
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text
MaGRITTe can generate 3D scenes from a combination of an image, a top view (floor plans or terrain maps) and a text prompt. It would be super cool to create one of those low-poly game levels with this.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/magritte2.gif)
MaGRITTe example
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
Speaking of levels, CityGaussian can reconstruct and render large-scale 3D scenes in high quality and in real time using Gaussian splatting.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/citygaussian.gif)
CityGaussian example
Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing
And speaking of splats, Feature Splatting can manipulate both the appearance and the physical properties of objects in a 3D scene using text prompts.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/feature-splatting.gif)
Feature Splatting example
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
NeRFs aren’t dead yet. Freditor is a method that enables high-fidelity and transferable editing of NeRF scenes.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/freditor.gif)
Freditor examples
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
GeneAvatar is a semantic-driven NeRF editing approach that can be used to edit the geometry and texture of 3D avatars using drag-style, text-prompt, and pattern painting methods.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/geneavatar.gif)
GeneAvatar demo
GenN2N: Generative NeRF2NeRF Translation
And because methods always come in pairs, GenN2N is another NeRF editing method. This one can edit scenes using text prompts, colorize, upscale and inpaint them.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/genn2n.gif)
GenN2N examples
ProbTalk: Towards Variable and Coordinated Holistic Co-Speech Motion Generation
ProbTalk is a method for generating lifelike holistic co-speech motions for 3D avatars. The method is able to generate a wide range of motions and ensures a harmonious alignment among facial expressions, hand gestures, and body poses.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/proptalk.gif)
ProbTalk examples, check video for sound.
Image
CosmicMan: A Text-to-Image Foundation Model for Humans
CosmicMan is a new text-to-image foundation model specialized for generating high-fidelity human images.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/cosmicman.jpg)
A full-body shot, an Asian adult female, fit, small road with trees, straight red above-chest hair, normal-length, white and long sleeve cotton shirt, short plaid skirt in pleated shape, cotton backpack, socks, black leather oxford shoes.
ID2Reflectance: Monocular Identity-Conditioned Facial Reflectance Reconstruction
ID2Reflectance can generate high-quality facial reflectance maps from a single image.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/id2reflectance.jpg)
ID2Reflectance example
EdgeDepth: Monocular Depth Estimation with Edge-aware Consistency Fusion
EdgeDepth is a new method for monocular depth estimation that relies solely on edge maps as input, which results in sharper, detail-rich depth maps.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/edgedepth.jpg)
EdgeDepth examples
HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach
Want to see what you’d look like with a new hairstyle? HairFastGAN can transfer hairstyles from a reference image to an input photo for virtual hair try-on.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/hairfastgan.jpg)
Jonny getting Leo’s hair with Taylor’s color
SPRIGHT-T2I: Getting it Right
Following spatial instructions in text-to-image prompts is hard! SPRIGHT-T2I can finally do it though, resulting in more coherent and accurate compositions.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/spright-t2i.jpg)
Above, a massive, dark storm cloud looms, filling the top half of the image with its ominous presence. Below, a small, winding river flows, while to the right a small house stands alone
LCM-Lookahead for Encoder-based Text-to-Image Personalization
LCM-Lookahead is another attempted LoRA killer with an LCM-based approach for identity transfer in text-to-image generations.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/lcm-lookahead.jpg)
LCM-Lookahead examples
InstantStyle
InstantStyle is yet another text-to-image method to preserve the style of reference images without the need for any additional fine-tuning. HOW MANY MORE?
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/instantstyle.jpg)
InstantStyle examples
DreamWalk: Style Space Exploration using Diffusion Guidance
DreamWalk is a new method that can apply different styles to an image generation and interpolate between them. Pretty cool.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/dreamwalk.gif)
DreamWalk example
Video
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Camera control for text-to-video is here! CameraCtrl enables accurate camera pose control, allowing precise control over camera angles and movements when generating videos.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/cameractrl.gif)
CameraCtrl examples
VIDIM: Video Interpolation With Diffusion Models
Good news: VIDIM is a generative model for video interpolation, which creates short videos given a start and end frame. Bad news: It’s from Google :(
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/vidim.gif)
VIDIM example
Motion Inversion for Video Customization
Motion Inversion can be used to customize the motion of videos by matching the motion of a different video.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/motion-inversion.gif)
Motion Inversion example
DSTA: Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
DSTA is a new method for video-based human pose estimation which is able to directly map input to output joint coordinates.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/dsta.gif)
DSTA example
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
And last but not least, EDTalk can generate talking face videos with different mouth shapes, head poses, and expressions from a single image, and can also animate the face directly from audio.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/edtalk.gif)
EDTalk examples. Check project page for sound.
Also interesting
- Sketch-to-Architecture: Generative AI-aided Architectural Design
- SOLE 🐾: Segment Any 3D Object with Language
The first-ever Sora-created music video has been released. Made by @guskamp.
At least that’s what the doomers and EU regulators think. Meanwhile, the AI is just vibing.
![](https://fly.storage.tigris.dev/aiartweekly/assets/issues/78/footer.jpg)
“Talk to the hand 🤚” by me. Free mint on Zora.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa