AI Art Weekly #78

Hello there, my fellow dreamers, and welcome to issue #78 of AI Art Weekly! 👋

This week brought AR contact lenses, flirting with AI, and Microsoft building a Stargate. AI isn’t slowing down, my friends, and neither am I. I’ve gone through another round of 160+ papers so you and I can stay ahead of the curve.

In this issue:

  • Audio: Stable Audio 2.0, SunoAI V3
  • 3D: FlexiDreamer, StructLDM, Design2Cloth, MaGRITTe, CityGaussian, Feature Splatting, Freditor, GeneAvatar, GenN2N, ProbTalk
  • Image: CosmicMan, ID2Reflectance, EdgeDepth, HairFastGAN, SPRIGHT-T2I, LCM-Lookahead, InstantStyle, DreamWalk
  • Video: CameraCtrl, VIDIM, Motion Inversion, DSTA, EDTalk
  • and more!

Cover Challenge 🎨

Theme: character reference
55 submissions by 29 artists
🏆 1st: @NomadsVagabonds
🥈 2nd: @onchainsherpa
🥈 2nd: @drbannerX
🥉 3rd: @SandyDamb

News & Papers

Audio

Stable Audio 2.0

Stability AI released Stable Audio 2.0 this week. It can generate high-quality full tracks with coherent musical structure, up to three minutes long at 44.1kHz stereo, from a single natural language prompt. The new model also introduces audio-to-audio generation, letting you transform audio samples with text prompts. Pretty cool stuff.

Check YouTube for a full three-minute example
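To put “three minutes at 44.1kHz stereo” into perspective, here’s the back-of-the-envelope arithmetic for a maximum-length track. This is a sketch under one loud assumption: the byte count treats the output as uncompressed 16-bit PCM, which says nothing about how Stable Audio 2.0 represents audio internally.

```python
# Back-of-the-envelope size of a maximum-length Stable Audio 2.0 output.
# Assumption: uncompressed 16-bit PCM for the byte estimate only; this is
# NOT a description of the model's internal audio representation.
SAMPLE_RATE_HZ = 44_100   # 44.1kHz, per the release
CHANNELS = 2              # stereo
DURATION_S = 3 * 60       # up to three minutes
BYTES_PER_SAMPLE = 2      # assumed 16-bit PCM


def total_samples(seconds: int, rate: int, channels: int) -> int:
    """Count individual sample values across all channels."""
    return seconds * rate * channels


samples = total_samples(DURATION_S, SAMPLE_RATE_HZ, CHANNELS)
size_mb = samples * BYTES_PER_SAMPLE / 1_000_000

print(f"{samples:,} samples")              # 15,876,000 samples
print(f"~{size_mb:.1f} MB as 16-bit PCM")  # ~31.8 MB
```

That’s nearly 16 million sample values the model has to keep musically coherent from start to finish.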

SunoAI V3

Similar to Stable Audio, Suno V3 lets you create two-minute tracks from a single text prompt, but it also supports vocals. I tried it this week and was blown away. Everybody can create their own theme song now. TÚLPA TÚLPA OOOOH OOOOH 🙌.

Check the announcement for audio examples

3D

FlexiDreamer: Single Image-to-3D Generation with FlexiCubes

FlexiDreamer is yet another single image-to-3D generation framework. It takes roughly one minute on a single NVIDIA A100 GPU.

It’s pepe time 🐸

StructLDM: Structured Latent Diffusion for 3D Human Generation

StructLDM can generate animatable humans and supports compositional generation by blending different body parts, identity swapping, local clothing editing, 3D virtual try-on, and more. AI girlfriends/boyfriends are definitely gonna be a thing.

Compositional StructLDM example

Design2Cloth: 3D Cloth Generation from 2D Masks

Design2Cloth, on the other hand, is a high-fidelity 3D generative model that can generate diverse, highly detailed clothes from nothing more than a drawn 2D cloth mask. It even supports interpolation.

Design2Cloth examples

MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text

MaGRITTe can generate 3D scenes from a combination of an image, a top view (floor plans or terrain maps), and a text prompt. It would be super cool to create one of those low-poly game levels with this.

MaGRITTe example

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Speaking of levels, CityGaussian uses Gaussian splatting to reconstruct and render large-scale 3D scenes in high quality and in real time.

CityGaussian example

Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing

And speaking of splats, Feature Splatting can manipulate both the appearance and the physical properties of objects in a 3D scene using text prompts.

Feature Splatting example

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

NeRFs aren’t dead yet. Freditor is a method that enables high-fidelity and transferable editing of NeRF scenes.

Freditor examples

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

GeneAvatar is a semantic-driven NeRF editing approach that can edit the geometry and texture of 3D head avatars from a single image, using drag-style, text-prompt, or pattern-painting methods.

GeneAvatar demo

GenN2N: Generative NeRF2NeRF Translation

And because methods always come in pairs, GenN2N is another NeRF editing method. This one can edit scenes with text prompts, as well as colorize, upscale, and inpaint them.

GenN2N examples

ProbTalk: Towards Variable and Coordinated Holistic Co-Speech Motion Generation

ProbTalk is a method for generating lifelike holistic co-speech motions for 3D avatars. It can produce a wide range of motions while keeping facial expressions, hand gestures, and body poses in harmony with each other.

ProbTalk examples. Check the video for sound.

Image

CosmicMan: A Text-to-Image Foundation Model for Humans

CosmicMan is a new text-to-image foundation model specialized for generating high-fidelity human images.

A full-body shot, an Asian adult female, fit, small road with trees, straight red above-chest hair, normal-length, white and long sleeve cotton shirt, short plaid skirt in pleated shape, cotton backpack, socks, black leather oxford shoes.

ID2Reflectance: Monocular Identity-Conditioned Facial Reflectance Reconstruction

ID2Reflectance can generate high-quality facial reflectance maps from a single image.

ID2Reflectance example

EdgeDepth: Monocular Depth Estimation with Edge-aware Consistency Fusion

EdgeDepth is a new method for monocular depth estimation that relies solely on edge maps as input, which results in sharper, more detailed depth maps.

EdgeDepth examples

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

Want to see what you’d look like with a new hairstyle? HairFastGAN can transfer hairstyles from a reference image to an input photo for virtual hair try-on.

Jonny getting Leo’s hair with Taylor’s color

SPRIGHT-T2I: Getting it Right

Following spatial instructions in text-to-image prompts is hard! SPRIGHT-T2I can finally do it though, resulting in more coherent and accurate compositions.

Above, a massive, dark storm cloud looms, filling the top half of the image with its ominous presence. Below, a small, winding river flows, while to the right a small house stands alone

LCM-Lookahead for Encoder-based Text-to-Image Personalization

LCM-Lookahead is another attempt at a LoRA killer, using an LCM-based approach for identity transfer in text-to-image generation.

LCM-Lookahead examples

InstantStyle

InstantStyle is yet another text-to-image method that preserves the style of reference images without any additional fine-tuning. HOW MANY MORE?

InstantStyle examples

DreamWalk: Style Space Exploration using Diffusion Guidance

DreamWalk is a new method that can apply different styles to a generated image and interpolate between them. Pretty cool.

DreamWalk example

Video

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Camera control for text-to-video is here! CameraCtrl enables accurate camera pose control, letting you precisely direct camera angles and movements when generating videos.

CameraCtrl examples

VIDIM: Video Interpolation With Diffusion Models

Good news: VIDIM is a generative model for video interpolation that creates short videos from a given start and end frame. Bad news: it’s from Google :(

VIDIM example

Motion Inversion for Video Customization

Motion Inversion can customize the motion in a video so that it matches the motion of a different video.

Motion Inversion example

DSTA: Video-Based Human Pose Regression via Decoupled Space-Time Aggregation

DSTA is a new method for video-based human pose estimation that directly maps the input to output joint coordinates.

DSTA example

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

And last but not least, EDTalk can generate talking face videos with different mouth shapes, head poses, and expressions from a single image, and can also animate the face directly from audio.

EDTalk examples. Check the project page for sound.

Also interesting

“Talk to the hand 🤚” by me. Free mint on Zora.

And that, my fellow dreamers, concludes yet another issue of AI Art Weekly. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
