AI Art Weekly #73

Hello there, my fellow dreamers, and welcome to issue #73 of AI Art Weekly! 👋

AI research seems to be slowly gaining momentum again. It’s the first week of the year in which I’ve skimmed through 100+ papers. Aside from skimming papers, I’ve made good progress on Shortie this week and shared a little sneak peek on X. I learned a lot while building this and have tons of new ideas in the back of my head, just waiting to get out. But for now, let’s take a look at this week’s highlights:

  • EMO turns single images into expressive lip-synced videos
  • Google’s Genie can turn images into platformer games
  • Multi-LoRA optimizes image generation with, well, multiple LoRAs
  • While DiffuseKronA tries to avoid using LoRAs
  • TCD is a better LCM
  • VastGaussian reconstructs large scenes as 3D Gaussians
  • GEA reconstructs expressive 3D avatars from monocular video
  • GEM3D is another 3D generative model with a different approach
  • LayoutLearning generates 3D scenes made up of multiple separate objects
  • OHTA generates hand avatars from single images
  • SongComposer and ChatMusician are LLMs that generate music
  • and more!

Cover Challenge 🎨

Theme: steampunk
95 submissions by 57 artists
🏆 1st: @moon__theater
🥈 2nd: @voidobjects
🥉 3rd: @ManoelKhan
🧡 4th: @MaitresseM

News & Papers

Emote Portrait Alive

Lip-syncing just got the Sora treatment. Alibaba’s EMO can turn any image of a face into a video expressing the words and emotions from any audio file (singing or talking).

Mona Lisa reading Shakespeare

Genie: Generative Interactive Environments

Another mind-blowing paper comes from Google this week. They presented Genie, a foundation world model trained on internet videos that can generate an endless variety of playable worlds from synthetic images, photographs, and even sketches. This opens the door to a variety of new ways to generate and step into virtual worlds. Quite wild if you think about it!

Genie examples

Multi-LoRA Composition for Image Generation

Multi-LoRA Composition focuses on the integration of multiple Low-Rank Adaptations (LoRAs) to create highly customized and detailed images. The approach is able to generate images with multiple elements without fine-tuning and without losing detail or image quality.

Multi-LoRA example combining clothes with a character

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

DiffuseKronA, on the other hand, is a fine-tuning method that tries to avoid LoRAs altogether and personalizes a diffusion model directly from input images. It generates high-quality images with accurate text-image correspondence and improved color distribution from diverse and complex input images and prompts.

DiffuseKronA comparison with LoRA

TCD: Trajectory Consistency Distillation

While LCM and Turbo have unlocked near real-time image diffusion, the quality is still a bit lacking. TCD, on the other hand, manages to generate images with both clarity and intricate detail without compromising on speed.

LCM / TCD comparison at an increasing number of function evaluations (NFEs)

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

VastGaussian is a new 3D Gaussian Splatting method for high-quality reconstruction and real-time rendering of large scenes.

VastGaussian reconstruction of a campus

GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video

Speaking of Gaussian splats, GEA is a method based on 3D Gaussians that can create expressive 3D avatars with high-fidelity reconstructions of body and hands from a single video.

GEA example

GEM3D: Generative Medial Abstractions for 3D Shape Synthesis

GEM3D is a new deep, topology-aware generative model of 3D shapes. The method is able to generate diverse and plausible 3D shapes from user-modeled skeletons, making it possible to draw the rough structure of an object and have the model fill in the rest.

GEM3D examples

Layout Learning

LayoutLearning generates 3D scenes from text that are automatically decomposed into objects. This means that given a prompt like “a chef rat on a tiny stool cooking a stew”, the model will generate a 3D scene with a chef rat, a tiny stool, and a stew as separate objects.

a chef rat on a tiny stool cooking a stew

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

While diffusion models still struggle with generating accurate human hands, OHTA is able to create high-fidelity and drivable hand avatars from just a single image. The hands can also be generated using only text and allow for hand texture and geometry editing.

Hand avatar examples generated from input images

SongComposer and ChatMusician

We all know by now that LLMs are great at solving all sorts of different tasks. Music wasn’t one of them, until now. SongComposer and ChatMusician are two LLMs that are trained to compose music using symbolic or ABC notation. While SongComposer focuses on generating vocals, ChatMusician generates ABC notation that can be used with existing music tools.

Check out the links above for music samples
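
If you’ve never seen ABC notation before, here’s a rough idea of what it looks like. This is just a minimal sketch on my side, not output from ChatMusician or SongComposer: the helper `make_abc_tune` and the example tune are made up for illustration. It assembles the standard ABC header fields (X: reference number, T: title, M: meter, L: default note length, K: key) followed by one line of notes.

```python
# Hypothetical helper (not from either paper): builds a tune in ABC notation.
def make_abc_tune(title, meter, unit_length, key, bars, index=1):
    # Standard ABC header: X: comes first, K: closes the header.
    header = [
        f"X:{index}",        # reference number
        f"T:{title}",        # title
        f"M:{meter}",        # meter (time signature)
        f"L:{unit_length}",  # default note length
        f"K:{key}",          # key
    ]
    # Bars are separated by "|"; "|]" marks the final bar line.
    body = " | ".join(bars) + " |]"
    return "\n".join(header + [body])


# Illustrative tune: a C major scale. Lowercase "c" is one octave above "C".
tune = make_abc_tune(
    title="Example Scale",
    meter="4/4",
    unit_length="1/4",
    key="C",
    bars=["C D E F", "G A B c"],
)
print(tune)
```

Because the result is plain text, it can be pasted into any tool that reads ABC files, which is presumably what makes the format a convenient target for a language model.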

Also interesting

  • ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
  • FlowMDM: Seamless Human Motion Composition with Blended Positional Encodings
  • OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

“Reflections of the Mind” by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa