AI Art Weekly #64

Hello there, my fellow dreamers, and welcome to issue #64 of AI Art Weekly! 👋

The end of the year is drawing near, and with it, research is gradually slowing down. Compared to previous weeks, there were only 142 papers 😅

Before we get down to business, I wanted to thank you all for your support over the year. It’s been one hell of a wild ride. I’ll be taking the next week off to rest up for what 2024 holds. I wish you all a wonderful holiday season and a joyful new year! 🎉

Let’s dive in:

  • Midjourney v6 alpha released
  • VideoPoet turns LLMs into video generators
  • GAvatar generates animatable 3D Gaussian avatars
  • Align Your Gaussians generates dynamic 4D assets
  • VidToMe edits videos with a text prompt
  • PIA animates images with a text prompt
  • MoSAR turns portraits into relightable 3D avatars
  • And HAAR gives them hair
  • Paint-it generates texture maps for 3D meshes
  • Intrinsic Image Diffusion predicts materials from an image
  • Splatter Image creates 3D reconstructions from videos in real-time
  • RelightableAvatar turns videos into relightable 3D humans
  • DreamTalk can animate faces
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: nisse
74 submissions by 43 artists
🏆 1st: @pactalom
🥈 2nd: @AIstronaut42
🥉 3rd: @NrthWestBound
🧡 4th: @amorvobiscum

News & Papers

Midjourney v6 alpha released

Midjourney released an early version of their v6 model this week. In short:

  • The new model has a better prompt understanding (Example)
  • Improved coherence and model knowledge (Example)
  • Supports drawing some text (Example)
  • Two new upscalers with both subtle and creative modes

First results look very promising, and the quality of the photorealism is especially stunning. Prompt understanding isn’t on par with DALL·E 3 yet, but it’s definitely a step in the right direction.

close up shot of a cool santa clause wearing sunglasses riding the king of reindeers, in the style of game of thrones --v 6.0 --style raw --ar 3:2

VideoPoet: A large language model for zero-shot video generation

Google revealed that large language models can generate videos. Their simple modeling method, VideoPoet, can convert any autoregressive language model or LLM into a high-quality video generator, capable of generating videos & audio.

Seeing a lot of capabilities that we’ve explored throughout the year come together in a single multi-modal model is super cool.

VideoPoet is capable of multitasking on a variety of video-centric inputs and outputs. The LLM can optionally take text as input to guide generation for text-to-video, image-to-video, stylization, and outpainting tasks.

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

NVIDIA shared GAvatar this week, a new method that can generate realistic, animatable 3D Gaussian splat avatars from text. The method can not only generate highly detailed textured meshes, but can also render them at 100 fps with a 1K resolution.

GAvatar examples

Align Your Gaussians

Speaking of NVIDIA, they also shared Align Your Gaussians, a new method that can generate dynamic 4D assets from text prompts. It can also create looping animations and chain multiple text prompts to create changing animations.

The nature of 3D Gaussians with deformation fields allows for easy composition of multiple synthesized dynamic 4D assets in larger scenes.

VidToMe: Video Token Merging for Zero-Shot Video Editing

VidToMe can edit videos with a text prompt, custom models, and ControlNet guidance while achieving great temporal consistency. The key idea is to merge similar tokens across multiple frames in the self-attention modules, which keeps the generated video temporally consistent.

VidToMe Pixel art style example
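
To make the token-merging idea a bit more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`merge_across_frames`, `unmerge`), the cosine-similarity matching, and the simple averaging are all assumptions chosen for illustration, and the real method operates on the self-attention tokens of a diffusion model rather than on small lists of numbers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_across_frames(src, dst, r):
    """Merge the r src tokens that best match a dst token into that dst token.

    src, dst: lists of token vectors taken from two different frames.
    Returns (merged_dst, kept_src, assignment), where assignment maps each
    merged src index to the dst index it was averaged into.
    (Hypothetical helper, for illustration only.)
    """
    # Find the best dst match for every src token.
    best = []
    for i, s in enumerate(src):
        j = max(range(len(dst)), key=lambda k: cosine(s, dst[k]))
        best.append((cosine(s, dst[j]), i, j))
    # Only the r most similar pairs are merged.
    best.sort(reverse=True)
    assignment = {i: j for _, i, j in best[:r]}

    # Average each dst token with all src tokens assigned to it.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in assignment.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[v / c for v in vec] for vec, c in zip(sums, counts)]
    kept_src = [s for i, s in enumerate(src) if i not in assignment]
    return merged_dst, kept_src, assignment

def unmerge(merged_dst, kept_src, assignment, n_src):
    """Restore a full-length src frame by copying merged dst tokens back."""
    kept = iter(kept_src)
    return [merged_dst[assignment[i]] if i in assignment else next(kept)
            for i in range(n_src)]

# Two tiny "frames" with 2D tokens.
dst = [[1.0, 0.0], [0.0, 1.0]]
src = [[3.0, 0.0], [1.0, 1.0]]
merged, kept, assign = merge_across_frames(src, dst, r=1)
restored = unmerge(merged, kept, assign, n_src=len(src))
```

In this toy setup, the first token of `src` points in the same direction as the first token of `dst`, so the two are averaged into one shared token. Attention would then run on the smaller, shared token set, and `unmerge` copies the shared token back so both frames end up with matching features at that position.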

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

PIA is yet another method that can animate images generated by custom Stable Diffusion checkpoints with realistic motions based on a text prompt.

PIA example

MoSAR: Monocular Semi-Supervised Model For Avatar Reconstruction Using Differentiable Shading

MoSAR is able to turn a single portrait image into a relightable 3D avatar with detailed geometry and rich reflectance maps at 4K resolution.

MoSAR examples

HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

Now that we’ve got the face, we need some hair. HAAR can generate 3D strand-based human hairstyles from text prompts. The model is able to interpolate between different hairstyles, edit them, and even animate them. Super cool, and I can’t wait until tech like this finds its way into the next FromSoftware character creator. You can see it in action here.

HAAR examples

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Paint-it can generate high-fidelity physically-based rendering (PBR) texture maps for 3D meshes from a text description. The method is able to relight the mesh by changing High-Dynamic Range (HDR) environmental lighting and control the material properties at test-time.

Paint-it examples

Intrinsic Image Diffusion for Single-view Material Estimation

A challenge so far when generating 3D objects has been dealing with “baked” textures, which often contain excessive and static shadowing, leading to inaccuracies in dynamic lighting environments. Intrinsic Image Diffusion addresses this by predicting materials, generating albedo, roughness, and metallic maps from a single image.

Intrinsic Image Diffusion pipeline

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Splatter Image is an ultra-fast method that can create 3D reconstructions from monocular videos or a single image, a frame at a time, at 38 fps and render them at 588 fps. Quality isn’t as high as multi-view methods, but the fact that you can turn a video instantly into a 4D scene is nuts.

A frame-by-frame reconstruction of a dancing Paddington Bear in 3D

Relightable and Animatable Neural Avatars from Videos

RelightableAvatar is another method that can create relightable and animatable neural avatars from monocular video.

RelightableAvatar example

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

DreamTalk is able to generate talking heads conditioned on a given text prompt. The model is able to generate talking heads in multiple languages and can also manipulate the speaking style of the generated video.

DreamTalk examples

Also interesting

  • pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
  • MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
  • SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
  • Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
  • CrossDiff: Realistic Human Motion Generation with Cross-Diffusion Models
  • HCBlur: Deep Hybrid Camera Deblurring


This week we’re talking to AI artist QuantumSpirit aka Jen Panepinto.

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“It’s the Hour of the Cuddle 🫂” by me collected by @aicollection_.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
