AI Art Weekly #63

Hello there, my fellow dreamers, and welcome to issue #63 of AI Art Weekly! 👋

Another week, another 187 papers skimmed through. I’m a bit short on time today, so I’ll keep this intro brief. I hope you enjoy this issue and I’ll see you next week! 🙏

The highlights of this week are:

  • Stable Zero123
  • MinD-3D turns brain waves into 3D objects 🧠
  • W.A.L.T: a new photorealistic video generation method
  • Upscale-A-Video: video upscaling with text prompts
  • Peekaboo: bounding box guided video generation
  • Customizing Motion can apply motion patterns from videos
  • Improved temporal consistency with FreeInit
  • DreaMoving: another Animate Anyone approach
  • ASH: 3D human rendering in real time
  • SO-SMPL: Generates disentangled human body and clothing meshes
  • DiffusionLight creates HDR maps for images
  • GMTalker can control facial expressions in videos
  • SMERF can render large photorealistic NeRF scenes in real time
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: tulpas
84 submissions by 47 artists
AI Art Weekly Cover Art Challenge tulpas submission by samisantosai
🏆 1st: @samisantosai
AI Art Weekly Cover Art Challenge tulpas submission by MynimalM
🥈 2nd: @MynimalM
AI Art Weekly Cover Art Challenge tulpas submission by onchainsherpa
🥉 3rd: @onchainsherpa
AI Art Weekly Cover Art Challenge tulpas submission by weird_momma_x
🧡 4th: @weird_momma_x

News & Papers

Stable Zero123: Quality 3D Object Generation from Single Images

Stability AI continues their model release streak. This week they shared the weights for Stable Zero123, a new image-to-3D model. The quality looks amazing, but there’s neither a Colab notebook nor a Hugging Face Space to try it out yet. Will this model finally be able to bring me into 3D space?

Stable Zero123 preview

W.A.L.T: Photorealistic Video Generation with Diffusion Models

Holy smokes, this one looks smooth. W.A.L.T is a method for photorealistic video generation using diffusion models. Unfortunately, as with everything from Google Research, this one will probably never be open-sourced 😢

A unicorn walking, slow cinematic motion.


Upscale-A-Video: Video Upscaling with Text Prompts

Image upscalers have been all the rage lately, but what about video upscaling? Upscale-A-Video takes a low-resolution video and a text prompt as input and upscales the video to a higher resolution. The method also supports texture creation and adjustable noise levels to balance restoration against generation, enabling a trade-off between fidelity and quality.

Upscale-A-Video example

Peekaboo: Interactive Video Generation via Masked-Diffusion

Speaking of video, more research is being done on motion control. Peekaboo lets you precisely control the position, size and trajectory of an object through bounding boxes.
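To get a feel for how bounding-box guidance works, here’s a minimal NumPy sketch (function names and shapes are my own, not Peekaboo’s API): boxes are interpolated across frames to form a trajectory, then rasterized into binary masks that a masked-diffusion method can use to steer attention toward the object region.

```python
import numpy as np

def interpolate_boxes(start, end, num_frames):
    """Linearly interpolate a bounding box (x0, y0, x1, y1) from
    start to end, giving the object a straight-line trajectory."""
    return [tuple(int(round(s + (e - s) * f / (num_frames - 1)))
                  for s, e in zip(start, end))
            for f in range(num_frames)]

def bbox_masks(boxes, num_frames, height, width):
    """Rasterize per-frame boxes into binary spatio-temporal masks,
    the kind masked-diffusion methods use to gate attention."""
    masks = np.zeros((num_frames, height, width), dtype=np.float32)
    for f, (x0, y0, x1, y1) in enumerate(boxes):
        masks[f, y0:y1, x0:x1] = 1.0
    return masks

# An object moving from top-left to bottom-right over 16 frames
boxes = interpolate_boxes((0, 0, 16, 16), (48, 48, 64, 64), 16)
masks = bbox_masks(boxes, 16, 64, 64)
```

The real method injects such masks into the attention layers of the video diffusion model; this sketch only shows how a handful of user-drawn boxes becomes a dense spatio-temporal control signal.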

An Eagle flying in the sky motion controlled with Peekaboo

Customizing Motion in Text-to-Video Diffusion Models

On the other hand, Customizing Motion can learn motion patterns from input videos, generalize them, and apply them to new and unseen contexts.

a chef, a toddler and elderly people doing the “Carlton dance”

FreeInit: Bridging Initialization Gap in Video Diffusion Models

But what about more temporal consistency? FreeInit has us covered. It improves the temporal consistency of videos generated by diffusion models and methods like AnimateDiff. Best thing? It doesn’t require any additional training and has already been open-sourced.
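The core trick, as I understand it, is to rerun sampling with re-initialized noise: keep the low-frequency component of the previous run’s diffused latent and replace the high frequencies with fresh Gaussian noise. A rough NumPy sketch of that frequency mixing (single-channel latents; function name and cutoff are my own):

```python
import numpy as np

def freeinit_reinit(prev_noise, fresh_noise, cutoff=0.25):
    """Combine the low frequencies of a previous run's initial noise
    with the high frequencies of fresh Gaussian noise."""
    # 3D FFT over (frames, height, width), centered for masking
    prev_f = np.fft.fftshift(np.fft.fftn(prev_noise))
    fresh_f = np.fft.fftshift(np.fft.fftn(fresh_noise))
    # Centered box low-pass mask: 1 near DC, 0 elsewhere
    t, h, w = prev_noise.shape
    mask = np.zeros((t, h, w))
    ct, ch, cw = t // 2, h // 2, w // 2
    rt = max(1, int(t * cutoff))
    rh = max(1, int(h * cutoff))
    rw = max(1, int(w * cutoff))
    mask[ct - rt:ct + rt, ch - rh:ch + rh, cw - rw:cw + rw] = 1.0
    # Low frequencies from the old noise, high from the new
    mixed_f = prev_f * mask + fresh_f * (1.0 - mask)
    return np.fft.ifftn(np.fft.ifftshift(mixed_f)).real
```

The intuition: the low frequencies carry the coarse layout that was already denoised consistently, while refreshing the high frequencies lets the sampler clean up flicker.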

AnimateDiff comparison with FreeInit enabled

DreaMoving: A Human Video Generation Framework based on Diffusion Models

The Animate Anyone saga continues. DreaMoving is yet another approach to generating high-quality videos of humans from a text prompt and some pose guidance. In this case, a reference image is used to preserve facial identity.

DreaMoving example with the text prompt A girl, smiling, dancing in the park with golden leaves in autumn, wearing light blue dress.

ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

2D animations are cool, but ASH can do much the same in 3D using Gaussian Splats. Given photorealistic multi-view images of a human and skeletal pose guidance, the method can render humans in real time while preserving details like wrinkles in clothing. Wild.

ASH example

SO-SMPL: Disentangled Clothed Avatar Generation from Text Descriptions

SO-SMPL is also about 3D avatars, but with a twist: it focuses on generating high-quality, separate human body and clothing meshes from text prompts. These disentangled avatar representations achieve much more photorealistic animations than other methods.

a chubby bald old man wearing denim work shirt and dirty washed jeans

MinD-3D: Reconstruct High-quality 3D objects in Human Brain

All of the above is cool and all, but MinD-3D can reconstruct 3D objects from fMRI brain signals. It’s not super high-fidelity yet, but if this isn’t the future, I don’t know what is.

MinD-3D examples

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

DiffusionLight can estimate the lighting in a single input image and convert it into an HDR environment map. The technique is able to generate multiple chrome balls with varying exposures for HDR merging and can be used to seamlessly insert 3D objects into an existing photograph. Pretty cool.
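The “varying exposures for HDR merging” step is classic exposure bracketing. Here’s a simplified sketch of such a merge (assuming linear images in [0, 1]; the helper name is mine, and the real pipeline of course gets its bracketed chrome balls from the diffusion model rather than a camera):

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge LDR images taken at different exposures into one
    linear HDR radiance estimate via weighted averaging."""
    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for img, t in zip(images, exposure_times):
        # Hat weights favour mid-range pixels, which are neither
        # crushed to black nor clipped to white.
        w = 1.0 - np.abs(img - 0.5) * 2.0
        num += w * img / t
        den += w
    return num / np.maximum(den, 1e-8)

# Two renders of the same scene at 1x and 2x exposure should
# agree on the underlying radiance
hdr = merge_exposures(
    [np.full((2, 2), 0.25), np.full((2, 2), 0.5)],
    [1.0, 2.0],
)  # → radiance 0.25 everywhere
```

Once merged, the HDR chrome ball can be unwrapped into an environment map for relighting inserted 3D objects.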

DiffusionLight example

GMTalker: Gaussian Mixture based Emotional talking video Portraits

GMTalker can generate high-fidelity talking video portraits with audio-lip sync and control over facial expressions as well as gaze and eye blinks.

GMTalker emotion intensity control example

SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration

And last but not least, SMERF is able to render near-photorealistic NeRF scenes at interactive frame rates. The method handles large scenes with footprints of up to 300 m² at a volumetric resolution of 3.5 mm³, enables full six-degrees-of-freedom (6DOF) navigation within a web browser, and renders in real time on commodity smartphones and laptops. This will make previewing your next flat way more convenient. Check out one of the demos!

SMERF demo

Also interesting

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“Stuck in the wrong Century 🏛️” by me, collected by @onchainsherpa.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa