Hello there, my fellow dreamers, and welcome to issue #64 of AI Art Weekly! 👋
The end of the year is drawing near, and with it, research is gradually slowing down. Compared to previous weeks, there were only 142 papers 😅
Before we get down to business, I wanted to thank you all for your support over the year. It’s been one hell of a wild ride. I’ll be taking the next week off to rest up for what 2024 holds. I wish you all a wonderful holiday season and a joyful new year! 🎉
Let’s dive in:
- Midjourney v6 alpha released
- VideoPoet turns LLMs into video generators
- GAvatar generates animatable 3D Gaussian avatars
- Align Your Gaussians generates dynamic 4D assets
- VidToMe edits videos with a text prompt
- PIA animates images with a text prompt
- MoSAR turns portraits into relightable 3D avatars
- And HAAR gives them hair
- Paint-it generates texture maps for 3D meshes
- Intrinsic Image Diffusion predicts materials from an image
- Splatter Image creates 3D reconstructions from videos in real-time
- RelightableAvatar turns videos into relightable 3D humans
- DreamTalk can animate faces
- and more tutorials, tools and gems!
Cover Challenge 🎨
News & Papers
Midjourney v6 alpha released
Midjourney released an early version of their v6 model this week. In short:
- Better prompt understanding (Example)
- Improved coherence and model knowledge (Example)
- Support for drawing some text (Example)
- Two new upscalers with both subtle and creative modes
First results look very promising — especially the photorealism is stunning. Prompt understanding isn’t on par with DALL·E 3 yet, but it’s definitely a step in the right direction.
VideoPoet: A large language model for zero-shot video generation
Google revealed that large language models can generate videos. Their simple modeling method, VideoPoet, can convert any autoregressive language model into a high-quality video generator, capable of producing both video and audio.
Seeing a lot of capabilities that we’ve explored throughout the year come together in a single multi-modal model is super cool.
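The core idea — treating video as just another stream of discrete tokens that a language model predicts one at a time — can be sketched with a toy decoding loop. Everything below is illustrative: `logits_fn` stands in for the real model, and the names are my own, not from the paper.

```python
import numpy as np

def decode(logits_fn, prompt, n_new, vocab_size):
    """Greedy autoregressive decoding: each new token (which could
    represent text, audio, or a video-frame patch in a shared
    vocabulary) conditions the prediction of the next one."""
    seq = list(prompt)
    for _ in range(n_new):
        logits = logits_fn(seq)        # model forward pass (stand-in)
        seq.append(int(np.argmax(logits)))
    return seq

# Toy "model": always prefers token (last + 1) mod vocab_size.
toy = lambda seq: np.eye(16)[(seq[-1] + 1) % 16]
result = decode(toy, [3], 4, 16)  # -> [3, 4, 5, 6, 7]
```

The point is that nothing in the loop is video-specific — swap the tokenizer and the same machinery generates frames instead of words.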
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
NVIDIA shared GAvatar this week, a new method that can generate realistic 3D Gaussian splat avatars from text that can be animated. The method can not only generate highly-detailed textured meshes, but can also render them at 100 fps with a 1K resolution.
Align Your Gaussians
Speaking of NVIDIA, they also shared Align Your Gaussians, a new method that can generate dynamic 4D assets from text prompts. It also supports looping animations, as well as chaining multiple text prompts to create changing animations.
VidToMe: Video Token Merging for Zero-Shot Video Editing
VidToMe can edit videos with a text prompt, custom models and ControlNet guidance, and also achieves great temporal consistency. The critical idea is to merge similar tokens across multiple frames in the self-attention modules, which enforces temporal consistency in the generated videos.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
PIA is yet another method that can animate images generated by custom Stable Diffusion checkpoints with realistic motions based on a text prompt.
MoSAR: Monocular Semi-Supervised Model For Avatar Reconstruction Using Differentiable Shading
MoSAR is able to turn a single portrait image into a relightable 3D avatar with detailed geometry and rich reflectance maps at 4K resolution.
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Now that we’ve got the face, we need some hair. HAAR can generate 3D strand-based human hairstyles from text prompts. The model is able to interpolate between different hairstyles, edit them, and even animate them. Super cool — I can’t wait until tech like this finds its way into the next FromSoftware character creator. You can see it in action here.
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Paint-it can generate high-fidelity physically-based rendering (PBR) texture maps for 3D meshes from a text description. The method can relight the mesh by swapping the High-Dynamic Range (HDR) environment lighting and lets you control the material properties at test time.
Intrinsic Image Diffusion for Single-view Material Estimation
A challenge so far when generating 3D objects has been dealing with “baked” textures, which often contain excessive, static shadowing and look wrong under dynamic lighting. Intrinsic Image Diffusion solves this by predicting materials instead: it generates albedo, roughness, and metallic maps from a single image.
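For readers wondering what those three maps actually do: in the standard metallic PBR workflow they determine a surface's diffuse colour and its specular reflectance at normal incidence (F0). A minimal sketch of that combination (the function name and the 0.04 dielectric constant are the common convention, not anything from the paper):

```python
import numpy as np

def pbr_base(albedo, metallic):
    """Standard metallic workflow: metals have no diffuse colour and
    an albedo-tinted specular F0; dielectrics keep their albedo as
    diffuse colour and use a flat F0 of ~0.04."""
    diffuse = albedo * (1.0 - metallic)
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    return diffuse, f0

# A pure metal: no diffuse term, specular tinted by the albedo.
d, f0 = pbr_base(np.array([1.0, 0.8, 0.3]), metallic=1.0)
```

Roughness then controls how sharp or blurry the specular highlight is — which is exactly why separating these maps from baked shadows matters for relighting.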
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Splatter Image is an ultra-fast method that can create 3D reconstructions from a single image — or from monocular videos, one frame at a time — at 38fps, and render them at 588fps. Quality isn’t as high as multi-view methods, but the fact that you can turn a video into a 4D scene almost instantly is nuts.
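Part of why Gaussian splatting renders so fast is that compositing is embarrassingly simple: depth-sorted Gaussians are alpha-blended front to back. Here is a bare-bones 2D version of that blending step — my own toy illustration, not the paper's renderer.

```python
import numpy as np

def render_gaussians(means, covs_inv, colors, opacities, H, W):
    """Alpha-composite 2D Gaussians front-to-back.
    Assumes the Gaussians are already sorted by depth."""
    img = np.zeros((H, W, 3))
    T = np.ones((H, W))                      # remaining transmittance
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).astype(float)
    for mu, Sinv, c, o in zip(means, covs_inv, colors, opacities):
        d = pix - mu
        # Gaussian falloff: exp(-0.5 * d^T Sigma^-1 d) per pixel
        power = -0.5 * np.einsum('hwi,ij,hwj->hw', d, Sinv, d)
        alpha = o * np.exp(power)
        img += (T * alpha)[..., None] * c    # blend weighted by T
        T *= (1.0 - alpha)                   # light left for later splats
    return img, T

# One opaque red Gaussian in the middle of a 5x5 image.
img, T = render_gaussians(np.array([[2.0, 2.0]]), np.array([np.eye(2)]),
                          np.array([[1.0, 0.0, 0.0]]), np.array([1.0]), 5, 5)
```

No ray marching, no network queries per pixel — just sorted blending, which is why frame rates in the hundreds are possible.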
Relightable and Animatable Neural Avatars from Videos
RelightableAvatar is another method that can create relightable and animatable neural avatars from monocular video.
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
DreamTalk generates expressive talking heads conditioned on a given prompt. It works across multiple languages and can also manipulate the speaking style of the generated video.
- pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
- MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
- Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
- CrossDiff: Realistic Human Motion Generation with Cross-Diffusion Models
- HCBlur: Deep Hybrid Camera Deblurring
This week we’re talking to AI artist QuantumSpirit aka Jen Panepinto.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!