AI Art Weekly #79
Hello there, my fellow dreamers, and welcome to issue #79 of AI Art Weekly! 👋
10:15pm here as I write these lines. Went through another 140+ papers for you this week and found some really cool stuff and two adorable little robot soccer players. It’s late, so I’ll keep this intro short.
In this issue:
- 3D: InstantMesh, InstructHumans, ZeST, MCC-HO, Key2Mesh, SphereHead, TeFF
- Physics: PhysAvatar, NeRF2Physics
- Image: BeyondScene, MuDI, Imagine Colorization, GoodDrag, ControlNet++, PanFusion, MindBridge
- Video: SpaTracker, SGM-VFI
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week's cover I'm looking for hysterical submissions! The reward is again $50 and a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Let’s start again with 3D! InstantMesh can create diverse 3D assets within 10 seconds from a single image.
InstructHumans
InstructHumans can edit existing 3D human textures using text prompts. It maintains avatar consistency pretty well and enables easy animation.
ZeST: Zero-Shot Material Transfer from a Single Image
ZeST can change the material of an object in an image to match a material example image. It can also perform multiple material edits in a single image and perform implicit lighting-aware edits on the rendering of a textured mesh.
MCC-HO: Reconstructing Hand-Held Objects in 3D
MCC-HO can reconstruct 3D objects from a single RGB image and an estimated 3D hand. Why might this be useful? Think VR/AR. Tech like this will make it possible to create a digital twin of objects you are holding in your hands so you and others can interact with them in a virtual environment.
Key2Mesh: MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints
Speaking of reconstruction: Key2Mesh is yet another model that takes on 3D human mesh reconstruction, this time using 2D human pose keypoints as input instead of visual data, sidestepping the scarcity of image datasets with 3D labels.
SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
GANs aren’t dead yet. SphereHead generates stable and high-quality 3D full-head human faces from all angles with significantly fewer artifacts compared to previous methods. Best one I’ve seen so far.
TeFF: Learning 3D-Aware GANs from Unposed Images with Template Feature Field
TeFF is a similar method to SphereHead, but it supports more than just human faces: it can reconstruct a full 360° view of a 3D object from a single image.
Physics
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
PhysAvatar can turn multi-view videos into high-quality 3D avatars with loose-fitting clothes. The whole thing can be animated and generalizes well to unseen motions and lighting conditions.
NeRF2Physics: Physical Property Understanding from Language-Embedded Feature Fields
NeRF2Physics can predict the physical properties (mass, friction, hardness, thermal conductivity and Young’s modulus) of objects from a collection of images. This makes it possible to simulate the physical behavior of digital twins in a 3D scene.
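To make the idea concrete, here is a minimal sketch of how a language-embedded feature field can be queried for physical properties: match each 3D point's CLIP-aligned feature against candidate material names, then take the expectation over per-material property values. All names (`point_features`, `clip_text_encoder`, etc.) are illustrative placeholders, not the paper's actual API.

```python
import torch

def predict_property(point_features, clip_text_encoder, materials, property_values):
    """Assign a physical property (e.g. density) to each 3D point by matching
    its language-aligned feature against candidate material names."""
    text_emb = clip_text_encoder(materials)                        # (M, D) text embeddings
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)      # normalize for cosine similarity
    feats = point_features / point_features.norm(dim=-1, keepdim=True)
    probs = torch.softmax(feats @ text_emb.T / 0.07, dim=-1)       # (N, M) soft material assignment
    values = torch.as_tensor(property_values, dtype=feats.dtype)   # (M,) per-material property values
    return probs @ values                                          # (N,) expected property per point
```

The soft assignment is what lets a single feature field answer questions about mass, friction, hardness and so on, simply by swapping out the list of candidate values.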
Image
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
BeyondScene can generate human-centric scenes with a resolution of up to 8K with exceptional text-image correspondence and naturalness using existing pretrained diffusion models.
MuDI: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
We've seen a gazillion text-to-image personalization methods already. MuDI is another one, but it supports multi-subject personalization, meaning you can generate images of multiple subjects without identity mixing.
Imagine Colorization
Imagine Colorization leverages pre-trained diffusion models to colorize images while supporting controllable, user-interactive editing.
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
We've seen image editing by dragging before. GoodDrag improves the stability and image quality of drag editing with diffusion models.
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
ByteDance is working on ControlNet++. It claims to improve controllable image generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls, improving fidelity to controls such as segmentation masks, line-art edges, depth maps, HED edges and Canny edges.
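The cycle-consistency objective is easy to sketch: generate an image from a control input, re-extract the control from the result with a frozen discriminative model, and penalize the pixel-level mismatch. The function names below are hypothetical stand-ins, not ByteDance's actual code.

```python
import torch.nn.functional as F

def cycle_consistency_loss(generate, extract_control, condition, prompt):
    """Generate an image from a control input, re-extract the control from
    the result, and penalize pixel-level deviation from the original."""
    image = generate(prompt, condition)   # ControlNet-style conditional generation
    recovered = extract_control(image)    # e.g. run a segmentation or depth net on the output
    return F.mse_loss(recovered, condition)
```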
PanFusion: Taming Stable Diffusion for Text to 360° Panorama Image Generation
PanFusion can generate 360-degree panorama images from a text prompt. The model is able to integrate additional constraints like room layout for customized panorama outputs.
MindBridge: A Cross-Subject Brain Decoding Framework
In the Minority Report department we have MindBridge this week. It’s another method that can reconstruct images from fMRI signals and can generalize to multiple subjects from only one model.
Video
SpaTracker: Tracking Any 2D Pixels in 3D Space
Until now I've only seen pixel trackers operate on the 2D plane. SpaTracker can track any 2D pixel in 3D space, which allows for better handling of occlusions and out-of-plane rotations.
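The core trick, roughly speaking, is lifting the query pixels into 3D with monocular depth before tracking them. Here's a hand-wavy sketch of that back-projection step, assuming a pinhole camera with intrinsics `K`; everything here is an illustrative placeholder, not SpaTracker's actual interface.

```python
import torch

def lift_pixels_to_3d(pixels, depth_map, K):
    """Back-project pixel coordinates into 3D camera space using a
    monocular depth estimate."""
    x, y = pixels[:, 0], pixels[:, 1]
    z = depth_map[y.long(), x.long()]      # per-pixel depth estimate
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    X = (x - cx) * z / fx                  # standard pinhole back-projection
    Y = (y - cy) * z / fy
    return torch.stack((X, Y, z), dim=1)   # (N, 3) points to track in 3D
```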
SGM-VFI: Sparse Global Matching for Video Frame Interpolation with Large Motion
And last but not least, SGM-VFI is a new video frame interpolation method that is able to handle large motion in videos. The method uses sparse global matching to introduce global information into the estimated intermediate frames, resulting in more accurate and detailed output.
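As a toy illustration of sparse global matching: where local flow windows fail on large displacements, a handful of unreliable query pixels are correlated against *all* positions in the other frame. Shapes and names below are assumptions for illustration only, not the paper's implementation.

```python
import torch

def sparse_global_match(feat0, feat1, queries):
    """Correlate a sparse set of frame-0 features against every frame-1
    position to recover large displacements local windows would miss."""
    C, H, W = feat1.shape
    q = feat0[:, queries[:, 1], queries[:, 0]].T       # (K, C) query features
    corr = q @ feat1.reshape(C, H * W)                 # (K, H*W) global correlation
    idx = corr.argmax(dim=1)                           # best global match per query
    matches = torch.stack((idx % W, idx // W), dim=1)  # back to (x, y) coordinates
    return matches - queries                           # sparse global flow vectors
```

Keeping the global matching sparse is the design choice that makes this affordable: only the points where local estimation struggles pay for a full-frame search.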
Also interesting
- Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior
- UMBRAE: Unified Multimodal Decoding of Brain Signals
- UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
- GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Similar to Suno v3, Udio is an AI music generator that can generate tracks from text prompts. It supports vocals and can extend clips forwards and backwards.
@AIWarper dropped this viral Viggle AI experiment this week. Workflow included.
@Donversationz showcased the future of designing merch in AR in combination with physical paper. Love it!
Magnus Dahl shared this interesting article with me, in which he explores his thoughts on the creative process when using tools like Midjourney and DALL·E.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa