AI Art Weekly #109
Hello, my fellow dreamers, and welcome to issue #109 of AI Art Weekly! 👋
Hope you all had a good start to the new year and are ready for the wild rollercoaster that 2025 will most likely be!
But don’t fret; I’ll be here, keeping you up to date with the latest from the world of computer vision.
Let’s jump in.
Support the newsletter and unlock the full potential of AI-generated art with my curated collection of 240+ high-quality Midjourney SREF codes and 1000+ creative prompts.
News & Papers
3D
D3-Human: Dynamic Disentangled Digital Human from Monocular Video
D3-Human can reconstruct detailed 3D human figures from a single monocular video. It disentangles clothing from body shape, handles occlusions well, and is useful for clothing transfer and animation.
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
PartGen can generate 3D objects made of meaningful parts from text, images, or 3D models. It allows users to easily manipulate these parts and uses advanced multi-view diffusion models for better 3D asset creation and editing.
ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation
ZeroHSI can synthesize realistic 4D human-scene interactions in various environments using a text prompt.
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
Dora can generate 3D assets from images that are ready for diffusion-based character control in modern 3D engines, such as Unity 3D, in real time.
SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control
SCENIC can generate human motion that adapts to complex 3D environments. It allows users to control actions through simple language, like “carefully stepping over obstacles” or “walking upstairs like a zombie,” while ensuring realistic movement and navigation.
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
EnergyMoGen can generate complex human motions from text descriptions.
GenHMR: Generative Human Mesh Recovery
GenHMR can generate accurate 3D human mesh models from single images. It effectively handles difficult poses by modeling uncertainties in the 2D-to-3D mapping process, using a pose tokenizer and an image-conditional masked transformer.
DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction
DAS3R can decompose dynamic scenes and reconstruct static backgrounds from videos.
SewingLDM: Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation
SewingLDM can generate complex sewing patterns using text prompts, body shapes, and garment sketches. It allows for detailed customization and significantly improves the design of garments to fit different body types.
Image
Edicho: Consistent Image Editing in the Wild
Edicho can edit images consistently, even with different poses and lighting. It uses a training-free method based on diffusion models and works well with other tools like ControlNet and BrushNet.
Video
TransPixar: Advancing Text-to-Video Generation with Transparency
TransPixar can generate RGBA videos, enabling the creation of transparent elements like smoke and reflections that blend seamlessly into scenes.
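If you're wondering why RGBA output is such a big deal: it lets you layer generated elements over your own footage. Here's a minimal sketch of standard alpha ("over") compositing with numpy — this is generic blending math, not TransPixar's actual method, and the frame arrays are just stand-ins:

```python
# A minimal sketch (not TransPixar's method): compositing an RGBA frame
# over a background with the standard "over" operator, assuming 8-bit
# frames loaded as numpy arrays of shape (H, W, 4) and (H, W, 3).
import numpy as np

def composite_over(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Blend a generated RGBA frame (e.g. smoke) onto a background frame."""
    fg = fg_rgba[..., :3].astype(np.float32) / 255.0
    alpha = fg_rgba[..., 3:4].astype(np.float32) / 255.0  # (H, W, 1), broadcasts over RGB
    bg = bg_rgb.astype(np.float32) / 255.0
    out = fg * alpha + bg * (1.0 - alpha)  # classic alpha "over" blend
    return (out * 255.0).round().astype(np.uint8)

# Example with random data standing in for real frames:
fg_frame = np.random.randint(0, 256, (720, 1280, 4), dtype=np.uint8)
bg_frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
blended = composite_over(fg_frame, bg_frame)
```

Repeat that per frame and you can drop generated smoke or reflections straight into an existing shot.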
SVFR: A Unified Framework for Generalized Video Face Restoration
SVFR can restore high-quality video faces from low-quality inputs. It combines video face restoration, inpainting, and colorization to improve the overall quality and coherence of the restored videos.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
DiTCtrl can generate multi-prompt videos with smooth transitions and consistent object motion.
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
ConceptMaster can create high-quality customized videos while keeping different concepts separate.
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
Magic Mirror can generate high-quality videos that keep a person’s identity while showing natural motion.
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Diffusion as Shader can generate high-quality videos from 3D tracking inputs.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Through-The-Mask can generate realistic video sequences from static images, text, and input masks.
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
GS-DiT can generate videos from a single camera input and allows for advanced video effects like dolly zoom and multi-camera shooting.
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
ILDiff can remove the background from animated stickers/gifs.
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
VideoMaker can generate personalized videos from a single subject reference image.
Audio
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
TangoFlux can generate 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU.
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MMAudio can generate high-quality audio that matches video and text inputs. It excels in audio quality and synchronization, with a fast processing time of just 1.23 seconds for an 8-second clip.
Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls
Stable-V2A can generate synchronized sound effects for videos.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. If you like what I do, you can support me by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)
- Buying my Midjourney prompt collection on PROMPTCACHE 🚀
- Buying access to AI Art Weekly Premium 👑
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa