AI Art Weekly #109

Hello, my fellow dreamers, and welcome to issue #109 of AI Art Weekly! 👋

Hope you all had a good start to the new year and are ready for the wild rollercoaster that 2025 will most likely be!

But don’t fret; I’ll be here, keeping you up to date with the latest from the world of computer vision.

Let’s jump in.


News & Papers

3D

D3-Human: Dynamic Disentangled Digital Human from Monocular Video

D3-Human can reconstruct detailed 3D human figures from single videos. It separates clothing and body shapes, handles occlusions well, and is useful for clothing transfer and animation.

D3-Human example

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

PartGen can generate 3D objects made of meaningful parts from text, images, or 3D models. It allows users to easily manipulate these parts and uses advanced multi-view diffusion models for better 3D asset creation and editing.

PartGen examples

ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation

ZeroHSI can synthesize realistic 4D human-scene interactions in various environments using a text prompt.

ZeroHSI example

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Dora can generate 3D assets from images that are ready for real-time, diffusion-based character control in modern 3D engines such as Unity 3D.

Dora example

SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control

SCENIC can generate human motion that adapts to complex 3D environments. It allows users to control actions through simple language, like “carefully stepping over obstacles” or “walking upstairs like a zombie,” while ensuring realistic movement and navigation.

SCENIC example

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

EnergyMoGen can generate complex human motions from text descriptions.

a person steps forward and puts their hand up near their face
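If you're wondering what "compositional, energy-based" roughly means in practice, here's a minimal sketch of composing two text concepts (like the "steps forward" and "hand up" in the example above) by summing their classifier-free guidance directions. The `eps_model` interface and the toy tensors are my own assumptions, not EnergyMoGen's actual API:

```python
import torch

def composed_eps(eps_model, x_t, t, conds, uncond, w=7.5):
    """Compose several text conditions by summing their guidance directions.

    Hypothetical sketch in the spirit of energy/score composition; not the
    paper's actual code. `eps_model(x, t, c)` is assumed to return the
    predicted noise for latent x at timestep t under condition c.
    """
    eps_u = eps_model(x_t, t, uncond)                      # unconditional prediction
    guidance = sum(eps_model(x_t, t, c) - eps_u for c in conds)
    return eps_u + w * guidance                            # add one guidance term per concept

# Toy usage with a dummy denoiser so the sketch runs end to end.
dummy = lambda x, t, c: x * 0.1 + c
x_t = torch.randn(1, 4, 8)                                 # fake motion latent
out = composed_eps(dummy, x_t, t=10,
                   conds=[torch.tensor(0.2), torch.tensor(-0.1)],
                   uncond=torch.tensor(0.0))
print(out.shape)
```

The paper does this in a motion latent space; the gist is simply that each concept contributes its own guidance term, which are added together at every denoising step.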

GenHMR: Generative Human Mesh Recovery

GenHMR can generate accurate 3D human mesh models from single images. It effectively handles difficult poses by modeling uncertainties in the 2D-to-3D mapping process, using a pose tokenizer and an image-conditional masked transformer.

GenHMR example
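The "pose tokenizer + image-conditional masked transformer" idea boils down to masked token prediction conditioned on image features. Here's a minimal, hypothetical PyTorch sketch; the sizes, names, and architecture are my assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class MaskedPoseTransformer(nn.Module):
    """Predicts masked pose-codebook tokens from image features (sketch only)."""

    def __init__(self, codebook_size=512, dim=256, n_pose_tokens=24, n_img_tokens=49):
        super().__init__()
        self.mask_id = codebook_size                     # extra [MASK] token id
        self.tok_emb = nn.Embedding(codebook_size + 1, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, n_pose_tokens + n_img_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, codebook_size)        # logits over the pose codebook

    def forward(self, pose_ids, img_feats, mask):
        # pose_ids: (B, P) indices from a pose tokenizer (assumed)
        # img_feats: (B, I, dim) features from an image backbone (assumed)
        # mask: (B, P) bool, True where the pose token is hidden
        ids = pose_ids.masked_fill(mask, self.mask_id)
        x = torch.cat([self.tok_emb(ids), img_feats], dim=1) + self.pos_emb
        h = self.encoder(x)[:, : pose_ids.size(1)]
        return self.head(h)                              # predict token ids at masked slots

# Toy forward pass
model = MaskedPoseTransformer()
pose_ids = torch.randint(0, 512, (2, 24))
img_feats = torch.randn(2, 49, 256)
mask = torch.rand(2, 24) < 0.5
logits = model(pose_ids, img_feats, mask)                # (2, 24, 512)
loss = nn.functional.cross_entropy(logits[mask], pose_ids[mask])
```

Because the model predicts a distribution over tokens at each masked slot, it can express uncertainty about ambiguous 2D-to-3D cases instead of committing to a single pose.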

DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction

DAS3R can decompose dynamic scenes and reconstruct their static backgrounds from videos.

DAS3R example

SewingLDM: Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation

SewingLDM can generate complex sewing patterns using text prompts, body shapes, and garment sketches. It allows for detailed customization and significantly improves the design of garments to fit different body types.

SewingLDM example

Image

Edicho: Consistent Image Editing in the Wild

Edicho can edit images consistently, even with different poses and lighting. It uses a training-free method based on diffusion models and works well with other tools like ControlNet and BrushNet.

Edicho examples

Video

TransPixar: Advancing Text-to-Video Generation with Transparency

TransPixar can generate RGBA videos, enabling the creation of transparent elements like smoke and reflections that blend seamlessly into scenes.

TransPixar example

SVFR: A Unified Framework for Generalized Video Face Restoration

SVFR can restore high-quality video faces from low-quality inputs. It combines video face restoration, inpainting, and colorization to improve the overall quality and coherence of the restored videos.

SVFR example

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

DiTCtrl can generate multi-prompt videos with smooth transitions and consistent object motion.

DiTCtrl example

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning

ConceptMaster can create high-quality customized videos while keeping different concepts separate.

ConceptMaster comparison with DreamBooth

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

Magic Mirror can generate high-quality videos that keep a person’s identity while showing natural motion.

Magic Mirror example

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

Diffusion as Shader can generate high-quality videos from 3D tracking inputs.

Diffusion as Shader examples

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

Through-The-Mask can generate realistic video sequences from static images, text, and input masks.

Through-The-Mask example

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

GS-DiT can generate videos from a single camera input and allows for advanced video effects like dolly zoom and multi-camera shooting.

GS-DiT example

ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

ILDiff can remove the background from animated stickers/gifs.

ILDiff example

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

VideoMaker can generate personalized videos from a single subject reference image.

VideoMaker example

Audio

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

TangoFlux can generate 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU.

TangoFlux comparison
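Flow matching, which the title refers to, is a refreshingly simple training recipe: interpolate between noise and clean latents along a straight line and regress the constant velocity. Here's a generic sketch (not TangoFlux's actual code; the velocity model and latent shapes are assumptions):

```python
import torch
import torch.nn as nn

def flow_matching_loss(velocity_model, x1, cond):
    """One rectified flow-matching training step (generic sketch).

    x1: clean audio latents (B, T, D); cond: text conditioning (assumed).
    """
    x0 = torch.randn_like(x1)                         # noise endpoint
    t = torch.rand(x1.size(0), 1, 1)                  # per-sample time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                       # straight-line interpolant
    target_v = x1 - x0                                # constant target velocity
    pred_v = velocity_model(x_t, t.squeeze(), cond)   # model predicts the velocity field
    return nn.functional.mse_loss(pred_v, target_v)

# Toy usage with a dummy velocity model.
dummy = lambda x, t, c: x * 0.0
loss = flow_matching_loss(dummy, torch.randn(4, 100, 64), cond=None)
print(loss.item())
```

The straight-line paths are part of why inference can be so fast: fewer, larger integration steps still land close to the data distribution.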

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

MMAudio can generate high-quality audio that matches video and text inputs. It excels in audio quality and synchronization, with a fast processing time of just 1.23 seconds for an 8-second clip.

Nothing in the video above is real: the video was generated with Meta’s Movie Gen, and the audio (which you can’t hear in this GIF) was generated with MMAudio.

Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls

Stable-V2A can generate synchronized sound effects for videos.

You can’t hear it here, but this video would have synchronized footsteps based on the character’s movement.

I’ve been going nuts with creating dreampunk visuals in December :)

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. If you like what I do, you can support me by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying access to AI Art Weekly Premium 👑

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa