AI Art Weekly #134

Hello, my fellow dreamers, and welcome to issue #134 of AI Art Weekly! πŸ‘‹

I’m currently working on my next music video, which will consist of 383 unique scenes (I’m insane, yes). Now, I could create these scenes all by myself, but it’s always more fun to involve the community in a project like this. So if you have a spare minute, please consider submitting an image to this thread πŸ™

Currently, 10% of all scenes are community submissions, but the video would be even more amazing if as many people as possible contributed an image or two πŸ€—


News & Papers

Highlights

  • Moondream released Moondream 3 Preview, a tiny new SOTA object-detection model with powerful visual reasoning (see the detection sketch after this list).
  • Alibaba released Qwen-Image-Edit-2509, an improved Qwen-Image-Edit model with multi-image editing and built-in ControlNet support (a usage sketch follows below).
  • Google released Mixboard, a canvas-based ideation tool with generative AI capabilities (basically dingboard.com).
  • Robots are still getting the shit kicked out of them, what could possibly go wrong πŸ˜…
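
If you want to poke at Moondream 3 yourself, below is a minimal detection sketch. The Hugging Face repo id and the detect()/query() helpers are carried over from the Moondream 2 remote-code API, so treat the exact names as assumptions rather than official docs:

```python
# Minimal sketch: object detection with Moondream 3 Preview via transformers.
# The repo id and the detect()/query() helpers are assumed from the
# Moondream 2 trust_remote_code API; check the model card for the real names.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "moondream/moondream3-preview",  # assumed HF repo id
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

image = Image.open("street.jpg")  # hypothetical input image

# Open-vocabulary detection: returns bounding boxes for the queried label.
for box in model.detect(image, "bicycle")["objects"]:
    print(box)

# Visual reasoning over the same image.
print(model.query(image, "How many people are waiting to cross?")["answer"])
```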
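
And here is a hedged multi-image editing sketch for Qwen-Image-Edit-2509 with diffusers. DiffusionPipeline resolves the concrete pipeline class from the repo itself; the parameter names follow the original Qwen-Image-Edit release and may have changed in 2509:

```python
# Minimal sketch: multi-image editing with Qwen-Image-Edit-2509 via diffusers.
# Parameter names (image list, true_cfg_scale) follow the earlier
# Qwen-Image-Edit pipeline and are assumptions for the 2509 release.
import torch
from PIL import Image
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",
    torch_dtype=torch.bfloat16,
).to("cuda")

person = Image.open("person.png").convert("RGB")    # hypothetical inputs
product = Image.open("product.png").convert("RGB")

result = pipe(
    image=[person, product],  # 2509 accepts multiple reference images
    prompt="The person holds the product in a clean studio photo",
    num_inference_steps=40,
    true_cfg_scale=4.0,
).images[0]
result.save("edit.png")
```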

3D

Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Lyra can generate 3D scenes from a single image or video. By self-distilling a video diffusion model, it supports real-time rendering and dynamic scene generation without needing multi-view training data.

Lyra example

MeshMosaic: Scaling Artist Mesh Generation via Local-to-Global Assembly

MeshMosaic can generate high-resolution 3D meshes with over 100,000 triangles. It splits shapes into smaller patches and assembles them local-to-global for better detail and accuracy, outperforming prior methods that typically top out around 8,000 faces.

MeshMosaic example

Video

CapStARE: Capsule-based Spatiotemporal Architecture for Robust and Efficient Gaze Estimation

CapStARE can achieve high accuracy in gaze estimation. It runs in real time at about 8 ms per frame and handles extreme head poses well, making it ideal for interactive systems.

CapStARE example

Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

Wan-Animate can animate a character from a still image by replicating the expressions and movements of a performer in a reference video. It also allows for seamless character replacement in videos, preserving the original lighting and color tone for a consistent look.

Wan-Animate example

SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding

SynchroRaMa can generate lip-synchronized talking-face videos that convey emotion. It combines a multi-modal emotion embedding with an audio-to-motion module for natural head movements, and user studies confirm its high image quality and realistic motion.

SynchroRaMa example

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

OmniInsert can insert subjects from a reference video or image into existing footage without masks. It tackles problems like data scarcity and subject-scene balance, achieving better results than competing methods.

OmniInsert example

Stable Video-Driven Portraits

Stable Video-Driven Portraits can generate high-quality, photo-realistic videos from a single image by reenacting expressions and poses from a driving video.

Stable Video-Driven Portraits example

Audio

AudioLBM: Audio Super-Resolution with Latent Bridge Models

AudioLBM can upscale low-resolution audio to high resolution, achieving top quality at 48 kHz and setting a new record at 192 kHz.

Upscaled spectrogram created by AudioLBM

STAR: Speech-to-Audio Generation via Representation Learning

STAR can generate audio from speech input while capturing important sounds and scene details.

An overview of the STAR framework and training strategy

Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech

Beyond Video-to-SFX can generate environment-aware audio with clear, synchronized speech from videos.

Beyond Video-to-SFX example. Check the project page for audio.

these violent thoughts have violent delights --chaos 100 --ar 4:3 --exp 100 --raw --stylize 1000 --weird 1000 --p

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. If you like what I do, you can support me by:

  • Sharing it πŸ™β€οΈ
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday πŸ˜…)
  • Buying my Midjourney prompt collection on PROMPTCACHE πŸš€
  • Buying access to AI Art Weekly Premium πŸ‘‘

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa