AI Art Weekly #121

Hello, my fellow dreamers, and welcome to issue #121 of AI Art Weekly! πŸ‘‹

It has been a slow week, and I don’t have any smart words for you this time either πŸ˜…

So let me just say thank you to all of you for reading along and supporting the newsletter through Promptcache, Premium, or by simply donating a coffee πŸ™πŸ§‘

Enjoy the weekend and happy Easter! See you in two weeks πŸ‘‹


News & Papers

Highlights

Pusa 0.5

Pusa is a new open-source video model that supports pretty much every text-to-video, image-to-video, and video-to-video generation task one can imagine. It’s based on Mochi1-Preview and costs only $100 to train. The methodology can be readily applied to other leading video diffusion models such as Hunyuan Video and Wan2.1 as well, so I’m pretty excited to see where this is going.

Pusa examples

Midjourney V7 Remix

Midjourney is getting back into a more frequent release cycle. After last week’s V7 release, they have now added the Remix feature to the new model as well. Remix was probably the one V6 feature that produced the coolest images for me, so personally I’m super excited for this. And it’s always cool to remix images from older models. Be aware, though, that some parameters might give you bad results. Despite Midjourney claiming better prompt coherence, it’s still terrible compared to other newer models like 4o and Imagen 3. David (the Midjourney founder) himself told me you can improve coherence somewhat by telling an LLM to β€œexpand this prompt for me in natural English”. Also, high --stylize and --sref values can do more harm than good, but you don’t need them when remixing an image anyway. Enjoy!

Left: V4. Right: remixed with V7.
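To make the prompt-expansion tip a bit more concrete, here’s a rough, made-up illustration (not an official Midjourney recipe, the prompts and values are invented for show):

Terse V6-style prompt: glass cathedral in a storm --stylize 850 --sref 1234567
LLM-expanded remix prompt: a towering cathedral built entirely of glass, lit from within, rain streaking down its panes under a dark storm sky --ar 3:4

The idea is simply to describe the scene in plain natural English and drop the aggressive --stylize / --sref values, since the image you’re remixing already carries the style you want.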

3D

GARF: Learning Generalizable 3D Reassembly for Real-World Fractures

GARF can reassemble 3D objects from real-world fractured parts.

GARF example

SMF: Template-free and Rig-free Animation Transfer using Kinetic Codes

SMF can transfer 2D or 3D keypoint animations to full-body mesh animations without needing template meshes or corrective keyframes.

SMF example

REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

REWIND can estimate human motion in real-time from first-person videos. It improves motion quality by using a few example poses and can adapt to new movements effectively.

REWIND example

HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration

HumanDreamer-X is another method that can create 3D human avatars from a single image.

HumanDreamer-X example

Text

OmniCaptioner: One Captioner to Rule Them All

OmniCaptioner can generate detailed text descriptions for various types of content like images, math formulas, charts, user interfaces, PDFs, videos, and more.

OmniCaptioner examples

Image

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

UNO brings subject transfer and preservation from reference images to FLUX with a single model.

UNO example

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

PosterMaker can generate high-quality product posters by rendering text accurately and keeping the main subject clear.

PosterMaker examples

Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization

Comprehensive Relighting can change and match the lighting of people in images and videos from any scene. It uses a pre-trained diffusion model for general image understanding and an unsupervised temporal lighting model to keep lighting consistent across frames.

Comprehensive Relighting example

Video

TTT-Video: One-Minute Video Generation with Test-Time Training

TTT-Video can create coherent one-minute videos from text storyboards. As the title of the paper says, it uses test-time training instead of self-attention layers to produce consistent multi-context scenes, which is quite the achievement. The paper is worth a read.

The first 5 seconds of a TTT-Video example. The entire 1-minute clip is generated based on the text storyboard at the top of the video.
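For the curious, here’s a tiny, self-contained Python/numpy sketch of the test-time-training idea. This is not the paper’s architecture, just an illustration of the core trick: the layer’s β€œmemory” is itself a small model whose weights get updated by gradient descent on a self-supervised loss while the sequence is being processed, instead of attending over all previous tokens. The dimensions, the corruption, and the learning rate below are all made up.

import numpy as np

d = 64                    # token dimension (made up)
lr = 0.1                  # inner-loop learning rate (made up)
W = np.zeros((d, d))      # the layer's "hidden state" is itself a tiny linear model

def ttt_step(W, x):
    # Self-supervised inner loss: reconstruct the token from a corrupted view of it.
    x_corrupt = 0.5 * x                     # placeholder corruption
    pred = W @ x_corrupt
    grad = np.outer(pred - x, x_corrupt)    # gradient of 0.5 * ||W @ x_corrupt - x||^2 w.r.t. W
    W = W - lr * grad                       # "training" the state at test time
    return W, W @ x                         # updated state and the output token

tokens = np.random.randn(10, d)             # a toy input sequence
outputs = []
for x in tokens:
    W, y = ttt_step(W, x)
    outputs.append(y)

Because the per-token cost stays constant, the context can grow toward a full minute of video without the quadratic blow-up of self-attention, which is the paper’s whole motivation.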

FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

FantasyTalking can generate talking portraits from a single image, making them look realistic with accurate lip movements and facial expressions. It uses a two-step process to align audio and video, allowing users to control how expressions and body motions appear.

FantasyTalking example

ACTalker: Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

ACTalker can generate talking head videos by combining audio and facial motion to control specific facial areas.

ACTalker example

Also interesting

abstract infinity --chaos 34 --ar 62:75 --p

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. If you like what I do, you can support me by:

  • Sharing it πŸ™β€οΈ
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday πŸ˜…)
  • Buying my Midjourney prompt collection on PROMPTCACHE πŸš€
  • Buying access to AI Art Weekly Premium πŸ‘‘

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you in two weeks!

– dreamingtulpa
