AI Art Weekly #123
Hello, my fellow dreamers, and welcome to issue #123 of AI Art Weekly! 👋
They say it can’t be done, but I’m currently extremely busy “vibe coding” a 60+ table web application for a client. It’s funny how people who haven’t really worked with LLMs assume they don’t work; if you actually know what you’re doing, the results are amazing.
Either way, Midjourney has put out some great updates over the last two weeks: 1) a new --exp parameter and 2) a new omni-reference feature which can be used to reference objects, people, places, you name it. Combined with the Remix feature, these have me addicted again (see the example prompt below). We are all very much aware of how social media is frying our attention spans, but what I think most people don’t realize is how generative AI is even worse. Going through 1000+ images in an hour cannot be beneficial for our brains long-term.
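If you want to give the new parameters a spin, here’s the kind of prompt I’ve been playing with. The image URL is a placeholder and the values are just a starting point: --exp takes values from 0 to 100 (higher means more experimental aesthetics), --oref points at your reference image, and --ow controls how strongly that omni-reference is applied.

a porcelain astronaut drifting through a coral reef --oref https://example.com/reference.png --ow 400 --exp 25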
But that’s a problem for future Tulpa to solve. For now, enjoy the weekend!
Support the newsletter and unlock the full potential of AI-generated art with my curated collection of 275+ high-quality Midjourney SREF codes and 2000+ creative prompts.
News & Papers
3D
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
PrimitiveAnything can generate high-quality 3D shapes by breaking down complex forms into simple geometric parts. It uses a shape-conditioned primitive transformer to ensure that the shapes remain accurate and diverse.

PrimitiveAnything examples
S3D: Sketch-Driven 3D Model Generation
S3D can generate 3D models from simple hand-drawn sketches.

S3D example
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Pixel3DMM can reconstruct 3D human faces from a single RGB image.

Pixel3DMM example
TSTMotion: Training-free Scene-aware Text-to-motion Generation
TSTMotion can generate scene-aware human motion sequences from text prompts, and it does so training-free.

TSTMotion example
GENMO: A GENeralist Model for Human MOtion
GENMO can generate and estimate human motion from text, audio, video, and 3D keyframes. It allows for flexible control of motion outputs.

GENMO example
GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers
GarmentDiffusion can generate precise 3D sewing patterns from text, images, and incomplete designs.

GarmentDiffusion examples
Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation
Sketch2Anim can turn 2D storyboard sketches into high-quality 3D animations. It uses a motion generator for precise control and a neural mapper to align 2D sketches with 3D motion, allowing for easy editing and animation control.

Sketch2Anim example
Image
PixelHacker: Image Inpainting with Structural and Semantic Consistency
PixelHacker can perform image inpainting with strong consistency in structure and meaning. It uses a diffusion-based model and a dataset of 14 million image-mask pairs, achieving better results than other methods in texture, shape, and color consistency.

PixelHacker example
CompleteMe: Reference-based Human Image Completion
CompleteMe can complete human images while keeping important details like clothing patterns and accessories from reference images. It uses a dual U-Net architecture with a Region-focused Attention Block to improve visual quality.

CompleteMe comparisons
RepText: Rendering Visual Text via Replicating
RepText can render multilingual visual text in user-chosen fonts without needing to understand the text. It allows for customization of text content, font, and position.

RepText example
Video
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
HunyuanCustom can generate customized videos with specific subjects while keeping their identity consistent across frames. It supports various inputs like images, audio, video, and text, and it excels in realism and matching text to video.

HunyuanCustom example
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
FlexiAct can transfer actions from a reference video to a target image while preserving the person’s identity and adapting to different layouts and viewpoints.

FlexiAct example
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
KeySync can achieve strong lip synchronization for videos. It addresses issues like timing, facial expressions, and blocked faces, using a unique masking strategy and a new metric called LipLeak to improve visual quality.

KeySync example
VAKER: Generating Animated Layouts as Structured Text Representations
VAKER can generate animated layouts for video ads by turning text prompts into detailed plans for visuals and motion.

VAKER example
Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis
Eye2Eye can turn a regular video into a 3D stereo video by creating a left-eye video from a right-eye input. It works well with complex scenes, including those with shiny and clear objects, and allows viewing with 3D glasses or a VR headset.

Eye2Eye example
AnimateAnywhere: Rouse the Background in Human Image Animation
AnimateAnywhere can generate photorealistic human videos with backgrounds that move in sync with human poses.

AnimateAnywhere examples
ShowMak3r: Compositional TV Show Reconstruction
ShowMak3r can reconstruct dynamic radiance fields from TV shows, allowing users to edit scenes like in a control room. It enables actor relocation, insertion, deletion, and pose manipulation while effectively managing occlusions and diverse facial expressions.

ShowMak3r example

Enjoy the weekend dreamhead!
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. If you like what I do, you can support me by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying my Midjourney prompt collection on PROMPTCACHE 🚀
- Buying access to AI Art Weekly Premium 👑
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa