AI Art Weekly #123

Hello, my fellow dreamers, and welcome to issue #123 of AI Art Weekly! 👋

They say it can’t be done, but I’m currently extremely busy “vibe coding” a 60+ table web application for a client. It’s funny how people who haven’t really worked with LLMs assume they don’t work; if you actually know what you’re doing, they’re amazing.

Either way, Midjourney has put out some great updates over the last two weeks: 1) a new --exp parameter for experimental aesthetics and 2) a new omni-reference feature that can reference objects, people, places, you name it. Combined with the Remix feature, these have me addicted again. We’re all very much aware of how social media is frying our attention spans, but what I think most people don’t realize is that generative AI can be even worse: going through 1000+ images in an hour cannot be beneficial for our brains long-term.
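
If you want to try the new parameters yourself, a prompt looks roughly like this (flag names as Midjourney documents them at the time of writing; the subject, reference URL, and values are just placeholders):

  /imagine prompt: a lighthouse in a storm, cinematic lighting --v 7 --exp 25 --oref https://example.com/reference.png --ow 200

Roughly speaking, --exp (0–100) dials up the experimental aesthetics, --oref points omni-reference at an image of the thing you want carried into the generation, and --ow controls how strongly that reference is weighted.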

But that’s a problem for future Tulpa to solve. For now, enjoy the weekend!


News & Papers

3D

PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

PrimitiveAnything can decompose complex 3D shapes into assemblies of simple geometric primitives, much as a human modeler would. It uses a shape-conditioned, auto-regressive primitive transformer to keep the assemblies accurate and diverse.

PrimitiveAnything examples

S3D: Sketch-Driven 3D Model Generation

S3D can generate 3D models from simple hand-drawn sketches.

S3D example

Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction

Pixel3DMM can reconstruct 3D human faces from a single RGB image.

Pixel3DMM example

TSTMotion: Training-free Scene-aware Text-to-motion Generation

TSTMotion can generate human motion sequences from text prompts that take the surrounding 3D scene into account, without any training.

TSTMotion example

GENMO: A GENeralist Model for Human MOtion

GENMO can generate and estimate human motion from text, audio, video, and 3D keyframes. It allows for flexible control of motion outputs.

GENMO example

GarmentDiffusion: 3D Garment Sewing Pattern Generation with Multimodal Diffusion Transformers

GarmentDiffusion can generate precise 3D sewing patterns from text, images, and incomplete designs.

GarmentDiffusion examples

Sketch2Anim: Towards Transferring Sketch Storyboards into 3D Animation

Sketch2Anim can turn 2D storyboard sketches into high-quality 3D animations. It uses a motion generator for precise control and a neural mapper to align the 2D sketches with 3D motion, making the results easy to edit and direct.

Sketch2Anim example

Image

PixelHacker: Image Inpainting with Structural and Semantic Consistency

PixelHacker can perform image inpainting with strong structural and semantic consistency. It uses a diffusion-based model trained on a dataset of 14 million image-mask pairs, beating other methods on texture, shape, and color consistency.
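
Setting PixelHacker’s own code aside, the underlying diffusion-inpainting workflow is easy to sketch. Below is a minimal example using the Hugging Face diffusers inpainting pipeline as a stand-in; to be clear, this is not PixelHacker’s API, and the model id and file paths are placeholders:

  # Generic diffusion inpainting via Hugging Face diffusers -- a stand-in,
  # NOT PixelHacker's actual API. Model id and paths are placeholders.
  import torch
  from PIL import Image
  from diffusers import StableDiffusionInpaintPipeline

  pipe = StableDiffusionInpaintPipeline.from_pretrained(
      "runwayml/stable-diffusion-inpainting",  # any inpainting checkpoint
      torch_dtype=torch.float16,
  ).to("cuda")

  image = Image.open("photo.png").convert("RGB")  # image to repair
  mask = Image.open("mask.png").convert("RGB")    # white = region to repaint

  # The model denoises only the masked region while conditioning on the
  # visible pixels; PixelHacker's contribution is making that conditioning
  # hold up better for structure and semantics.
  result = pipe(
      prompt="seamless continuation of the scene",
      image=image,
      mask_image=mask,
  ).images[0]
  result.save("inpainted.png")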

PixelHacker example

CompleteMe: Reference-based Human Image Completion

CompleteMe can complete human images while keeping important details like clothing patterns and accessories from reference images. It uses a dual U-Net architecture with a Region-focused Attention Block to improve visual quality.

CompleteMe comparisons

RepText: Rendering Visual Text via Replicating

RepText can render multilingual visual text in user-chosen fonts without needing to understand the text. It allows for customization of text content, font, and position.

RepText example

Video

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

HunyuanCustom can generate customized videos featuring specific subjects while keeping their identity consistent across frames. It supports image, audio, video, and text inputs, and it excels at realism and text-video alignment.

HunyuanCustom example

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios

FlexiAct can transfer actions from a video to a target image, preserving the person’s identity while adapting to different layouts and viewpoints.

FlexiAct example

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

KeySync can achieve leakage-free lip synchronization for high-resolution videos. It tackles issues like timing, facial expressions, and occluded faces, using a masking strategy to improve visual quality and a new metric called LipLeak to quantify expression leakage.

KeySync example

VAKER: Generating Animated Layouts as Structured Text Representations

VAKER can generate animated layouts for video ads by turning text prompts into detailed plans for visuals and motion.

VAKER example

Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis

Eye2Eye can turn a regular video into a 3D stereo video by synthesizing a left-eye view from right-eye input. It handles complex scenes well, including reflective and transparent objects, and the results can be viewed with 3D glasses or a VR headset.

Eye2Eye example

AnimateAnywhere: Rouse the Background in Human Image Animation

AnimateAnywhere can generate photorealistic human videos with backgrounds that move in sync with human poses.

AnimateAnywhere examples

ShowMak3r: Compositional TV Show Reconstruction

ShowMak3r can reconstruct dynamic radiance fields from TV shows, allowing users to edit scenes like in a control room. It enables actor relocation, insertion, deletion, and pose manipulation while effectively managing occlusions and diverse facial expressions.

ShowMak3r example

Enjoy the weekend, dreamhead!

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. If you like what I do, you can support me by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying access to AI Art Weekly Premium 👑

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
