AI Art Weekly #70

Hello there, my fellow dreamers, and welcome to issue #70 of AI Art Weekly! 👋

I was extremely busy this week experimenting with detection and tracking models for Shortie and found a solution that is fast and accurate enough. If things go well, I’ll have an MVP up next week. Wish me luck! 🤞

In the meantime, let’s see what’s new in the world of Generative AI art!

  • Video-LaVIT: a multi-modal LLM that can generate images and videos
  • ConsistI2V generates image-to-video with more consistency
  • Direct-a-Video controls camera movement and object motion for text-to-video
  • Boximator generates rich and controllable motions for image-to-video
  • ConsiStory maintains subject consistency in text-to-image
  • LGM generates high-resolution 3D mesh objects
  • Holo-Gen generates PBR material properties for 3D objects
  • Stability AI has been working on a text-to-speech model
  • EmoSpeaker generates talking-head videos
  • Interview with AI artist blanq
  • and more!

Cover Challenge 🎨

Theme: edo period
107 submissions by 63 artists
AI Art Weekly Cover Art Challenge edo period submission by onchainsherpa
🏆 1st: @onchainsherpa
AI Art Weekly Cover Art Challenge edo period submission by xdcp07
🥈 2nd: @xdcp07
AI Art Weekly Cover Art Challenge edo period submission by skidmarxist1
🥈 2nd: @skidmarxist1
AI Art Weekly Cover Art Challenge edo period submission by kentelTICE
🧡 4th: @kentelTICE

News & Papers

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Video-LaVIT is a multi-modal video-language method that can comprehend and generate image and video content and supports long video generation.

A 360 shot of a sleek yacht sailing gracefully through the crystal-clear waters of the Caribbean

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

ConsistI2V is an image-to-video method with enhanced visual consistency. Compared to other methods, it better maintains the subject, background, and style from the first frame and ensures a fluid, logical progression, while also supporting long video generation and camera motion control.

ConsistI2V example

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

In the controllability department we got Direct-a-Video. The framework can individually or jointly control camera movement and object motion in text-to-video generations. This means you can generate a video and tell the model to move the camera from left to right, zoom in or out and move objects around in the scene.

Direct-a-Video example

Boximator: Generating Rich and Controllable Motions for Video Synthesis

As usual, one paper seldom comes alone. Boximator is a method that can generate rich and controllable motions for image-to-video generations by drawing box constraints and motion paths onto the image.

An astronaut is skateboarding on the moon.

ConsiStory: Training-Free Consistent Text-to-Image Generation

First InstantID, then StableIdentity and now ConsiStory, the third paper in 4 weeks that tries to maintain a consistent subject identity without fine-tuning. Compared to other methods, ConsiStory is able to successfully follow text prompts while maintaining subject consistency. The model also supports multi-subject scenarios and even enables training-free personalization for common objects.

ConsiStory examples

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

LGM can generate high-resolution 3D mesh objects from text prompts or a single image. The model is able to generate 3D objects within 5 seconds while boosting the training resolution to 512, resulting in high-fidelity and efficient 3D content creation. There is a HuggingFace demo if you want to give it a try. It’s still not good enough to turn my PFP into a 3D model though 😢

LGM Image-to-3D examples

Holo-Gen: Collaborative Control for Geometry-Conditioned PBR Image Generation

Now we got meshes, but what if we want to re-texture them? Unity has published Holo-Gen this week. The method can generate physically-based rendering (PBR) material properties for 3D objects.

Two Holo-Gen IP-Adapter examples

Natural language guidance of high-fidelity text-to-speech models with synthetic annotations

Stability has been researching text-to-speech capabilities that let you control speaker identity and style with natural language text prompts. Their trained model is able to generate high-fidelity speech with a diverse range of accents, prosodic styles, channel conditions, and acoustic conditions. It hasn’t been open-sourced yet, but I’m sure it will at some point.

Text-to-Speech example

EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation

EmoSpeaker is yet another talking-head model. This one is able to generate talking-head videos with input audio, emotion, and a source image. It can also generate talking-heads of different emotional intensities by adjusting the fine-grained emotion.

EmoSpeaker example

Also interesting


Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“The Liaison” by me available on objkt

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa