AI Art Weekly #69
Hello there, my fellow dreamers, and welcome to issue #69 of AI Art Weekly! 👋
I’ve been busy working on my new AI-powered product this week and can finally reveal… its name 😅 It’s called Shortie and it’s a tool that will help you create short-form videos with AI. But before I can reveal more, let’s dive into this week’s Generative AI art news and papers. The highlights of this week are:
- Midjourney’s Niji v6 and Style References
- AnimateLCM for real-time video generation
- Motion-I2V Motion Brush for controllable image-to-video
- VR-GS for interactive 3D Gaussian splats in VR
- Gaussian Splashing for dynamic fluid synthesis
- AToM for text-to-mesh
- Media2Face for co-speech facial animations
- Anything in Any Scene for photorealistic video object insertion
- SEELE for repositioning subjects within an image
- StableIdentity for inserting anybody into any scene
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for Edo-period-inspired submissions! The reward is again $50 and a rare role in our Discord community, which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Midjourney: Niji v6 and Style References
Midjourney released the new Niji v6 model this week. It’s a model specifically tuned for Eastern and anime aesthetics. But the even more exciting feature is the new Style References. The new --sref <urlA> option helps guide the model with reference images so it creates images with a more consistent style. I’m already addicted again.
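Usage is straightforward: append one or more reference image URLs to the end of your prompt. Something like this (the prompt text and URL below are just placeholders, swap in your own):

```
/imagine prompt: a ronin resting under a cherry tree --niji 6 --sref https://example.com/style-reference.png
```

You can also pass several URLs after --sref to blend multiple styles into one look.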
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Last year we got real-time diffusion for images; this year we’ll get it for video! AnimateLCM can generate high-fidelity videos with minimal sampling steps. The model also supports image-to-video as well as adapters like ControlNet. It’s not available yet, but once it hits, expect way more AI-generated video content.
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Being able to iterate fast is one important step in generative AI; another is controllability. Motion-I2V’s framework not only seems to surpass commercial solutions like Pika and RunwayML at image-to-video, but also offers features like Motion Brush, Motion Drag, and video-to-video, all with incredible results. The only downside? There is no code 😭
VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
With Apple’s Vision Pro being released today, creating for 3D becomes more important by the day (assuming the device finds adoption). But 3D/VR is hard. Luckily for us, there is AI.
VR-GS allows users to interact with 3D Gaussian kernels in VR and can generate realistic dynamic responses and illumination in real-time, making it possible to manipulate objects and scenes with physically plausible results.
Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting
Interaction is one thing, but what about liquids? Gaussian Splashing combines position-based dynamics (PBD) with 3DGS, allowing the simulation of physical interactions between dynamic fluids and solids represented as Gaussian Splats.
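If “position-based dynamics” doesn’t ring a bell: it’s the classic solver trick of predicting particle positions, projecting constraints directly onto those predictions, and deriving velocities afterwards. Here’s a minimal illustrative sketch of that loop for two particles linked by a distance constraint (nothing Gaussian-Splashing-specific, and all the numbers are made up):

```python
# Minimal position-based dynamics (PBD) loop: two particles, one distance constraint.
import numpy as np

dt = 0.016                                  # time step in seconds
gravity = np.array([0.0, -9.81, 0.0])
rest_length = 1.0                           # target distance between the particles

x = np.array([[0.0, 0.0, 0.0],              # positions
              [1.0, 0.0, 0.0]])
v = np.zeros_like(x)                        # velocities
inv_mass = np.array([0.0, 1.0])             # particle 0 is pinned (infinite mass)

for step in range(100):
    # 1) predict positions from external forces
    v += dt * gravity * inv_mass[:, None]
    p = x + dt * v

    # 2) iteratively project the distance constraint onto the predictions
    for _ in range(10):
        d = p[1] - p[0]
        dist = np.linalg.norm(d)
        if dist > 1e-9:
            corr = (dist - rest_length) * d / dist
            w_sum = inv_mass[0] + inv_mass[1]
            p[0] += inv_mass[0] / w_sum * corr
            p[1] -= inv_mass[1] / w_sum * corr

    # 3) derive velocities from the corrected positions and commit them
    v = (p - x) / dt
    x = p

print(x[1])  # the free particle ends up swinging below the pinned one
```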
AToM: Amortized Text-to-Mesh using 2D Diffusion
Gaussian Splats are one option, but what about good old 3D meshes? Well, AToM is a new text-to-mesh framework that can generate high-quality textured 3D meshes from text prompts in less than a second. The method is optimized across multiple prompts and is able to create diverse objects it wasn’t trained on.
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
Media2Face is able to generate 3D facial animations from speech, with audio, text, and image prompts as additional guidance. The model can also control expressions for each frame with either a reference image or a text prompt. Wow.
Anything in Any Scene: Photorealistic Video Object Insertion
Anything in Any Scene is a method that can insert objects into videos while maintaining the same level of photorealism as the original footage. The model is able to handle occlusions and lighting conditions and can even generate shadows for the inserted objects 🤯
SEELE: Repositioning The Subject Within Image
SEELE can move a subject around within an image. It does so by removing the subject, inpainting the occluded and vacated regions, and harmonizing the appearance of the repositioned subject with the surrounding areas.
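For intuition, here’s a naive stand-in for that remove → inpaint → harmonize pipeline using classical OpenCV operations in place of SEELE’s learned modules. The file paths, mask and offset are placeholders, and np.roll wraps around at the image border, so treat it as a structural sketch only:

```python
# Naive subject repositioning: cut the subject out, fill the hole, paste and blend it elsewhere.
import cv2
import numpy as np

image = cv2.imread("scene.jpg")              # input image (placeholder path)
mask = cv2.imread("subject_mask.png", 0)     # white where the subject is (placeholder path)
dx, dy = 120, 0                              # how far to move the subject (placeholder offset)

# 1) remove: fill the subject's original location
#    (SEELE uses a learned inpainter; classical inpainting stands in here)
removed = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)

# 2) reposition: shift the subject cut-out and its mask to the new location
shifted_mask = np.roll(mask, shift=(dy, dx), axis=(0, 1))
shifted_subject = np.roll(image, shift=(dy, dx), axis=(0, 1))

# 3) harmonize: blend the pasted subject with its new surroundings via Poisson blending
#    (SEELE instead uses a learned harmonization model)
bx, by, bw, bh = cv2.boundingRect(shifted_mask)
center = (bx + bw // 2, by + bh // 2)
result = cv2.seamlessClone(shifted_subject, removed, shifted_mask, center, cv2.NORMAL_CLONE)

cv2.imwrite("repositioned.jpg", result)
```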
StableIdentity: Inserting Anybody into Anywhere at First Sight
StableIdentity is yet another method that can generate diverse customized images in various contexts from a single input image. The cool thing about this method is that it can combine the learned identity with ControlNet and even inject it into video (ModelScope) and 3D (LucidDreamer) generation.
Also interesting
- Geometry Transfer for Stylizing Radiance Fields
- CapHuman: Capture Your Moments in Parallel Universes
- ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields
@PuffYachty put together a super cool retro-80s AI TV commercial reel for @CryptoPopPunk’s “County Fair - AI Nostalgia” NFT collection.
@jennyzhangzt 3D printed this dinosaur model generated by the 3DTopia text-to-3D model!
@DustinHollywood created a 5-minute AI short that consists of 98% AI video art generated from still images. Tools used in the process: Midjourney, Leonardo.AI, Magnific, Adobe, RunwayML, Topaz Labs, Eleven Labs, D-ID and others.
Interview
This week we’re talking to machine learning VJ Vadim Epstein.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
3DTopia is a two-stage text-to-3D generation model. The first stage uses a diffusion model to quickly generate candidates; the second stage refines the assets chosen from the first stage.
FreeStyle is a Stable Diffusion XL plugin that can perform style transfer on existing images from text prompts.
JoyTag is a state-of-the-art AI vision model for tagging images, and one that doesn’t discriminate against NSFW content. It uses the Danbooru tagging schema, but works across a wide range of images, from hand-drawn to photographic.
@fffiloni put together a HuggingFace space that can generate a video from a reference face and pose.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa