Hello there, my fellow dreamers, and welcome to issue #57 of AI Art Weekly! 👋
We’re approaching the era of generative 3D at high speed, and every week there are new announcements and releases that pique my interest. Apart from that, OpenAI DevDay is coming up on Monday; I can’t wait to see what they have in store for us. Personally, I’m hoping for a GPT-4V(ision) API release. But whatever they announce, I’ll cover it in next week’s issue.
Let’s dive into this week’s highlights:
- Midjourney Style Tuner
- Stable 3D Private Preview
- LumaLabs Genie generates 3D models from text in 10 seconds
- SEINE can generate video transitions
- VideoDreamer can generate videos with consistent characters
- ZeroNVS turns a single image into 360-degree scenes
- MM-Vid: GPT-4 vision for videos
- Interview with AI artist Kezia Barnett
- and more tutorials, tools and gems!
Cover Challenge 🎨
News & Papers
Midjourney Style Tuner
Midjourney released a new feature called the Style Tuner this week. It lets you tweak the visual style applied to a specific prompt, within certain limits.
I was personally hoping for a fine-tuning approach similar to DreamBooth and LoRAs, where we could adapt Midjourney with our own dataset of images. That isn’t the case here: this is a style tuner for your specific prompts! That said, it’s still fun to play around with and can help you break out of a creative block if MJ has felt a bit stale for you lately. If you want to get up to speed quickly, check out my tutorial on X.
I’m also thinking about putting together a Style Tuner list here on AI Art Weekly. Let me know if that’s something you’d find useful.
Stable 3D Private Preview
Stability AI showed a sneak peek of Stable 3D this week, their answer to the generative 3D race. The pipeline they’re putting together can generate concept-quality textured 3D scenes and objects from text prompts and images in minutes.
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Until now, video models have usually only produced short clips depicting a single scene. SEINE is a new short-to-long video diffusion model that focuses on generative transitions and predictions. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of clips. The model can also be used for image-to-video animation and autoregressive video prediction.
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning
Character consistency is key when telling a story. VideoDreamer is a framework that can generate videos containing the given subjects while simultaneously conforming to text prompts. While nowhere near perfect, we’re not that far away from creating videos with multiple consistent characters using only a few input images.
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
ZeroNVS is a 3D-aware diffusion model that can generate novel 360-degree views of in-the-wild scenes from a single real image. While the outputs are still far from perfect, let alone usable, I’m excited to see where this might lead. Maybe in the next 12 months we’ll be able to create entire dynamic 3D scenes from just a single video 🤞
MM-Vid: Advancing Video Understanding with GPT-4V(ision)
GPT-4V has been around for a few weeks now and enables some super useful new capabilities. For instance, I used it last weekend to generate a cooking recipe from an image of ingredients. MM-Vid harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. This enables MM-Vid to:
- Identify and describe scenes or moments from videos.
- Recognize both animated and real-life content.
- Pinpoint specific timestamps within video content.
- Detect and describe various video elements, such as characters, objects, and actions.
- Answer specific queries related to content within videos by referencing exact timestamps.
- Provide visual navigation suggestions, like instructing a user on where to move within a game.
- Offer brief contextual descriptions for scenes or moments from different genres, including sports, documentaries, TV series, and animations.
While not directly related to AI art, getting access to these capabilities will allow us to create differently than ever before.
More papers & gems
- Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior
- Classifier-Score-Distillation: Text-to-3D with Classifier Score Distillation
- CustomNet: Zero-Shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
- SSR: Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture
- Act As You Wish: Fine-grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs
In today’s AI Art Weekly interview we’re talking to Kezia Barnett, aka KEZIAI, a New Zealand-based filmmaker and photographer of over 25 years. After a debilitating head knock left her mostly bedbound, Kezia found solace and renewed purpose in creating with AI in 2021.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!