AI Art Weekly #60
Hello there, my fellow dreamers, and welcome to issue #60 of AI Art Weekly! 👋
Right after I shipped last week’s issue, OpenAI went into temporary meltdown mode over the weekend by firing their CEO, Sam Altman.
After a lot of speculation and rumors, the gist seems to be that the reason behind all of it is a new learning algorithm called Q*.
What does that mean? Well, grab onto your tinfoil hat, because this is a wild one.
Q* is apparently able to solve basic math problems, which could unlock new skills like logic, reasoning, and planning. According to an allegedly leaked letter from OpenAI, the algorithm was also able to break state-of-the-art encryption. If true, this would have devastating effects on cybersecurity. In short: no password or encrypted message would be safe anymore.
But nothing of the above has been confirmed yet. So, take it with a grain of salt.
What you shouldn’t take with a grain of salt, though, are this week’s highlights. They are numerous and actually happened:
- Stable Video Diffusion by Stability AI
- PhysGaussian can simulate physics for Gaussian Splats
- LiveSketch can animate sketches
- PF-LRM: multi-view to 3D in 1.3 seconds
- LucidDreamer: A text-to-3D framework
- ZipLoRA: Combine any subject in any style
- Concept Sliders: LoRA adaptors for precise control
- DiffusionMat: A new alpha matting method
- MagicDance can clone dance moves
- and more tutorials, tools and gems!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for “movember” submissions. Show me your magnificent artificial staches! The reward is $50 and the Challenge Winner role for the winner, and the Challenge Finalist role for all finalists within our Discord community. These rare roles earn you the exclusive right to cast a vote in the selection of future winners. Rulebook can be found here and images can be submitted here. I’m looking forward to your submissions 🙏
News & Papers
Stable Video Diffusion
Stability AI dropped a new foundation model for video generation called Stable Video Diffusion this week. In summary, the new model supports:
- Text-to-Video
- Image-to-Video
- 14 or 25 frames at 576 x 1024
- Multi-View Generation
- Frame Interpolation
- 3D Scene Understanding
- Camera Control via LoRA
The weights for the model are available on HuggingFace here and here.
You can give the model a try on Replicate, Decoherence, ComfyUI and Google Colab.
Apparently this was only the first of five releases by Stability AI. Excited to see what else they have in store for us in the upcoming weeks.
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
This one blows my mind. PhysGaussian is a simulation-rendering pipeline that can simulate the physics of 3D Gaussian Splats while simultaneously rendering photorealistic results. The method supports flexible dynamics, a diverse range of materials, and collisions.
LiveSketch: Breathing Life Into Sketches Using Text-to-Video Priors
LiveSketch is my highlight of the week. The method can automatically add motion to a single-subject sketch when given a text prompt describing the desired motion. The outputs are short SVG animations that can be easily edited.
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction
Two weeks after Adobe announced their image-to-3D LRM method, they released a new paper on PF-LRM. Unlike LRM, PF-LRM can create 3D models from a few unposed images with little visual overlap, in about 1.3 seconds on a single A100 GPU.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Of course, there is more 3D generation magic. LucidDreamer is a new text-to-3D generation framework that is able to generate 3D models with high-quality textures and shapes. Higher quality means longer inference. This one takes 35 minutes on an A100 GPU.
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
ZipLoRA is a cost-effective method that is able to combine different style and subject LoRAs to create images in any chosen style with any chosen subject. It can stylize objects in various ways and place them in different contexts, maintaining high-quality and controlled stylization.
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Concept Sliders is a new method that allows for fine-grained control over textual and visual attributes in Stable Diffusion XL. By using simple text descriptions or a small set of paired images, artists can train concept sliders to represent the direction of desired attributes. At generation time, these sliders can be used to control the strength of the concept in the image, enabling nuanced tweaking.
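To make the slider idea a bit more concrete, here is a minimal Python sketch. It assumes a slider behaves like a low-rank (LoRA-style) update that gets added to a model weight and scaled by a single number at generation time. The function name, the toy matrices, and the scale values are all hypothetical and purely illustrative; this is not the authors' implementation.

```python
def apply_slider(weight, down, up, scale):
    """Return weight + scale * (up @ down) for small nested-list matrices.

    weight: rows x cols, up: rows x rank, down: rank x cols.
    All names here are illustrative assumptions, not a real library API.
    """
    rows, cols, rank = len(weight), len(weight[0]), len(down)
    return [
        [
            weight[i][j] + scale * sum(up[i][r] * down[r][j] for r in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Toy 2x2 base weight and a rank-1 slider direction (made-up numbers).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]
down = [[0.5, -0.5]]

# Sweep the slider: negative, neutral, and positive strength.
for scale in (-1.0, 0.0, 1.0):
    print(scale, apply_slider(base, down, up, scale))
```

A scale of 0 leaves the base weights untouched, while negative values push in the opposite direction of the concept.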
DiffusionMat: Alpha Matting as Sequential Refinement Learning
DiffusionMat is a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. The key innovation of the framework is a correction module that adjusts the output at each denoising step, ensuring that the final result is consistent with the input image’s structures.
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
It’s been a while since I last doomed the TikTok dancers. MagicDance is gonna doom them some more. This model can combine human motion with reference images to precisely generate appearance-consistent videos. While the results still contain visible artifacts and jittering, give it a few months and I’m sure we won’t be able to tell the difference anymore.
More papers & gems
- PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF
- XAGen: 3D Expressive Human Avatars Generation
- Kandinsky Video
- BundleMoCap: Efficient, Robust and Smooth Motion Capture from Sparse Multiview Videos
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation by Hierarchical Variational Inference for Zero-shot Speech Synthesis
- MoVideo: Motion-Aware Video Generation with Diffusion Models
@NewMediaPioneer created an AI video scored with musical toys and tapes. Made with Midjourney, Gen-2, Resolve and Topaz. Love it!
@CoffeeVectors shared his workflow on how he transformed a music video using AnimateDiff and multi-ControlNet within ComfyUI.
Interview
This week we’re talking to AI animator & illustrator sleepysleephead.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
LucidDreamer is a text-to-3D generation framework to distill high-fidelity textures and shapes from pretrained 2D diffusion models.
DEUS is a super flexible REALTIME image generation engine, powered by Stable Diffusion and LCM LoRA.
If you’re looking for ComfyUI workflows, Comfy Workflows is the place for you. Just copy & drop any image into ComfyUI to load its workflow.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa