AI Art Weekly #60

Hello there, my fellow dreamers, and welcome to issue #60 of AI Art Weekly! 👋

Right after I shipped last week’s issue, OpenAI went into temporary meltdown mode over the weekend by firing their CEO, Sam Altman.

After a lot of speculation and rumors, the gist seems to be that a new learning algorithm called Q* was behind it all.

What does that mean? Well, grab onto your tinfoil hat, because this is a wild one.

Q* is apparently able to solve basic math problems, which could unlock new capabilities like logic, reasoning, and planning. According to an allegedly leaked letter from OpenAI, the algorithm was even able to break state-of-the-art encryption. If true, this would have devastating consequences for cybersecurity. In short: no password or encrypted message would be safe anymore.

But none of this has been confirmed yet, so take it with a grain of salt.

What you shouldn’t take with a grain of salt, though, are this week’s highlights; they are numerous and actually happened:

  • Stable Video Diffusion by Stability AI
  • PhysGaussian can simulate physics for Gaussian Splats
  • LiveSketch can animate sketches
  • PF-LRM: multi-view to 3D in 1.3 seconds
  • LucidDreamer: A text-to-3D framework
  • ZipLoRA: Combine any subject in any style
  • Concept Sliders: LoRA adaptors for precise control
  • DiffusionMat: A new alpha matting method
  • MagicDance can clone dance moves
  • and more tutorials, tools and gems!

Cover Challenge 🎨

Theme: pop art
62 submissions by 35 artists
🏆 1st: @beholdthe84
🥈 2nd: @onchainsherpa
🥈 2nd: @elfearsfoxsox
🥈 2nd: @DystopianAir

News & Papers

Stable Video Diffusion

Stability AI dropped a new foundation model for video generation called Stable Video Diffusion this week. In short, the new model supports:

  • Text-to-video generation
  • Image-to-video generation
  • 14 or 25 frames at 576 x 1024 resolution
  • Multi-view generation
  • Frame interpolation
  • 3D scene understanding
  • Camera control via LoRA

The weights for both variants of the model are available on HuggingFace.

You can give the model a try on Replicate, Decoherence, ComfyUI and Google Colab.
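
If you’d rather run it yourself, here’s a minimal sketch using Hugging Face’s diffusers library. I’m assuming the 25-frame img2vid-xt checkpoint, a recent diffusers version (0.24+), and a CUDA GPU with enough VRAM; treat it as a starting point, not gospel:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the 25-frame (xt) image-to-video variant in half precision.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# The model is conditioned on a single input image.
image = load_image("input.png").resize((1024, 576))

# decode_chunk_size trades VRAM for decoding speed.
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```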

Apparently this was only the first of five releases by Stability AI. Excited to see what else they have in store for us in the upcoming weeks.

Stable Video Diffusion example

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

This one blows my mind. PhysGaussian is a simulation-rendering pipeline that can simulate the physics of 3D Gaussian Splats while simultaneously rendering photorealistic results. The method supports flexible dynamics, a diverse range of materials, and collisions.

PhysGaussian example

LiveSketch: Breathing Life Into Sketches Using Text-to-Video Priors

LiveSketch is my highlight of the week. The method can automatically add motion to a single-subject sketch based on a text prompt describing the desired motion. The outputs are short SVG animations that can be easily edited.

LiveSketch example

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Two weeks after announcing their image-to-3D LRM method, Adobe released a new paper on PF-LRM. Unlike LRM, PF-LRM can create 3D models from a handful of unposed images with little visual overlap, and it does so in about 1.3 seconds on a single A100 GPU.

PF-LRM example

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Of course, there is more 3D generation magic. LucidDreamer is a new text-to-3D generation framework that is able to generate 3D models with high-quality textures and shapes. Higher quality means longer inference. This one takes 35 minutes on an A100 GPU.

LucidDreamer examples

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

ZipLoRA is a cost-effective method that is able to combine different style and subject LoRAs to create images in any chosen style with any chosen subject. It can stylize objects in various ways and place them in different contexts, maintaining high-quality and controlled stylization.
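
To make the idea concrete, here’s a toy sketch of the core trick as I understand it: learn per-column merge coefficients for the two LoRA weight deltas that keep each adaptor close to its original behavior while penalizing overlap between them. All names here are mine, and the actual method optimizes these coefficients against the diffusion model itself rather than purely in weight space:

```python
import torch
import torch.nn.functional as F

def zip_merge(delta_style, delta_subject, steps=200, lam=0.01, lr=1e-2):
    # delta_*: LoRA weight deltas of shape [out_features, in_features].
    m1 = torch.ones(delta_style.shape[1], requires_grad=True)
    m2 = torch.ones(delta_subject.shape[1], requires_grad=True)
    opt = torch.optim.Adam([m1, m2], lr=lr)
    for _ in range(steps):
        s = delta_style * m1      # scale each weight column
        c = delta_subject * m2
        # Keep each scaled delta close to its original behavior...
        recon = (s - delta_style).pow(2).mean() + (c - delta_subject).pow(2).mean()
        # ...while discouraging the two adaptors from occupying the
        # same directions (per-column cosine similarity).
        overlap = F.cosine_similarity(s, c, dim=0).abs().mean()
        loss = recon + lam * overlap
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (delta_style * m1 + delta_subject * m2).detach()
```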

ZipLoRA example

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Concept Sliders is a new method that allows for fine-grained control over textual and visual attributes in Stable Diffusion XL. By using simple text descriptions or a small set of paired images, artists can train concept sliders to represent the direction of desired attributes. At generation time, these sliders can be used to control the strength of the concept in the image, enabling nuanced tweaking.
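
Since the sliders are just LoRA adaptors, the usage pattern in diffusers should look roughly like the sketch below. The slider filename is a placeholder, and I’m assuming a slider exported as a standard LoRA file:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Hypothetical slider file; concept sliders ship as LoRA adaptors.
pipe.load_lora_weights("age_slider.safetensors")

# Sweep the slider strength: negative values push the attribute one
# way, positive the other, while the rest of the image stays put.
for scale in (-2.0, 0.0, 2.0):
    image = pipe(
        "portrait photo of a person",
        cross_attention_kwargs={"scale": scale},
    ).images[0]
    image.save(f"slider_{scale}.png")
```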

Concept Sliders example

DiffusionMat: Alpha Matting as Sequential Refinement Learning

DiffusionMat is a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. The key innovation of the framework is a correction module that adjusts the output at each denoising step, ensuring that the final result is consistent with the input image’s structures.
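
In pseudocode, the loop described above might look something like this. `denoiser` and `corrector` stand in for the paper’s trained networks, and the scheduler follows diffusers conventions; none of this is DiffusionMat’s actual API:

```python
import torch

def refine_alpha(image, trimap, scheduler, denoiser, corrector):
    alpha = torch.randn_like(trimap)  # start the matte from pure noise
    for t in scheduler.timesteps:
        # Predict the noise, conditioned on the image and trimap.
        eps = denoiser(alpha, t, image=image, trimap=trimap)
        alpha = scheduler.step(eps, t, alpha).prev_sample
        # Correction module: nudge the intermediate matte back toward
        # the structures visible in the input image.
        alpha = corrector(alpha, image, trimap)
    return alpha.clamp(0.0, 1.0)
```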

DiffusionMat example

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

It’s been a while since I last doomed the TikTok dancers. MagicDance is gonna doom them some more. This model can combine human motion with reference images to precisely generate appearance-consistent videos. While the results still contain visible artifacts and jittering, give it a few months and I’m sure we won’t be able to tell the difference anymore.

MagicDance example. The model can also be used for non-dance videos. Probably a more useful application.

More papers & gems

  • PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF
  • XAGen: 3D Expressive Human Avatars Generation
  • Kandinsky Video
  • BundleMoCap: Efficient, Robust and Smooth Motion Capture from Sparse Multiview Videos
  • HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  • MoVideo: Motion-Aware Video Generation with Diffusion Models

Interview

This week we’re talking to AI animator & illustrator sleepysleephead.


Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“Feeling fancy 🎩” by me

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
