AI Art Weekly #62
Hello there, my fellow dreamers, and welcome to issue #62 of AI Art Weekly! 👋
The number of papers per week keeps increasing (I skimmed through 196 this week) and it's getting harder and harder to keep up. Luckily for you, I'm here to serve 😉. So, let's look at this week's highlights:
- Higen-T2V video model
- Marigold: A new state-of-the-art depth estimation model
- StyleCrafter generates images and videos based on style references
- MotionCtrl controls camera and object motions in videos
- and 9 other video editing methods
- X-Adapter makes SD1.5 plugins work with SDXL
- AnimateZero - the next AnimateDiff?
- Readout Guidance - the next ControlNet?
- PhotoMaker: Realistic photos from reference images
- Generative Rendering turns mesh animations into videos
- WonderJourney lets you walk through paintings
- Doodle Your 3D turns sketches into 3D models
- AnimatableDreamer extracts motions from videos
- CLIPDrawX generates vector sketches
- AmbiGen creates text ambigrams
- and more tutorials, tools and gems!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week's cover I'm looking for "tulpas" (not me, the concept). The reward is $50 and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here. I'm looking forward to your submissions 🙏
News & Papers
Higen-T2V: A New Text-to-Video Generation Model
We’ve seen a lot of video generation models in the past few months, so it gets harder and harder to be impressed by the progress. Higen is yet another one, but its results are worth sharing! As usual, it's not open-source yet, but it's worth keeping an eye on.
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
Given one or more style references, StyleCrafter can generate images and videos based on these referenced styles.
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
MotionCtrl is a flexible motion controller that is able to manage both camera and object motions in the generated videos and can be used with VideoCrafter1, AnimateDiff, and Stable Video Diffusion.
Video editing is going BRRRRR
Usually there are one or two video editing methods each week that look kind of interesting. With the two above, we had 11 this week! The other 9 each have one interesting aspect that I wanted to highlight:
- RAVE: Impressive consistent video editing.
- Drag-A-Video: Drag and drop video editing.
- BIVDiff: ControlNet support.
- VMC, VideoSwap & SAVE: Swap subjects in videos.
- MagicStick: Subject scaling, repositioning and human motion editing.
- AVID: Any-length video inpainting.
- FACTOR: Trajectory and appearance control.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Good news for SD1.5 enthusiasts! X-Adapter enables pretrained models and plugins for Stable Diffusion 1.5 (ControlNet, LoRAs, etc) to work directly with SDXL without any retraining 🥳
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
AnimateZero looks like the next iteration of AnimateDiff. Like AnimateDiff, the method can generate videos from a single image using text prompts and supports video editing, frame interpolation, looped video generation and real image animation.
LivePhoto and DreamVideo are two other methods from this week that can animate images from text prompts.
Readout Guidance: Learning Control from Diffusion Features
Readout Heads look like the next evolution of guided image generation. Similar to ControlNet, they can be used for pose, depth, or edge-guided generations. But compared to ControlNet models, they’re much more lightweight. In the case of SDXL, a readout head requires at most 35MB of space and can be trained with as few as 100 paired examples. Additionally, they also support image manipulation based on drag and identity consistency. Very cool!
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
PhotoMaker can generate realistic human photos from input images and text prompts. It can change attributes of people, like changing hair colour and adding glasses, turn people from artworks like Van Gogh’s self-portrait into realistic photos, or mix identities of multiple people. Super pumped for this one!
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Generative Rendering is able to take an animated, low-fidelity rendered mesh and a text prompt as input and generate a stylized video based on it. The results, while flickery, have something unique to them.
WonderJourney: Going from Anywhere to Everywhere
WonderJourney lets you wander through your favourite paintings, poems and haikus. The method can generate a sequence of diverse yet coherently connected 3D scenes from a single image or text prompt.
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Doodle Your 3D can turn abstract sketches into precise 3D shapes. The method can even edit shapes by simply editing the sketch. Super cool. Sketch-to-3D-print isn’t that far away now.
AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
AnimatableDreamer can create animated 3D objects from text prompts and animate them with motions extracted from monocular videos.
Relightable Gaussian Codec Avatars
Meta showcased Relightable Gaussian Codec Avatars this week. The method builds high-fidelity, relightable and animatable head avatars that capture 3D-consistent sub-millimeter details such as hair strands and pores across dynamic face sequences, and it handles the diverse materials of a human head, like eyes, skin, and hair, in a unified manner. The avatars can be efficiently relit in real time under both point-light and continuous illumination.
The open-source alternative to this would be MonoGaussianAvatar.
CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis
CLIPDrawX can generate vector sketches from text prompts using simple primitive shapes like circles, straight lines, and semi-circles.
AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model
AmbiGen, on the other hand, is a new method for generating ambigrams: calligraphic designs that have different meanings depending on the viewing orientation.
More papers & gems
- Cartoon Segmentation: Instance-guided Cartoon Editing with a Large-scale Dataset
- NeRFiller: Completing Scenes via Generative 3D Inpainting
- Gaussian Grouping: Segment and Edit Anything in 3D Scenes
- Feature 3DGS 🪄: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image
- ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
- FaceStudio: Put Your Face Everywhere in Seconds
My Animate Anyone post took off on X and reached 100M+ views, 15k+ reposts, and countless replies telling me to go kill myself. This is an essay about that outrage.
Pika 1.0 is out in early access for super-collaborators and @Martin_Haerlin created a video showcasing its inpainting abilities. Still waiting for my access 👀
@AIWarper shared a MagicAnimate example with a custom built Blender rig to simulate the DensePose input.
@cumulo_autumn has achieved real-time diffusion at 100fps with SD Turbo, 512x512, batch size 1. Some even managed to achieve ~150fps.
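For context, single-step generation with SD Turbo via diffusers looks roughly like the sketch below, based on the public model card. The real-time 100fps+ setups additionally rely on heavy pipeline tricks (compilation, TensorRT, streaming batching) that aren't shown here, and the prompt is just a placeholder.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the distilled SD Turbo model in fp16 and move it to the GPU.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# SD Turbo is distilled for single-step sampling, so guidance is disabled
# and only one inference step is used. Default output is 512x512.
image = pipe(
    "a cinematic photo of a fox in a snowy forest",  # placeholder prompt
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]

image.save("sd_turbo_512.png")
```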
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
Marigold is a new state-of-the-art depth estimation model fine-tuned on synthetic data. Google Colab. HuggingFace Demo.
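If you'd rather run it locally than in the Colab, here's a minimal sketch of what depth inference looks like with the diffusers integration. Note that the pipeline class, checkpoint id, and visualize_depth helper below are assumptions based on the later diffusers release, so check the Marigold repo for the exact entry point.

```python
import torch
from diffusers import MarigoldDepthPipeline
from diffusers.utils import load_image

# Assumption: a diffusers build that ships the Marigold depth pipeline
# and the checkpoint published by the authors on the Hugging Face Hub.
pipe = MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = load_image("input.jpg")  # any RGB photo

# One call returns a dense per-pixel depth prediction for the input image.
result = pipe(image)

# Turn the raw prediction into a colored depth visualization and save it.
vis = pipe.image_processor.visualize_depth(result.prediction)
vis[0].save("input_depth.png")
```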
Remember last week's "Animate Anyone"? MagicAnimate is an alternative that has already been open-sourced and that you can try for yourself.
@nacho_gorriti_ put together a script to convert videos into DensePose videos which can be used with the MagicAnimate framework.
Kandinsky 3.0 is the latest text-to-image model in the Kandinsky family. It’s able to generate images with a resolution of 1024x1024 and supports inpainting and outpainting.
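Basic text-to-image use through diffusers looks roughly like the sketch below. The checkpoint id is the one published under the kandinsky-community org; treat the prompt and the offloading choice as placeholders, and note that the diffusers integration is an assumption on my part, so the official repo is the safest reference.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Assumption: the Kandinsky 3 checkpoint published by the
# kandinsky-community org, loaded via the generic text-to-image pipeline.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16,
)
# The full model is large, so offload submodules to CPU when idle.
pipe.enable_model_cpu_offload()

prompt = "an oil painting of a lighthouse in a storm, dramatic sky"
# Kandinsky 3 generates at 1024x1024 by default.
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("kandinsky3.png")
```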
StableVITON can change what people wear in images based on a reference image.
ViVid-1-to-3 can generate novel views of an object from a single image. Perfect for creating 3D objects with multi-view image models like ImageDream.
I haven’t used this one myself, but from how I interpret it, the library lets you turn images into "2D" Gaussian Splats that could be further processed in 3D. Might be worth experimenting with.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa