AI Art Weekly #87
Hello there, my fellow dreamers, and welcome to issue #87 of AI Art Weekly!
With CVPR 2024 happening this week, it's usually a quieter period for new papers. However, I still had to skim through over 120 papers for this issue.
Let's jump in:
- Highlights: Runway Gen-3 is coming, Advanced Midjourney style blending
- 3D: MeshAnything, High-Fidelity Facial Albedo Estimation, GaussianSR, GradeADreamer, Holistic-Motion2D
- 4D: Splatter a Video, 4K4DGen, L4GM, D-NPC
- Image: iCD, CountGen, Glyph-ByT5-v2
- Video: EvTexture, CamTrol
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge 🎨
For the next cover I'm looking for weird submissions! Reward is again $50 and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
Highlights
Runway Gen-3 Alpha is coming
After last week's release of Luma AI's new video generation tool, Runway is now teasing the release of their Gen-3 model.
The new model offers significant improvements in fidelity, consistency, and motion over Gen-2 and, according to Runway, is a step towards building General World Models. And it shows.
The model will support their existing control modes such as Motion Brush and Director Mode as well as upcoming tools for more fine-grained control over structure, style, and motion. Heck, the thing can even do text.
They said access is coming this week, so keep two eyes out for it!
Advanced Midjourney style blending
Midjourney released new advanced options for style references and model personalization blending this week. You can now:
- Blend multiple --sref codes together (--sref 123 456).
- Combine style reference image URLs and random codes (--sref 123 url).
- Assign weights to individual codes or URLs (--sref 123::2 456::1).
- Blend multiple model personalization codes (--p ab12ad3 cd34gl).
- Use weighted blending with the same notation (--p ab12ad3::2 cd34gl::1).
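If you assemble prompts in a script, the weighted notation above is easy to generate. Here is a minimal sketch in Python, assuming you just want to build the parameter string yourself. The helper name build_blend is made up for illustration and is not part of any Midjourney tooling; it only formats text in the code::weight pattern shown in the list.

```python
def build_blend(flag, items):
    """Format a blend parameter such as '--sref 123::2 456::1'.

    Hypothetical helper (not a Midjourney API): `flag` is '--sref' or '--p',
    `items` is a list of (code_or_url, weight) pairs. A weight of None
    leaves the '::weight' suffix off for that entry.
    """
    if not items:
        raise ValueError("need at least one code or URL")
    parts = []
    for ref, weight in items:
        # Append '::weight' only when a weight was given.
        parts.append(str(ref) if weight is None else f"{ref}::{weight}")
    return f"{flag} " + " ".join(parts)


# Weighted style reference blend, mirroring the example in the list.
print(build_blend("--sref", [("123", 2), ("456", 1)]))
# Unweighted personalization blend.
print(build_blend("--p", [("ab12ad3", None), ("cd34gl", None)]))
```

The string it returns is meant to be pasted at the end of a prompt; how Midjourney interprets the weights is up to Midjourney, not this snippet.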
3D
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
MeshAnything can convert 3D assets in any 3D representation into meshes. This can be used to enhance various 3D asset production methods and significantly improve storage, rendering, and simulation efficiencies.
High-Fidelity Facial Albedo Estimation via Texture Quantization
HiFiAlbedo is a method that can recover high-fidelity facial albedo maps from a single image without the need for captured albedo data.
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors
GaussianSR can generate high-quality 3D Gaussians from low-resolution images and is able to render them faster compared to previous methods.
GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion
GradeADreamer is yet another text-to-3D method. This one is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU.
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
Tender can generate diverse and realistic motions from text prompts in 2D space. The results can be used for pose guidance in video generation or be lifted into 3D for character animation.
4D
Splatter a Video: Video Gaussian Representation for Versatile Processing
Splatter a Video can turn a video into a 3D Gaussian representation, allowing for enhanced video tracking, depth prediction, motion and appearance editing, and stereoscopic video generation.
4K4DGen: Panoramic 4D Generation at 4K Resolution
4K4DGen can turn a single panorama image into an immersive 4D environment with 360-degree views at 4K resolution. The method is able to animate the scene and optimize a set of 4D Gaussians using efficient splatting techniques for real-time exploration.
L4GM: Large 4D Gaussian Reconstruction Model
L4GM is a 4D Large Reconstruction Model that can turn a single-view video into an animated 3D object.
D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video
Cyberpunk brain dances are becoming a thing! D-NPC can turn videos into dynamic neural point clouds, aka 4D scenes, which makes it possible to watch a scene from another perspective.
Image
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
iCD can be used for zero-shot text-guided image editing with diffusion models. The method is able to encode real images into their latent space in only 3-4 inference steps and can then be used to edit the image with a text prompt.
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Diffusion models can't count, or can they? CountGen can generate the correct number of objects specified in the input prompt while maintaining a natural layout that aligns with the prompt.
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
Glyph-ByT5-v2 is a new SDXL model that can generate high-quality visual layouts with text in 10 different languages.
Video
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
EvTexture is a video super-resolution method that uses event signals to enhance textures, recovering more accurate texture and high-resolution detail.
Training-free Camera Control for Video Generation
CamTrol can produce high-dynamic videos with controllable camera moves. No fine-tuning required.
Also interesting
@david_vipernz went through some Luma Labs videos from the past week and has put together a short simulation of TV channel surfing.
Meet @Uncanny_Harry's mate Dave, he has a belated Father's Day message for everyone, sound on please.
@CoffeeVectors tried to make a rudimentary 3D shooter in Python using Claude 3.5 Sonnet.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it ❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
- dreamingtulpa