AI Art Weekly #80
Hello there, my fellow dreamers, and welcome to issue #80 of AI Art Weekly! 👋
Another wild week in AI is behind us. A few creepy robots, some AI wearables, Meta’s Llama 3 release, and 120+ papers later, I bring you the latest and greatest from the generative computer vision front!
In this issue:
- 3D: MeshLRM, Video2Game, LoopGaussian, StyleCity, RefFusion, InFusion, REACTO, DG-Mesh
- Motion: in2IN, TeSMo
- Video: Generative AI in Adobe Premiere Pro, VideoGigaGAN, VASA-1, Ctrl-Adapter, FlowSAM
- Image: Stable Diffusion 3 API, Magic Clothing, MoA, StyleBooth, MOWA, IntrinsicAnything, CustomDiffusion360, AniClipart
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For next week’s cover I’m looking for water submissions! The reward is again $50 and a rare role in our Discord community which lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Adobe has upgraded its LRM model to support high-quality mesh reconstruction. Called MeshLRM, it requires four input images and can generate meshes in less than one second. It also supports text-to-3D and single-image-to-3D generation by first generating the four base images.
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
These high-quality assets need a home, don’t you think? Video2Game can automatically convert videos of real-world scenes into realistic and interactive game environments. You can play a demo game here.
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field
LoopGaussian can convert multi-view images of a stationary scene into authentic 3D cinemagraphs, which can be rendered from novel viewpoints to obtain natural, seamlessly loopable videos.
StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization
StyleCity can stylize a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generate a harmonious omnidirectional sky background.
RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting
RefFusion is a 3D scene inpainting method that can insert objects into a scene as well as outpaint the scene to complete it.
InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
InFusion, on the other hand, can inpaint 3D Gaussian point clouds and meshes into a scene. This enables texture editing, object insertion, and object completion by editing a single image.
REACTO: Reconstructing Articulated Objects from a Single Video
REACTO can reconstruct articulated 3D objects from a single video, capturing the motion and shape of objects with flexible deformation.
DG-Mesh: Consistent Mesh Reconstruction from Monocular Videos
DG-Mesh is able to reconstruct high-quality and time-consistent 3D meshes from a single video. The method is also able to track the mesh vertices over time, which enables texture editing on dynamic objects.
Motion
in2IN: Leveraging individual Information to Generate Human INteractions
in2IN is a motion generation model that factors in both the overall interaction’s textual description and individual action descriptions of each person involved. This enhances motion diversity and enables better control over each person’s actions while preserving interaction coherence.
TeSMo: Generating Human Interaction Motions in Scenes with Text Control
TeSMo is a method for text-controlled scene-aware motion generation and is able to generate realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses.
Video
Generative AI in Adobe Premiere Pro
Adobe is bringing generative AI capabilities to Premiere Pro, adding frame extension, video inpainting, and generative B-roll to the video editing suite.
Adobe has also teased that third-party video models such as OpenAI’s Sora and Pika will be available for use in Premiere Pro.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
VideoGigaGAN is a new video upscaling model by Adobe that can upsample a video up to 8x with rich details. The results look incredible!
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Microsoft sent the internet into meltdown this week with VASA-1, a talking-head model that generates lifelike talking faces in real time from audio input. The best one we’ve seen so far!
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Ctrl-Adapter is a new framework that can be used to add diverse controls to any image or video diffusion model, enabling things like video control with sparse frames, multi-condition control, and video editing.
FlowSAM: Moving Object Segmentation
FlowSAM is a new method for video object segmentation that combines the power of SAM with optical flow. The model is able to discover and segment moving objects in a video and outperforms all previous approaches by a considerable margin in both single and multi-object benchmarks.
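FlowSAM’s actual pipeline is more involved, but the core intuition of motion-guided segmentation is easy to sketch. Below is a minimal, illustrative approximation (not the paper’s method): estimate optical flow with torchvision’s RAFT, pick a high-motion pixel, and hand it to SAM as a point prompt. The function name and checkpoint path are assumptions.

```python
# Illustrative sketch only: prompt SAM with a point taken from the optical-
# flow field. This approximates the *idea* of combining motion cues with
# SAM, not FlowSAM's actual architecture. Checkpoint path is a placeholder.
import numpy as np
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from segment_anything import sam_model_registry, SamPredictor

def flow_prompted_mask(frame_t, frame_t1, sam_ckpt="sam_vit_h_4b8939.pth"):
    """frame_t, frame_t1: consecutive RGB frames, HxWx3 uint8,
    with H and W divisible by 8 (a RAFT requirement)."""
    # 1) Estimate optical flow between the two frames with RAFT.
    raft = raft_large(weights=Raft_Large_Weights.DEFAULT).eval()
    to_t = lambda f: torch.from_numpy(f).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        flow = raft(to_t(frame_t), to_t(frame_t1))[-1][0]  # (2, H, W)

    # 2) Naive motion cue: the pixel with the largest flow magnitude.
    magnitude = flow.norm(dim=0).numpy()
    y, x = np.unravel_index(magnitude.argmax(), magnitude.shape)

    # 3) Use that pixel as a positive point prompt for SAM.
    sam = sam_model_registry["vit_h"](checkpoint=sam_ckpt)
    predictor = SamPredictor(sam)
    predictor.set_image(frame_t)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 = foreground point
    )
    return masks[scores.argmax()]  # best-scoring mask for the moving object
```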
Image
Stable Diffusion 3 API Now Available
Stable Diffusion 3 and its Turbo version are now available on the Stability AI Developer Platform API.
Weights are not yet available, but are expected to be released soon through their new Stability AI Membership offerings.
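If you want to try it, generating an image is a single HTTP call. Here’s a minimal sketch in Python against the v2beta endpoint as documented at launch; the API key and prompt are placeholders, so double-check Stability’s docs before relying on it.

```python
# Minimal sketch of a Stable Diffusion 3 request to the Stability AI
# v2beta endpoint. API key and prompt are placeholders.
import requests

response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={
        "authorization": "Bearer YOUR_STABILITY_API_KEY",
        "accept": "image/*",  # return raw image bytes instead of JSON
    },
    files={"none": ""},  # forces multipart/form-data encoding
    data={
        "prompt": "an astronaut riding a rainbow unicorn, cinematic",
        "model": "sd3",  # or "sd3-turbo"
        "output_format": "png",
    },
)
response.raise_for_status()
with open("sd3_output.png", "wb") as f:
    f.write(response.content)
```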
Magic Clothing: Controllable Garment-Driven Image Synthesis
Magic Clothing can generate customized characters wearing specific garments from diverse text prompts while preserving the details of the target garments and maintaining faithfulness to the text prompts.
Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
MoA is a new architecture for text-to-image personalization. It enables subject swapping in images, morphing between subjects, as well as personalized generation with pose control.
StyleBooth: Image Style Editing with Multimodal Instruction
StyleBooth is a unified style editing method supporting text-based, exemplar-based and compositional style editing. So basically, you can take an image and change its style by either giving it a text prompt or an example image.
MOWA: Multiple-in-One Image Warping Model
MOWA is a multiple-in-one image warping model that can handle various tasks such as rectangling panoramic images, correcting rolling-shutter distortion, rotating images, rectifying fisheye images, and image retargeting.
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
IntrinsicAnything is able to recover object materials from any image and enables single-view image relighting.
Customizing Text-to-Image Diffusion with Camera Viewpoint Control
CustomDiffusion360 brings camera viewpoint control to text-to-image models. The only caveat: it requires a 360-degree multi-view dataset of around 50 images per object to work.
AniClipart: Clipart Animation with Text-to-Video Priors
And last but certainly not least, AniClipart can transform static clipart images into high-quality animations. The method is able to generate iconic and smooth motion by defining Bézier curves over keypoints of the clipart image as a form of motion regularization.
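The Bézier part is simple enough to sketch: each keypoint’s trajectory across the animation is a cubic Bézier curve, so optimizing control points (rather than raw per-frame offsets) keeps the motion inherently smooth. A toy illustration with assumed names and shapes, not the paper’s code:

```python
# Toy illustration of Bézier-constrained keypoint motion (assumed names,
# not AniClipart's actual code): each keypoint follows a cubic Bézier
# curve, so optimizing its control points yields smooth trajectories.
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]  # (frames, 1) so it broadcasts over (x, y)
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

rest = np.array([64.0, 64.0])             # a keypoint's rest position (x, y)
p1, p2 = rest + [10, -8], rest + [22, 6]  # intermediate control points
p3 = rest + [30.0, 0.0]                   # end position
ts = np.linspace(0.0, 1.0, 24)            # 24 animation frames
trajectory = cubic_bezier(rest, p1, p2, p3, ts)  # (24, 2) keypoint path
```

In the paper, a text-to-video prior supplies the loss that moves these control points; here they are fixed purely for illustration.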
Also interesting
- PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
- @diveshnaidoo shared a stunning clip from the VFX app he’s working on. Can you spot which objects on the table in the video are real and which were added with Simulon?
- @deforum_art is working on an audio-synchronization module. Looks like it’s coming along nicely!
- A HuggingFace space that lets you upload an image and generate a Magic: The Gathering card caption for it.
- A HuggingFace space that lets you stylize any face image with a style LoRA.
- A HuggingFace space that lets you generate images guided by a stylized reference image.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa