AI Art Weekly #80

Hello there, my fellow dreamers, and welcome to issue #80 of AI Art Weekly! 👋

Another wild week in AI is behind us. A few creepy robots, some AI wearables, Meta's release of Llama 3, and 120+ papers later, I bring you the latest and greatest from the generative computer vision front!

In this issue:

  • 3D: MeshLRM, Video2Game, LoopGaussian, StyleCity, RefFusion, InFusion, REACTO, DG-Mesh
  • Motion: in2IN, TeSMo
  • Video: Generative AI in Adobe Premiere Pro, VideoGigaGAN, VASA-1, Ctrl-Adapter, FlowSAM
  • Image: Stable Diffusion 3 API, Magic Clothing, MoA, StyleBooth, MOWA, IntrinsicAnything, CustomDiffusion360, AniClipart
  • and more!

Cover Challenge 🎨

Theme: hysteria
117 submissions by 74 artists
AI Art Weekly Cover Art Challenge hysteria submission by JewliaSparks
🏆 1st: @JewliaSparks
AI Art Weekly Cover Art Challenge hysteria submission by onchainsherpa
🥈 2nd: @onchainsherpa
AI Art Weekly Cover Art Challenge hysteria submission by demon_ai_
🥉 3rd: @demon_ai_
AI Art Weekly Cover Art Challenge hysteria submission by eathanor
🧡 4th: @eathanor

News & Papers


MeshLRM: Large Reconstruction Model for High-Quality Mesh

Adobe has upgraded their LRM model, and it now supports high-quality mesh reconstruction. It’s called MeshLRM, requires four input images, and can generate meshes in less than one second. It also supports text-to-3D and single-image-to-3D generation by first generating the four base images.

MeshLRM examples

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

These high-quality assets need a home, don’t you think? Video2Game can automatically convert videos of real-world scenes into realistic and interactive game environments. You can play a demo game here.

Video2Game garden demo

LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field

LoopGaussian can convert multi-view images of a stationary scene into authentic 3D cinemagraphs that can be rendered from novel viewpoints to obtain natural, seamlessly loopable videos.

LoopGaussian Ficus example

StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization

StyleCity can stylize a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generate a harmonious omnidirectional sky background.

Hallucination of magical times of day for a city.

RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting

RefFusion is a 3D scene inpainting method that can insert objects into a scene and outpaint the scene to complete it.

RefFusion inpainting

InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

InFusion, on the other hand, can inpaint 3D Gaussian point clouds and meshes into a scene. This can be used for texture editing, object insertion, and object completion by editing a single image.

Fixing a 3D scene with InFusion

REACTO: Reconstructing Articulated Objects from a Single Video

REACTO can reconstruct articulated 3D objects by capturing the motion and shape of objects with flexible deformation from a single video.

REACTO reconstructions

DG-Mesh: Consistent Mesh Reconstruction from Monocular Videos

DG-Mesh is able to reconstruct high-quality and time-consistent 3D meshes from a single video. The method is also able to track the mesh vertices over time, which enables texture editing on dynamic objects.

DG-Mesh example


in2IN: Leveraging individual Information to Generate Human INteractions

in2IN is a motion generation model that factors in both the overall interaction’s textual description and individual action descriptions of each person involved. This enhances motion diversity and enables better control over each person’s actions while preserving interaction coherence.

in2IN comparison with InterGen

TeSMo: Generating Human Interaction Motions in Scenes with Text Control

TeSMo is a method for text-controlled scene-aware motion generation and is able to generate realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses.

TeSMo examples


Generative AI in Adobe Premiere Pro

Adobe is bringing generative AI capabilities to Premiere Pro, adding frame extension, video inpainting, and generative B-roll to the video editing suite.

Among other things, Adobe has teased that video models like OpenAI’s Sora and Pika will be available for use in Premiere Pro.

Adobe Premiere Pro video inpainting

VideoGigaGAN: Towards Detail-rich Video Super-Resolution

VideoGigaGAN is a new video upscaling model by Adobe that can upsample a video up to 8x with rich details. The results look incredible!

VideoGigaGAN example

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Microsoft sent the internet into meltdown with VASA-1 this week: a talking-head model that can generate lifelike talking faces in real time from audio input. It’s the best one we’ve seen so far!

VASA-1 examples

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Ctrl-Adapter is a new framework that can be used to add diverse controls to any image or video diffusion model, enabling things like video control with sparse frames, multi-condition control, and video editing.

Ctrl-Adapter overview

FlowSAM: Moving Object Segmentation

FlowSAM is a new method for video object segmentation that combines the power of SAM with optical flow. The model is able to discover and segment moving objects in a video and outperforms all previous approaches by a considerable margin in both single and multi-object benchmarks.

FlowSAM recognizing and segmenting moving turtles


Stable Diffusion 3 API Now Available

Stable Diffusion 3 and its Turbo version are now available on the Stability AI Developer Platform API.

Weights are not yet available, but are expected to be released soon through their new Stability AI Membership offerings.
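If you want to poke at the API yourself, here's a minimal sketch of how a request might be assembled. Note that the endpoint path, field names, and the `sd3-turbo` model identifier are assumptions based on Stability's developer platform docs at the time of writing — check the official reference before relying on them:

```python
import os

# Assumed endpoint for the Stable Diffusion 3 API (Stability AI Developer Platform).
SD3_ENDPOINT = "https://api.stability.ai/v2beta/stable-image/generate/sd3"

def build_sd3_request(prompt, model="sd3", output_format="png", api_key=None):
    """Assemble the headers and multipart form fields for an SD3 generation call."""
    headers = {
        # The API authenticates with a bearer token from your Stability account.
        "Authorization": f"Bearer {api_key or os.environ.get('STABILITY_API_KEY', '')}",
        "Accept": "image/*",
    }
    fields = {
        "prompt": prompt,
        "model": model,            # "sd3" or (assumed) "sd3-turbo" for the Turbo variant
        "output_format": output_format,
    }
    return SD3_ENDPOINT, headers, fields

url, headers, fields = build_sd3_request(
    "a wizard on a mountain casting magic text at dawn", model="sd3-turbo"
)
print(url)
print(fields["model"])
```

Sending the request (e.g. with `requests.post(url, headers=headers, files={"none": ""}, data=fields)`) returns the generated image bytes on success; the sketch above only builds the request so it runs without an API key.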

Awesome artwork of a wizard on the top of a mountain, he's creating the big text 'Stable Diffusion 3 API' with magic, magic text, at dawn, sunrise.

Magic Clothing: Controllable Garment-Driven Image Synthesis

Magic Clothing can generate customized characters wearing specific garments from diverse text prompts while preserving the details of the target garments and maintaining faithfulness to the text prompts.

Magic Clothing example

Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

MoA is a new architecture for text-to-image personalization. It enables subject swapping in images, morphing between subjects, as well as personalized generation with pose control.

MoA face swapping

StyleBooth: Image Style Editing with Multimodal Instruction

StyleBooth is a unified style editing method supporting text-based, exemplar-based and compositional style editing. So basically, you can take an image and change its style by either giving it a text prompt or an example image.

StyleBooth examples

MOWA: Multiple-in-One Image Warping Model

MOWA is a multiple-in-one image warping model that can be used for various tasks such as rectangling panoramic images, correcting rolling-shutter images, rotating images, rectifying fisheye images, and image retargeting.

MOWA rotation example

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

IntrinsicAnything can recover object materials from arbitrary images and enables single-view image relighting.

IntrinsicAnything examples

Customizing Text-to-Image Diffusion with Camera Viewpoint Control

CustomDiffusion360 brings camera viewpoint control to text-to-image models. Only caveat: it requires a 360-degree multi-view dataset of around 50 images per object to work.

CustomDiffusion360 examples

AniClipart: Clipart Animation with Text-to-Video Priors

And last but certainly not least, AniClipart can transform static clipart images into high-quality animations. The method is able to generate iconic and smooth motion by defining Bézier curves over keypoints of the clipart image as a form of motion regularization.

AniClipart examples
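The motion regularization rests on plain cubic Bézier curves, which are worth a quick illustration: each keypoint's trajectory is a smooth curve defined by four control points. The snippet below is a generic Bézier evaluation with made-up control points, not the paper's code:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameter t in [0, 1].

    B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3
    """
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

# A hypothetical keypoint trajectory: starts at (0, 0), ends at (3, 0),
# arcing upward through the two middle control points.
path = [cubic_bezier((0, 0), (1, 2), (2, 2), (3, 0), i / 10) for i in range(11)]
print(path[0])   # (0.0, 0.0) -- the curve passes through the first control point
print(path[-1])  # (3.0, 0.0) -- and through the last
```

Because the curve is fully determined by four points, optimizing only those control points (rather than every frame's keypoint position) keeps the resulting motion smooth by construction — the intuition behind using Bézier curves as a motion regularizer.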

Also interesting

  • PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

“The Self-Denied” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa