AI Art Weekly #48

Hello there, my fellow dreamers, and welcome to issue #48 of AI Art Weekly! 👋

I’ve got another chock-full issue for you this week. There is a lot of progress being made in different areas of AI, which, when combined, will enable new kinds of simulations that simply weren’t possible before. But take a look for yourself below. The highlights this week are:

  • Midjourney Inpainting has arrived (finally)
  • StableVideo, yet another vid2vid method
  • 3D Gaussian Splatting for real-time neural field scene rendering
  • InterScene, a framework for simulating human interactions with objects
  • Diff2Lip, a new state-of-the-art lip-sync method
  • Vid2SFX generates high-quality sound effects from images
  • Scenimefy turns images and video into anime scenes
  • Text2Listen lets AI react to your words
  • and more tutorials, tools and gems!

Cover Challenge 🎨

The challenges got a brand new leaderboard this week! Participating in the weekly challenges is not only a great way to improve your skills; you can now also earn virtual points to climb to the top of the AI Art Weekly Cover Challenge Leaderboard for glory and fame 🙌

Theme: pagans
138 submissions by 74 artists
AI Art Weekly Cover Art Challenge pagans submission by datafog
🏆 1st: @datafog
AI Art Weekly Cover Art Challenge pagans submission by PapaBeardedNFTs
🥈 2nd: @PapaBeardedNFTs
AI Art Weekly Cover Art Challenge pagans submission by Al_Valet
🥈 2nd: @Al_Valet
AI Art Weekly Cover Art Challenge pagans submission by weird_momma_x
🥉 3rd: @weird_momma_x

News & Papers

Midjourney Inpainting

Midjourney finally released their inpainting feature, which they’re calling Vary (Region). In the same fashion as Stable Diffusion and Photoshop inpainting, the new feature lets you select a section of an image and then re-render just that part. I had mixed results depending on what I wanted to achieve, but there is a lot of potential here.

The inpainting feature apparently uses a slightly better model than v5.2 behind the scenes. You can exploit that to generate entire images with that model by simply masking the whole image when inpainting, which can lead to some stunning results. I’ve published a tutorial for that on X.

Midjourney inpainting example


StableVideo

StableVideo is yet another vid2vid method. This one is more than just style transfer, though: the method can differentiate between foreground and background when editing a video, making it possible to reimagine the subject within an entirely different landscape.

StableVideo example

3D Gaussian Splatting for Real-Time Radiance Field Rendering

You might have heard of NeRFs, a method that can turn 2D images or videos into 3D scenes. 3D Gaussian Splatting is a new way to render these scenes in real time, while achieving state-of-the-art visual quality and maintaining competitive training times. Impressive.

3D Gaussian Splatting example

InterScene: Synthesizing Physically Plausible Human Motions in 3D Scenes

InterScene is a novel framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. Another step closer to completely dynamic game worlds and simulations. Check out an impressive demo below.

InterScene example of a character interacting with objects within a scene and avoiding obstacles

Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization

Diff2Lip is the new state of the art when it comes to lip-syncing video to audio (previously Wav2Lip and PC-AVS). Watching dubbed movies or videos with terrible lip-syncing might soon be a thing of the past.

Diff2Lip example. Check out the project page above for examples with audio.

Video-To-Audio SFX: Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

With this unnamed method, which I call Vid2SFX, it’s possible to generate a high-quality sound effect from a single frame of a video, making it super easy to create sound effects for videos.

Vid2SFX preview (check the project page link above for audio examples)

Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation

Anime lovers, rejoice. You might soon be able to convert your regular images into anime scenes with the click of a button. Scenimefy is a new method that can turn a single image or video into a coherent anime scene with different styles. The video stylization is especially impressive.

Scenimefy example

Text2Listen: Can Language Models Learn to Listen?

So far, communication between AI and humans has mostly been through chat interfaces. As we saw two weeks ago, it won’t be long until we might communicate with AI through lifelike avatars. Communication doesn’t happen through spoken words alone, though; body language is another integral part of it. This is what Text2Listen is aiming to solve. The model tries to predict how the face of an avatar should react based on what you’re saying to it. Another puzzle piece for unlocking convincing simulations.

Text2Listen example. Check the project page above for examples with audio.

More papers & gems

  • Watch Your Steps 👣: Local Image and Scene Editing by Text Instructions
  • AvatarJLM: Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
  • NRGA: Leveraging Intrinsic Properties for Non-Rigid Garment Alignment
  • MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR
  • Few-Arti-Gen: Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation
  • Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
  • ReFit: Recurrent Fitting Network for 3D Human Recovery
  • ROAM: Robust and Object-aware Motion Generation using Neural Pose Descriptors
  • V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
  • Point-UV-Diffusion: Texture Generation on 3D Meshes with Point-UV Diffusion


Interview

For this week’s AI Art Weekly interview, we’re going to embrace the shadows! AI alchemist Vnderworld is known for navigating the dark waters of creativity and challenging today’s sea of predictable and sanitized artistic expressions. Be warned: his work is not for the faint-hearted. But for everyone else who wants to wander through the uncharted territories of their minds, enjoy!

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

Midjourney inpainting example from our Discord based on the tutorial above. The sign initially spelled “FRAY”. Image by @moelucio.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa