Hello there, my fellow dreamers, and welcome to issue #48 of AI Art Weekly! 👋
I’ve got another jam-packed issue for you this week. There is a lot of progress being made across different areas of AI which, when combined, will enable new kinds of simulations that simply weren’t possible before. But take a look for yourself below. The highlights this week are:
- MidJourney Inpainting has arrived (finally)
- StableVideo, yet another vid2vid method
- 3D Gaussian Splatting for real-time neural field scene rendering
- InterScene, a framework for simulating human interactions with objects
- Diff2Lip sets a new state of the art for lip-syncing
- Vid2SFX retrieves high-quality sound effects for videos from single frames
- Scenimefy turns images and video into anime scenes
- Text2Listen lets AI react to your words
- and more tutorials, tools and gems!
Cover Challenge 🎨
The challenges got a brand-new leaderboard this week! Participating in the weekly challenges is not only a great way to improve your skills; you can now also earn virtual points to climb to the top of the AI Art Weekly Cover Challenge Leaderboard for glory and fame 🙌
News & Papers
MidJourney finally released their inpainting feature, which they’re calling Vary (Region). In the same fashion as Stable Diffusion and Photoshop inpainting, the new feature lets you select a section of an image and then re-render that part of the image. I had mixed results depending on what I wanted to achieve, but there is a lot of potential here.
The inpainting feature apparently uses a slightly better model than v5.2 in the background. You can misuse that to generate entire images with that model by simply masking the whole image when inpainting, which can lead to some stunning results. I’ve published a tutorial for that on X.
StableVideo is yet another vid2vid method. This one is not just a style transfer though, the method is able to differentiate between fore- and background when editing a video, making it possible to reimagine the subject within an entirely different landscape.
3D Gaussian Splatting for Real-Time Radiance Field Rendering
You might have heard of NeRFs, a method that can turn 2D images or videos into 3D scenes. This new method called 3D Gaussian Splatting is a new way to render these scenes in real-time, while achieving state-of-the-art visual quality and maintaining competitive training times. Impressive.
InterScene: Synthesizing Physically Plausible Human Motions in 3D Scenes
InterScene is a novel framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. Another step closer to completely dynamic game worlds and simulations. Check out an impressive demo below.
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization
Diff2Lip is the new state of the art when it comes to lip-syncing video with audio (previously Wav2Lip and PC-AVS). Watching dubbed movies or videos with terrible lip-syncing might soon be a thing of the past.
Video-To-Audio SFX: Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
With this unnamed method, which I call Vid2SFX, it’s possible to retrieve a high-quality sound effect from a single frame of a video, making it super easy to add sound effects to videos.
Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation
Anime lovers rejoice. You might soon be able to convert your regular images into anime scenes with the click of a button. Scenimefy is a new method that can turn a single image or video into a coherent anime scene in different styles. The video stylization in particular is impressive.
Text2Listen: Can Language Models Learn to Listen?
So far, communication between AI and humans has mostly happened through chat interfaces. As we saw two weeks ago, it won’t be long until we might communicate with AI through lifelike avatars. Communication doesn’t happen through spoken words alone, though; body language is another integral part of it. This is what Text2Listen aims to solve. The model tries to predict how the face of an avatar should react based on what you’re saying to it. Another puzzle piece for unlocking convincing simulations.
More papers & gems
- Watch Your Steps 👣: Local Image and Scene Editing by Text Instructions
- AvatarJLM: Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling
- NRGA: Leveraging Intrinsic Properties for Non-Rigid Garment Alignment
- MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR
- Few-Arti-Gen: Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
- ReFit: Recurrent Fitting Network for 3D Human Recovery
- ROAM: Robust and Object-aware Motion Generation using Neural Pose Descriptors
- V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
- Point-UV-Diffusion: Texture Generation on 3D Meshes with Point-UV Diffusion
For this week’s AI Art Weekly interview, we’re going to embrace the shadows! AI alchemist Vnderworld is known for navigating the dark waters of creativity and challenging today’s sea of predictable, sanitized artistic expression. Be warned: his work is not for the faint-hearted. But for everyone who wants to wander through the uncharted territories of their mind, enjoy!
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!