AI Art Weekly #66

Hello there, my fellow dreamers, and welcome to issue #66 of AI Art Weekly! 👋

This week, OpenAI introduced their GPT Store, featuring an upcoming revenue program for US creators, while Rabbit unveiled the r1 pocket companion, a new mobile device that, with the aid of Large Action Models (LAM), aims to help you achieve more with fewer apps. Both have been met with considerable hype and skepticism. Meanwhile, reality is shifting, and the line between what is real and fake is becoming increasingly blurred. Let’s dive in:

  • A new text-to-video model by ByteDance (TikTok)
  • ReplaceAnything can, well, replace anything (in images)
  • PALP is a new text-to-image fine-tuning approach by Google
  • Dubbing for Everyone is a new method for visual dubbing
  • FMA-Net can turn blurry, low-quality videos into clear, high-quality ones
  • Audio2Photoreal can generate gesturing photorealistic avatars from sound clips
  • 3 different 3D NeRF scene editing methods
  • SonicVisionLM generates sound effects for silent videos
  • and more!

Cover Challenge 🎨

Theme: mystery
122 submissions by 78 artists
AI Art Weekly Cover Art Challenge mystery submission by pactalom
πŸ† 1st: @pactalom
AI Art Weekly Cover Art Challenge mystery submission by amorvobiscum
🥈 2nd: @amorvobiscum
AI Art Weekly Cover Art Challenge mystery submission by VikitoruFelipe
🥉 3rd: @VikitoruFelipe
AI Art Weekly Cover Art Challenge mystery submission by NomadsVagabonds
🥉 3rd: @NomadsVagabonds

News & Papers

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

ByteDance (the TikTok company) announced a new text-to-video model called MagicVideo-V2. The model can generate videos of up to 94 frames at 1048×1048 resolution with both high aesthetic quality and temporal smoothness. Definitely interesting to see where ByteDance is going with this, as they have one of the biggest datasets to train video models on.

A beautiful woman, with a pink and platinum-colored ombre mohawk, facing the camera, wearing a composition of bubble wrap, cyberpunk jacket

ReplaceAnything as you want: Ultra-high quality content replacement

ReplaceAnything is an β€œinpainting” framework that can be used for human replacement, clothing replacement, background replacement, and more. The results look crazy good. Code hasn’t been released yet, but there is a demo on HuggingFace.

ReplaceAnything examples

PALP: Prompt Aligned Personalization of Text-to-Image Models

PALP is a new text-to-image fine-tuning approach by Google that personalizes a model while keeping it aligned to a single target prompt. The results compare favorably to other methods, and it supports art-inspired, single-image, and multi-subject personalization.

PALP example

Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors

Dubbing for Everyone is a new method for visual dubbing that generates an actor’s lip motions in a video to synchronize with given audio, using as little as 4 seconds of data. Without further training, it can dub any video to any audio while capturing person-specific characteristics and reducing visual artifacts.

Comparison of Dubbing for Everyone with other methods. Check out the project page for examples with audio.

FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

FMA-Net can turn blurry, low-quality videos into clear, high-quality ones by jointly predicting the degradation and restoration processes, accounting for movement in the video through flow-guided dynamic filtering and learned motion patterns.

FMA-Net example

Audio2Photoreal: From Audio to Photoreal Embodiment

Audio2Photoreal can generate full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, the model is able to output multiple possibilities of gestural motion for an individual, including face, body, and hands. The results are highly photorealistic avatars that can express crucial nuances in gestures such as sneers and smirks.

Audio2Photoreal example

InseRF and GO-NeRF: Inserting 3D Objects into Neural Radiance Fields

Even though Gaussian Splats have seen a lot of love, NeRFs haven’t been abandoned. This week we got three different NeRF editing papers. The first two tackle object insertion: InseRF and GO-NeRF are both methods to insert 3D objects into existing NeRF scenes.

InseRF example

FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields

The third is about style transfer. FPRF can stylize large-scale 3D NeRF scenes from multiple reference images without additional optimization while preserving multi-view appearance consistency.

FPRF example

SonicVisionLM: Playing Sound with Vision Language Models

SonicVisionLM can generate sound effects for silent videos. Unlike other methods, it uses vision language models (VLMs) to identify events within a video and then generates sounds that match the video content.

SonicVisionLM pipeline
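To make the idea concrete, here is a minimal Python sketch of a SonicVisionLM-style pipeline: a VLM describes the events in each video segment, and a text-to-sound model turns those descriptions into audio. All function names and outputs below are placeholders of my own, not the paper’s actual API — real implementations would call an actual VLM and an actual text-to-audio model.

```python
def describe_events(frames):
    """Stand-in for a VLM that captions the sound-relevant events in a clip."""
    # A real pipeline would run a vision-language model over the frames.
    return ["glass shatters on the floor", "footsteps on gravel"]

def text_to_sound(event, duration_s):
    """Stand-in for a text-to-audio model that synthesizes a waveform."""
    # We return a label instead of audio samples in this sketch.
    return f"<{duration_s:.1f}s audio: {event}>"

def sonify(video_segments):
    """Generate one sound effect per detected event, per video segment."""
    track = []
    for frames, duration_s in video_segments:
        for event in describe_events(frames):
            track.append(text_to_sound(event, duration_s))
    return track

# One 2-second segment (frame data omitted in this sketch).
print(sonify([([], 2.0)]))
```

The key design point is the decoupling: the VLM handles *what* is happening, and a separate audio model handles *how* it sounds, so neither model needs to be trained end-to-end on video-audio pairs.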

Also interesting

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

β€πŸπŸπŸβ€ by me available on objkt

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it; putting these issues together takes me 8–12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa