AI Art Weekly #97

Hello there, my fellow dreamers, and welcome to issue #97 of AI Art Weekly! 👋

I went through 150+ papers this week, and DrawingSpinUp, SPARK, TextBoost, and InstantDrag look especially interesting. The Audio section is also 🔥 this week, but unfortunately none of the papers in there have code yet, which is getting a bit tiresome as of late and is why I’m working on a notification service to keep track of them. More on that soon!

I’m also in the process of setting up a print shop for my art, starting with selected pieces from my GN dreamers collection. If you’re interested in a specific piece to hang on your wall (or wear as a shirt), let me know and I’ll add it to the shop.


Cover Challenge 🎨

Theme: rebirth
32 submissions by 19 artists
AI Art Weekly Cover Art Challenge rebirth submission by AleRVG
🏆 1st: @AleRVG
AI Art Weekly Cover Art Challenge rebirth submission by VirginiaLori
🥈 2nd: @VirginiaLori
AI Art Weekly Cover Art Challenge rebirth submission by WorldEverett
🥉 3rd: @WorldEverett
AI Art Weekly Cover Art Challenge rebirth submission by PriestessOfDada
🥉 3rd: @PriestessOfDada

News & Papers

Highlights

Gen-3 Alpha Video-to-Video

Runway launched Gen-3 Alpha Video to Video, the latest version of their video style transfer feature. The new version comes with improved fidelity, consistency, motion, and speed compared to previous generations and supports:

  • Videos up to 10 seconds long
  • 16:9 aspect ratio (1280x768) output at 720p
  • Customizable structure transformation
  • Fixed seed for consistent results

Gen-3 Alpha Video-to-Video examples

Runway and Luma Launch Advanced AI Video Generation APIs

Runway and Luma have both introduced new APIs for their AI-powered video generation models, making advanced capabilities more accessible to developers and businesses. While Runway’s API is currently only available to select partners, Luma’s Dream Machine API is in beta, offering v1.6 models with pricing starting at 40 cents per 5-second video and features like:

  • Text-to-video and image-to-video generation
  • Camera control and video extension
  • Seamless loop creation
  • Variable aspect ratio support
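For a rough sense of what that pricing adds up to, here’s a quick back-of-envelope sketch based only on the quoted rate of 40 cents per 5-second clip (the clip length and stitching overhead are my assumptions, not Luma’s numbers):

```python
# Rough cost estimate for Luma's Dream Machine API at the quoted
# $0.40 per 5-second clip. Assumes a full minute is stitched from
# back-to-back 5 s clips with no retries or overlap.
price_per_clip = 0.40   # USD, as quoted
clip_seconds = 5
target_seconds = 60

clips_needed = target_seconds / clip_seconds          # 12 clips
cost = clips_needed * price_per_clip
print(f"${cost:.2f} per minute of generated video")   # prints "$4.80 per minute of generated video"
```

In practice you’d likely pay more, since drafts and retries count against the same rate.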

Luma Labs Dream Machine example

3D

DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer

DreamBeast can generate unique 3D animal assets with different parts. It distills part-level knowledge from Stable Diffusion 3 to quickly create detailed Part-Affinity maps from various camera views, improving quality while saving compute.

DreamBeast examples

DrawingSpinUp: 3D Animation from Single Character Drawings

DrawingSpinUp can animate 3D characters from a single 2D drawing. It removes unnecessary lines and uses a skeleton-based algorithm to allow characters to spin, jump, and dance.

DrawingSpinUp examples

DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors

DreamHOI can generate realistic 3D human-object interactions (HOIs) by posing a skinned human model to interact with objects based on text descriptions. It uses text-to-image diffusion models to create diverse interactions without needing large datasets.

DreamHOI examples

ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE

ProbTalk3D can generate 3D facial animations that show different emotions based on audio input! It uses a two-stage VQ-VAE model and the 3DMEAD dataset, allowing for diverse facial expressions and accurate lip-syncing.

ProbTalk3D example

MoRAG – Multi-Fusion Retrieval Augmented Generation for Human Motion

MoRAG can generate and retrieve human motion from text, using multi-part retrieval to augment motion diffusion models.

MoRAG examples

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Phidias can generate high-quality 3D assets from text, images, and 3D references. It uses a method called reference-augmented diffusion to improve quality and speed, achieving results in just a few seconds.

Phidias examples

A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis

Generative Radiance Field Relighting can relight 3D scenes captured under a single light source. It allows for realistic control over light direction and improves the consistency of views, making it suitable for complex scenes with multiple objects.

Generative Radiance Field Relighting example

LT3SD: Latent Trees for 3D Scene Diffusion

LT3SD can generate large-scale 3D scenes using a method that captures both basic shapes and fine details. It allows for flexible output sizes and produces high-quality scenes, even completing missing parts of a scene.

LT3SD example

SPARK: Self-supervised Personalized Real-time Monocular Face Capture

SPARK can create high-quality 3D face avatars from regular videos and track expressions and poses in real time. It improves the accuracy of 3D face reconstructions for tasks like aging, face swapping, and digital makeup.

SPARK examples

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

DualGS can achieve real-time playback of complex 4D human performances while compressing video data by up to 120 times. It enhances video quality in VR by separately representing motion and appearance using skin and joint Gaussians, requiring only about 350KB of storage per frame.
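To put the stated ~350KB per frame in perspective, here’s a small back-of-envelope calculation. The frame rate is my assumption (30 fps is typical for volumetric capture; the summary above doesn’t state one):

```python
# Storage estimate for a DualGS-style volumetric clip, using the
# reported ~350 KB per frame. The 30 fps playback rate is an
# assumption for illustration only.
kb_per_frame = 350
fps = 30                # assumed, not stated in the paper summary
seconds = 60

frames = fps * seconds                  # 1800 frames per minute
total_mb = frames * kb_per_frame / 1024
print(f"~{total_mb:.0f} MB per minute of playback")  # prints "~615 MB per minute of playback"
```

Even at that rate, the quoted 120x compression implies the uncompressed representation would be on the order of tens of gigabytes per minute, which is why the compression matters for VR streaming.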

DualGS demo

Image

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

TextBoost can enable one-shot personalization of text-to-image models by fine-tuning the text encoder. It generates diverse images from a single reference image while reducing overfitting and memory needs.

TextBoost example

InstantDrag: Improving Interactivity in Drag-based Image Editing

InstantDrag can edit images quickly using drag instructions without needing masks or text prompts. It learns motion dynamics with a two-network system, allowing for real-time, photo-realistic editing.

InstantDrag example

Video

DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion

DreamMover can generate high-quality intermediate images and short videos from image pairs with large motion. It uses a flow estimator based on diffusion models to keep details and ensure consistency between frames and input images.

DreamMover examples

Audio

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

STA-V2A can generate high-quality audio from videos by extracting important features and using text for guidance. It uses a Latent Diffusion Model for audio creation and a new metric called Audio-Audio Align to measure how well the audio matches the video timing.

STA-V2A example

METEOR: Melody-aware Texture-controllable Symbolic Orchestral Music Generation

METEOR can generate orchestral music while allowing control over the texture of the accompaniment. It achieves high-quality music style transfer and lets users adjust melodies and textures at the bar and track levels.

METEOR model visualization: Original music, Piano version, Full orchestra

Learning Source Disentanglement in Neural Audio Codec

SD-Codec can separate and reconstruct audio signals from speech, music, and sound effects using a different codebook for each source type. This improves the interpretability of audio codecs and gives finer control over audio generation while maintaining high quality.

Framework of SD-Codec

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Seed-Music can generate high-quality vocal music in multiple languages. Users can control the style and performance through descriptions and audio samples, and it also allows for precise editing of lyrics and melodies in the generated audio.

Seed-Music example

Also interesting

“Buttercup” by me.

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying a print of my art from my art shop 🖼️

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
