AI Art Weekly #106

Hello there, my fellow dreamers, and welcome to issue #106 of AI Art Weekly! 👋

After skimming through 202 papers this week, I’ve got an issue fully packed with goodies for you this Friday.

Obviously there’s lots of no-code stuff in there, but luckily there is AI Art Weekly Premium 😉

And because it’s Black Friday, you can use the code BLACKFRIDAY to get 20% off. Same code also works for my Midjourney prompt collection on Promptcache.

Cheers and until next week ✌️


Cover Challenge 🎨

Theme: phenomena
33 submissions by 20 artists
AI Art Weekly Cover Art Challenge phenomena submission by onchainsherpa
🏆 1st: @onchainsherpa
AI Art Weekly Cover Art Challenge phenomena submission by Fab_ElSalvador
🥈 2nd: @Fab_ElSalvador
AI Art Weekly Cover Art Challenge phenomena submission by 0xNoct
🥉 3rd: @0xNoct
AI Art Weekly Cover Art Challenge phenomena submission by inigma_a
🥉 3rd: @inigma_a

News & Papers

Highlights

Runway Frames

Right after I hit “send” last week, Runway dropped a new video outpainting feature, and this Monday they followed up by announcing “Frames”, a new image generation model.

In a nutshell, Frames lets you maintain stylistic consistency while generating aesthetic variations for your projects. Runway calls these “Worlds”, and it looks like an enhanced version of Midjourney’s style reference mechanic, with the difference that this one will most likely connect seamlessly to Gen-3 video generation.

A teaser of many different “Worlds”

3D

Material Anything: Generating Materials for Any 3D Object via Diffusion

Material Anything can generate realistic materials for 3D objects, including those without textures. It adapts to different lighting and uses confidence masks to improve material quality, ensuring outputs are ready for UV mapping.

Material Anything example

SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates

SuperMat can quickly break down images of materials into three important maps: albedo, metallic, and roughness. It does this in about 3 seconds while keeping high quality, making it efficient for 3D object material estimation.

SuperMat example
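
For context on what those three maps encode: in the standard metalness PBR workflow, albedo, metallic, and roughness together determine a surface’s diffuse and specular response. Here’s a minimal numpy sketch of that relationship, using textbook formulas and random arrays as stand-ins for SuperMat’s outputs, not the paper’s code:

    import numpy as np

    # Stand-ins for the three maps SuperMat estimates from a single image.
    albedo = np.random.rand(256, 256, 3)     # base color
    metallic = np.random.rand(256, 256, 1)   # 0 = dielectric, 1 = metal
    roughness = np.random.rand(256, 256, 1)  # 0 = mirror-like, 1 = matte

    # Metalness workflow: metals have no diffuse term and tint their
    # specular color (F0) with the albedo; dielectrics reflect ~4% at
    # normal incidence. Roughness controls how blurry the specular lobe is.
    diffuse = albedo * (1.0 - metallic)
    specular_f0 = 0.04 * (1.0 - metallic) + albedo * metallic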

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

SelfSplat can create 3D models from multiple images without needing camera poses. It uses self-supervised methods for depth and pose estimation, resulting in high-quality appearance and geometry from real-world data.

SelfSplat example

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

Make-It-Animatable can auto-rig any 3D humanoid model for animation in under one second. It generates high-quality blend weights and bones, and works with various 3D formats, ensuring accuracy even for non-standard skeletons.

Make-It-Animatable example
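
If “blend weights” are new to you: once a character is rigged, animation usually boils down to linear blend skinning, where every vertex follows a weighted mix of bone transforms. A quick numpy sketch of that standard formula (my illustration, not the paper’s code):

    import numpy as np

    def linear_blend_skinning(vertices, bone_transforms, weights):
        """vertices: (V, 3); bone_transforms: (B, 4, 4); weights: (V, B)."""
        homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
        per_bone = np.einsum("bij,vj->bvi", bone_transforms, homo)  # (B, V, 4)
        skinned = np.einsum("vb,bvi->vi", weights, per_bone)        # weighted mix
        return skinned[:, :3]

    # Sanity check: with rest-pose (identity) bones, vertices don't move.
    verts = np.random.rand(100, 3)
    bones = np.tile(np.eye(4), (2, 1, 1))
    w = np.random.dirichlet(np.ones(2), size=100)  # rows sum to 1
    assert np.allclose(linear_blend_skinning(verts, bones, w), verts)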

Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation

Reflect3D can detect 3D reflection symmetry from a single RGB image and improve 3D generation.

Symmetry Strikes Back examples

MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

MVGenMaster can generate up to 100 new views from a single image using a multi-view diffusion model.

MVGenMaster example

KinMo: Kinematic-aware Human Motion Understanding and Generation

KinMo can retrieve, generate, and edit human motion based on text descriptions. It breaks down motion into body joint movements, allowing for precise control over local body parts and improving the accuracy of text-motion retrieval.

KinMo example

TEXGen: a Generative Diffusion Model for Mesh Textures

TEXGen can generate high-resolution UV texture maps in texture space using a 700 million parameter diffusion model. It supports text-guided texture inpainting and sparse-view texture completion, making it versatile for creating textures for 3D assets.

TEXGen example

PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

PhysFlow can simulate dynamic interactions in complex scenes. It identifies material types through image queries and enhances realism using video diffusion and a Material Point Method for detailed 4D representations.

PhysFlow example

Image

DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting

DreamMix is an inpainting method based on the Fooocus model that can add objects from reference images and change their attributes using text.

DreamMix examples

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

Omegance can control detail levels in diffusion-based synthesis using a single parameter, ω. It allows for precise granularity control in generated outputs and enables specific adjustments through spatial masks and denoising schedules.

Omegance example
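
Curious how one scalar can steer detail? Here’s a toy sketch of the idea as I read it: rescale the predicted noise by ω at every denoising step, optionally gated by a spatial mask. The stand-in model and Euler-style update are mine, not the paper’s code:

    import torch

    def toy_eps_model(x, t):
        # Stand-in noise predictor; a real sampler would call a trained UNet.
        return 0.1 * torch.randn_like(x)

    def denoise(x, steps=50, omega=1.0, mask=None):
        for t in range(steps, 0, -1):
            eps = toy_eps_model(x, t)
            # Omegance: scale the noise prediction by omega. omega < 1
            # smooths, omega > 1 sharpens; a [0, 1] mask confines the
            # effect to chosen regions.
            scaled = omega * eps
            eps = scaled if mask is None else mask * scaled + (1 - mask) * eps
            x = x - eps / steps  # crude update, just for illustration
        return x

    sample = denoise(torch.randn(1, 3, 64, 64), omega=0.8)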

ID-Patch: Robust ID Association for Group Photo Personalization

ID-Patch can generate personalized group photos by matching faces to specific positions. It reduces problems like identity leakage and visual errors while achieving high accuracy, and it is seven times faster than comparable methods.

ID-Patch example

PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion

PersonaCraft can generate realistic full-body images of multiple people from a single reference image. It manages occlusions well and allows users to adjust body shapes for more personalized images.

PersonaCraft example

OSDFace: One-Step Diffusion Model for Face Restoration

OSDFace can restore low-quality face images in one step, making it faster than traditional methods. It produces high-quality images while keeping the person’s identity consistent.

OSDFace example

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

Chat2SVG can generate and edit SVG vector graphics from text prompts. It combines Large Language Models and image diffusion models to create detailed SVG templates and allows users to refine them with simple language instructions.

Chat2SVG examples
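
Here’s a rough sketch of the first stage of such a pipeline, prompting an LLM for an SVG template. The model name and prompt wording are my assumptions, and the paper’s diffusion-based detail enhancement stage is omitted:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    concept = "a cartoon rocket ship"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a flat-style SVG (valid XML, <svg>...</svg> "
                       f"only, no prose) depicting {concept}, built from "
                       f"simple <path>, <rect> and <circle> primitives.",
        }],
    )
    with open("rocket.svg", "w") as f:
        f.write(resp.choices[0].message.content)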

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Diptych Prompting can generate images of new subjects in specific contexts by treating text-to-image generation as an inpainting task.

Diptych Prompting examples
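
The trick is easy to approximate with any off-the-shelf inpainting model: paste the reference into the left panel, mask the right panel, and describe the whole diptych in the prompt. A hedged diffusers sketch; model choice and prompt phrasing are my assumptions, and the paper’s attention-level enhancements are skipped:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    W, H = 512, 512
    reference = Image.open("subject.png").resize((W, H))

    # Two-panel canvas: reference on the left, blank on the right.
    diptych = Image.new("RGB", (W * 2, H))
    diptych.paste(reference, (0, 0))

    # Mask only the right panel so inpainting fills it in.
    mask = Image.new("L", (W * 2, H), 0)
    mask.paste(255, (W, 0, W * 2, H))

    prompt = ("A diptych with two side-by-side panels of the same plush toy; "
              "on the right, the plush toy is surfing on a sunny beach")
    out = pipe(prompt=prompt, image=diptych, mask_image=mask,
               width=W * 2, height=H).images[0]
    out.crop((W, 0, W * 2, H)).save("subject_on_beach.png")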

Stable Flow: Vital Layers for Training-Free Image Editing

Stable Flow can edit images by adding, removing, or changing objects.

Stable Flow example

Video

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

CAT4D can create dynamic 4D scenes from single videos. It uses a multi-view video diffusion model to generate videos from different angles, allowing for strong 4D reconstruction and high-quality images.

CAT4D example

MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

MagicDriveDiT can generate high-resolution street scene videos for self-driving cars.

MagicDriveDiT example

MyTimeMachine: Personalized Facial Age Transformation

MyTimeMachine can make faces look older or younger by personalizing a global aging model with as few as 50 selfies, all while preserving the person’s identity. That makes it great for visual effects and realistic age transformations.

MyTimeMachine example

AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation

AnchorCrafter can generate high-quality 2D videos of people interacting with a reference product.

AnchorCrafter examples

I2VControl: Disentangled and Unified Video Motion Synthesis Control

I2VControl can unify multiple motion control tasks when generating videos from images. It breaks videos into individual motion units with separate control signals, allowing for flexible combinations of control types to boost creativity in video generation.

I2VControl examples

Generative Omnimatte: Learning to Decompose Video into Layers

Generative Omnimatte can break down videos into meaningful layers, isolating objects, shadows, and reflections without needing static backgrounds. It uses a video diffusion model for high-quality results and can fill in hidden areas, enhancing video editing options.

Generative Omnimatte example
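
The layers here are RGBA “omnimattes” that recomposite into the original frame with the standard over operator, which is exactly what makes per-layer edits possible. A minimal numpy sketch of that recomposition (textbook compositing, not the paper’s model):

    import numpy as np

    def composite_over(layers):
        """layers: list of (H, W, 4) float arrays, ordered back to front."""
        out = np.zeros(layers[0].shape[:2] + (3,))
        for layer in layers:
            rgb, alpha = layer[..., :3], layer[..., 3:]
            out = alpha * rgb + (1 - alpha) * out  # "over" operator
        return out

    # e.g. background, shadow layer, foreground object
    frame = composite_over([np.random.rand(270, 480, 4) for _ in range(3)])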

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Sonic can generate high-quality portrait animations from audio input.

Sonic example

VIRES: Video Instance Repainting with Sketch and Text Guidance

VIRES can repaint, replace, generate, and remove objects in videos using sketches and text.

VIRES example

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

VideoRepair can improve text-to-video generation by finding and fixing small mismatches between text prompts and videos.

VideoRepair example: “A dog sitting under an umbrella on a sunny beach”

Audio

MultiFoley: Video-Guided Foley Sound Generation with Multimodal Controls

MultiFoley can generate high-quality sound effects for videos using text, audio, and video inputs. It allows users to create both realistic and whimsical sounds, like making a lion’s roar sound like a cat’s meow, and can complete partial soundtracks with full Foley audio.

MultiFoley preview (check project page for an example with audio)

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying my Midjourney prompt collection on PROMPTCACHE 🚀
  • Buying access to AI Art Weekly Premium 👑

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa

by @dreamingtulpa