AI Art Weekly #106
Hello there, my fellow dreamers, and welcome to issue #106 of AI Art Weekly! 👋
After skimming through 202 papers this week, I’ve got an issue fully packed with goodies for you this Friday.
Obviously, lots of stuff didn’t make the cut, but luckily there is AI Art Weekly Premium 😉
And because it’s Black Friday, you can use the code BLACKFRIDAY to get 20% off. Same code also works for my Midjourney prompt collection on Promptcache.
Cheers and until next week ✌️
Unlock the full potential of AI-generated art with my curated collection of Midjourney SREF codes and prompts.
Cover Challenge 🎨
For the next cover I’m looking for error-inspired submissions! The reward is a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
Highlights
Runway Frames
After dropping a new video outpainting feature right after I hit “send” last week, Runway also announced a new image generation model, “Frames”, this Monday.
In a nutshell, Frames lets you maintain stylistic consistency while generating aesthetic variations for your projects. Runway calls this “Worlds”, and it looks like an enhanced version of Midjourney’s style reference mechanic, with the difference that this one will most likely connect seamlessly to Gen-3 video generation.
3D
Material Anything: Generating Materials for Any 3D Object via Diffusion
Material Anything can generate realistic materials for 3D objects, including those without textures. It adapts to different lighting and uses confidence masks to improve material quality, ensuring outputs are ready for UV mapping.
SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates
SuperMat can quickly break down images of materials into three important maps: albedo, metallic, and roughness. It does this in about 3 seconds while keeping high quality, making it efficient for 3D object material estimation.
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
SelfSplat can create 3D models from multiple images without needing camera poses or 3D priors. It uses self-supervised depth and pose estimation, producing high-quality appearance and geometry from real-world data.
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
Make-It-Animatable can auto-rig any 3D humanoid model for animation in under one second. It generates high-quality blend weights and bones, and works with various 3D formats, ensuring accuracy even for non-standard skeletons.
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Reflect3D can detect 3D reflection symmetry from a single RGB image and improve 3D generation.
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
MVGenMaster can generate up to 100 new views from a single image using a multi-view diffusion model.
KinMo: Kinematic-aware Human Motion Understanding and Generation
KinMo can retrieve, generate, and edit human motion based on text descriptions. It breaks down motion into body joint movements, allowing for precise control over local body parts and improving the accuracy of text-motion retrieval.
TEXGen: a Generative Diffusion Model for Mesh Textures
TEXGen can generate high-resolution UV texture maps in texture space using a 700 million parameter diffusion model. It supports text-guided texture inpainting and sparse-view texture completion, making it versatile for creating textures for 3D assets.
PhysFlow: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
PhysFlow can simulate dynamic interactions in complex scenes. It identifies material types through image queries and enhances realism using video diffusion and a Material Point Method for detailed 4D representations.
Image
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
DreamMix is an inpainting method based on the Fooocus model that can add objects from reference images and change their features using text.
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Omegance can control detail levels in diffusion-based synthesis using a single parameter, ω. It allows for precise granularity control in generated outputs and enables specific adjustments through spatial masks and denoising schedules.
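For the curious, here’s a minimal toy sketch of the idea: a single scalar ω rescaling the predicted noise inside a DDIM-style sampling loop. This is my own illustration under that assumption, not the paper’s implementation, and every name in it is made up.

```python
import torch

def ddim_step(x_t, eps_pred, alpha_t, alpha_prev):
    """One deterministic DDIM update from step t to the previous step."""
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps_pred) / alpha_t.sqrt()
    return alpha_prev.sqrt() * x0_pred + (1 - alpha_prev).sqrt() * eps_pred

def sample(model, x_T, alpha_bars, omega=1.0):
    """Toy sampler where omega rescales the noise prediction each step.

    Which direction gives more vs. less detail is an assumption here --
    see the paper for how ω is actually applied.
    """
    x = x_T
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = omega * model(x, t)  # the single granularity knob
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t - 1])
    return x

# Dummy noise predictor and schedule so the sketch runs end to end.
alpha_bars = torch.linspace(0.999, 0.01, 50)
model = lambda x, t: torch.randn_like(x)
out = sample(model, torch.randn(1, 3, 64, 64), alpha_bars, omega=0.8)
```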
ID-Patch: Robust ID Association for Group Photo Personalization
ID-Patch can generate personalized group photos by matching faces with specific positions. It reduces problems like identity leakage and visual errors, achieving high accuracy while running seven times faster than other methods.
PersonaCraft: Personalized Full-Body Image Synthesis for Multiple Identities from Single References Using 3D-Model-Conditioned Diffusion
PersonaCraft can generate realistic full-body images of multiple people from a single reference image. It manages occlusions well and allows users to adjust body shapes for more personalized images.
OSDFace: One-Step Diffusion Model for Face Restoration
OSDFace can restore low-quality face images in one step, making it faster than traditional methods. It produces high-quality images while keeping the person’s identity consistent.
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Chat2SVG can generate and edit SVG vector graphics from text prompts. It combines Large Language Models and image diffusion models to create detailed SVG templates and allows users to refine them with simple language instructions.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Diptych Prompting can generate images of new subjects in specific contexts by treating text-to-image generation as an inpainting task.
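To get a feel for the inpainting trick, here’s a rough sketch using an off-the-shelf diffusers inpainting pipeline: stitch the reference into the left half of a diptych, mask the right half, and let the model fill it in. The checkpoint, file names, and prompt are placeholders, and this only illustrates the concept, not the authors’ actual method.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Any off-the-shelf inpainting checkpoint works for the illustration.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

ref = Image.open("subject.png").convert("RGB").resize((512, 512))

# Diptych: reference subject on the left, empty panel on the right.
diptych = Image.new("RGB", (1024, 512))
diptych.paste(ref, (0, 0))

# White mask over the right half so only that panel gets inpainted.
mask = Image.new("L", (1024, 512), 0)
mask.paste(255, (512, 0, 1024, 512))

result = pipe(
    prompt="the same subject riding a bicycle through Paris",
    image=diptych,
    mask_image=mask,
    width=1024,
    height=512,
).images[0]

# The right panel is the subject in its new context.
result.crop((512, 0, 1024, 512)).save("output.png")
```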
Stable Flow: Vital Layers for Training-Free Image Editing
Stable Flow can edit images by adding, removing, or changing objects.
Video
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
CAT4D can create dynamic 4D scenes from single videos. It uses a multi-view video diffusion model to generate videos from different angles, allowing for strong 4D reconstruction and high-quality images.
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
MagicDriveDiT can generate high-resolution street scene videos for self-driving cars.
MyTimeMachine: Personalized Facial Age Transformation
MyTimeMachine can make faces look older or younger by personalizing a global aging model with as few as 50 selfies while preserving the person’s identity, making it great for visual effects and realistic age transformations.
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation
AnchorCrafter can generate high-quality 2D videos of people interacting with a reference product.
I2VControl: Disentangled and Unified Video Motion Synthesis Control
I2VControl can unify multiple motion control tasks when generating videos from images. It breaks videos into individual motion units with separate control signals, allowing for flexible combinations of control types to boost creativity in video generation.
Generative Omnimatte: Learning to Decompose Video into Layers
Generative Omnimatte can break down videos into meaningful layers, isolating objects, shadows, and reflections without needing static backgrounds. It uses a video diffusion model for high-quality results and can fill in hidden areas, enhancing video editing options.
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Sonic can generate high-quality portrait animations from audio input.
VIRES: Video Instance Repainting with Sketch and Text Guidance
VIRES can repaint, replace, generate, and remove objects in videos using sketches and text.
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
VideoRepair can improve text-to-video generation by finding and fixing small mismatches between text prompts and videos.
Audio
MultiFoley: Video-Guided Foley Sound Generation with Multimodal Controls
MultiFoley can generate high-quality sound effects for videos using text, audio, and video inputs. It allows users to create both realistic and whimsical sounds, like making a lion’s roar sound like a cat’s meow, and can complete partial soundtracks with full Foley audio.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it; putting these issues together takes me 8-12 hours every Friday 😅)
- Buying my Midjourney prompt collection on PROMPTCACHE 🚀
- Buying access to AI Art Weekly Premium 👑
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa