AI Art Weekly #84
Hello there, my fellow dreamers, and welcome to issue #84 of AI Art Weekly! 👋
After scouring through 110+ papers for you, I bring you another chock-full issue of this week’s latest and greatest AI news.
But before we jump in, a little heads-up: there will be no issue next week as I’m going to get married 🤵♂️👰♀️. I’ll be back with a new issue on June 7th.
In this issue:
- 3D: MirrorGaussian, GarmentDreamer, NOVA-3D
- Motion: RemoCap, MagicPose4D, Semantic Gesticulator, CondMDI, SignLLM, SynCHMR
- Images: Face Adapter, Images that Sound, DMD2, InstaDrag, EditWorld, RectifID
- Video: FIFO-Diffusion, ReVideo, Slicedit, ViViD, MotionCraft, Generative Camera Dolly, MOFT
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
For the next cover I’m looking for submissions on the theme of love! The reward is again $100 and a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D
MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
MirrorGaussian is the first 3D Gaussian Splatting method that can reconstruct mirrors in real time. It’s even able to add new mirrors and objects to existing scenes.
GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details
GarmentDreamer can generate wearable, simulation-ready 3D garment meshes from text prompts. The method is able to generate diverse geometric and texture details, making it possible to create a wide range of different clothing items.
NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction
NOVA-3D can generate 3D anime characters from non-overlapped front and back views.
Motion
RemoCap: Disentangled Representation Learning for Motion Capture
RemoCap can reconstruct 3D human bodies from motion sequences. It’s able to capture occluded body parts with greater fidelity, resulting in less model penetration and less distorted motion.
MagicPose4D: Crafting Articulated Models with Appearance and Motion Control
MagicPose4D can generate 3D objects from text or images and transfer precise motions and trajectories from objects and characters in a video or mesh sequence.
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
Semantic Gesticulator can generate realistic gestures accompanying speech with the strong semantic correspondence that is vital for effective communication.
CondMDI: Flexible Motion In-betweening with Diffusion Models
CondMDI can generate precise and diverse motions that conform to flexible user-specified spatial constraints and text descriptions. This enables the creation of high-quality animations from just text prompts and inpainting between keyframes.
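If you’re wondering what “inpainting between keyframes” means in practice, here’s a minimal sketch of the generic masked-diffusion imputation approach (not CondMDI’s exact scheme, which trains the model with conditioning rather than just imputing): at every denoising step, the user’s keyframe poses are written back into the partially denoised motion, so the model only has to fill in the frames in between. All names and shapes below are hypothetical.

```python
import numpy as np

def inbetween(keyframes, keyframe_idx, num_frames, pose_dim,
              denoise_step, num_steps=50):
    """Keyframe-conditioned motion generation via diffusion imputation.

    keyframes:    (K, pose_dim) user-specified poses
    keyframe_idx: timeline indices of those poses
    denoise_step: stand-in for one reverse-diffusion step of a motion model
    """
    mask = np.zeros(num_frames, dtype=bool)
    mask[keyframe_idx] = True

    x = np.random.randn(num_frames, pose_dim)   # start from pure noise
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)                  # denoise the whole sequence
        x[mask] = keyframes                     # re-impose the known keyframes
        # (a real sampler would re-noise the keyframes to level t here)
    return x

# Toy call just to show the interface -- the "denoiser" is a dummy.
motion = inbetween(np.zeros((2, 63)), [0, 59], num_frames=60,
                   pose_dim=63, denoise_step=lambda x, t: x * 0.98)
```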
SignLLM: Sign Languages Production Large Language Models
SignLLM is the first multilingual Sign Language Production (SLP) model. It can generate sign language gestures from input text or prompts and achieve state-of-the-art performance on SLP tasks across eight sign languages.
SynCHMR: Synergistic Global-space Camera and Human Reconstruction from Videos
SynCHMR can reconstruct the camera trajectory, human motion, and scene in one global coordinate frame from videos.
Images
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Face Adapter is a new face swapping method that can generate facial detail and handle face shape changes with fine-grained control over attributes like identity, pose, and expression.
Images that Sound: Composing Images and Sounds on a Single Canvas
Images that Sound can generate spectrograms that look like images but can also be played as sound.
DMD2: Improved Distribution Matching Distillation for Fast Image Synthesis
DMD2 is an improved distillation method that can turn diffusion models into efficient one-step image generators.
InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
InstaDrag can do drag-based image editing in just one second. The method is trained on videos and is able to perform local shape deformations not present in the training data, like lengthening hair or twisting rainbows.
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
EditWorld can simulate world dynamics and edit images based on instructions that are grounded in various world scenarios. The method is able to add, replace, delete, and move objects in images, as well as change their attributes and perform other operations.
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
RectifID is yet another personalization method for diffusion models, working from user-provided reference images of human faces, live subjects, and certain objects.
Video
FIFO-Diffusion: Generating Infinite Videos from Text without Training
Infinitely long AI videos are coming! FIFO-Diffusion is a new inference method for existing text-to-video models like VideoCrafter2, Open-Sora-Plan, and ZeroScope that makes it possible to generate infinitely long videos without any additional training!
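The core trick, as I understand it, is diagonal denoising: keep a queue of latent frames at increasing noise levels, denoise them all jointly by one step, pop the now-clean head frame, and push fresh noise at the tail. Here’s a minimal sketch of that loop; `model_denoise`, the queue length, and the latent shape are placeholder assumptions, not the paper’s actual interface.

```python
from collections import deque
import numpy as np

def fifo_generate(model_denoise, num_output_frames,
                  queue_len=16, latent_shape=(4, 64, 64)):
    """FIFO-style video generation sketch (hypothetical interface).

    The queue holds `queue_len` latent frames at increasing noise levels
    (head = almost clean, tail = pure noise). Each iteration denoises
    every frame by one step, pops the finished head frame, and enqueues
    fresh noise -- so video length is decoupled from the model's context.
    """
    queue = deque(np.random.randn(*latent_shape) for _ in range(queue_len))
    noise_levels = list(range(queue_len))  # frame i sits at noise level i

    video = []
    while len(video) < num_output_frames:
        latents = np.stack(list(queue))
        latents = model_denoise(latents, noise_levels)  # one joint step
        queue = deque(latents)
        video.append(queue.popleft())                   # head is now clean
        queue.append(np.random.randn(*latent_shape))    # tail: fresh noise
    return np.stack(video)

# Dummy denoiser just to show the call pattern.
clip = fifo_generate(lambda x, t: x * 0.95, num_output_frames=8,
                     queue_len=4, latent_shape=(2, 2))
```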
ReVideo: Remake a Video with Motion and Content Control
Video editing is getting wild! ReVideo can change the content of a specific area while keeping the motion constant, customize new motion trajectories, or modify both content and motion trajectories.
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
Slicedit can edit videos with a simple text prompt, retaining the structure and motion of the original video while adhering to the target text.
ViViD: Video Virtual Try-on using Diffusion Models
ViViD can transfer a clothing item onto the video of a target person. The method is able to capture garment details and human posture, resulting in more coherent and lifelike videos.
MotionCraft: Zero-Shot Video Generation
MotionCraft can animate single images based on physics, resulting in videos that evolve more coherently from the first frame. The method is able to simulate different kinds of physics, such as fluid dynamics, rigid motion, and multi-agent systems, and can also be combined with animation software to generate the required optical flows.
Generative Camera Dolly
Generative Camera Dolly can regenerate a video from any chosen perspective. Still very early, but imagine being able to change any shot or angle in a video after it’s been recorded!
MOFT: MOtion FeaTure
MOFT is a training-free video motion interpreter and controller. It can be used to extract motion information from video diffusion models and guide the motion of generated videos without the need for retraining.
Also interesting
- MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation
- DoGaussian: Distributed-Oriented Gaussian Splatting
- MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video
@ClaireSilver12 is hosting her 7th AI contest. The theme is “Back to School”. Deadline: 6/4/24. I don’t have time to compete, but I encourage you to enter!
Probably one of the coolest and most accurate AI songs I’ve heard so far.
@ErwannMillon used IP Adapter, ControlNet and AnimateDiff to turn Blue Sail by Hans Haacke into an ocean.
@ALotkov86199 made this cool AI video using AnimateDiff and some Stable Diffusion inpainting.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa