AI Art Weekly #90
Hello there, my fellow dreamers, and welcome to issue #90 of AI Art Weekly!
Looks like AI research isn't slowing down anytime soon. I skimmed through 244 papers for you this week and picked the 19 most interesting ones.
With that much research, it gets hard to keep track of everything, so I'm working on a searchable index of all past papers, with the option to filter by category and code availability. Stay tuned for that!
In this issue:
- 3D: WildGaussians, Tailor3D, MeshAvatar, 3D Gaussian Ray Tracing, RodinHD, PICA
- Motion: Infinite Motion, CrowdMoGen
- 4D: 4DiM, Segment Any 4D Gaussians
- Image: AuraFlow v0.1, ColorPeel, HumanRefiner, Minutes to Seconds, PartCraft, Still-Moving
- Video: Live2Diff, GIMM
- Audio: ReWaS, MuseBarControl
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For the next cover I'm looking for 404 submissions and art to display on our 404 page! Reward is again $50 and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
3D
WildGaussians: 3D Gaussian Splatting in the Wild
WildGaussians is a new 3D Gaussian Splatting method that can handle occlusions and appearance changes. It achieves real-time rendering speeds and handles in-the-wild data better than prior methods.
Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
Tailor3D can create customized 3D assets from text or from single or dual-side images. The method also supports modifying the inputs through additional text prompts.
MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
MeshAvatar can generate high-quality triangular human avatars from multi-view videos. The avatars can be edited, manipulated, and relit.
3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes
3D Gaussian Ray Tracing brings ray tracing support to 3D Gaussian Splats. The method is able to handle large numbers of semi-transparent particles and is well-suited for rendering from highly-distorted cameras, making it a great fit for robotics.
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
RodinHD can generate high-fidelity 3D avatars from a portrait image. The method is able to capture intricate details such as hairstyles and can generalize to in-the-wild portrait input.
PICA: Physics-Integrated Clothed Avatar
PICA can generate high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing, from multi-view videos.
Motion
Infinite Motion: Extended Motion Generation via Long Text Instructions
Infinite Motion can generate long-duration motion from text instructions of arbitrary length! The model also supports precise editing of local segments within the generated sequences, offering fine-grained control and flexibility in motion synthesis.
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
CrowdMoGen can generate crowd motions from a text prompt! The model efficiently synthesizes the required collective motions from holistic plans and can handle a wide range of scenarios and crowd sizes.
4D
4DiM: Controlling Space and Time with Diffusion Models
Google DeepMind has been researching 4DiM, a cascaded diffusion model for 4D novel view synthesis. It can generate 3D scenes with temporal dynamics from a single image and a set of camera poses and timestamps.
Segment Any 4D Gaussians
SA4D is a framework that can segment anything in the 4D digital world based on 4D Gaussians. The method is able to remove, recolor, compose, and render high-quality masks of objects within seconds.
Image
AuraFlow v0.1
fal.ai has released AuraFlow v0.1 this week, the first release of a new text-to-image open-source foundation model series.
With Stability AI becoming unstable, it has been a while since we've seen a new open-source text-to-image model, so this is great news for the open-source community.
You can test the model directly on the fal.ai playground. The weights can be found on HuggingFace.
ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
ColorPeel can generate objects in images with specific colors and shapes.
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
HumanRefiner can improve hand and limb quality in generated images of humans! The method detects and corrects abnormal human poses and limbs.
Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling
M2S is a new DDPM-based image inpainting method that is 60 times faster than RePaint!
PartCraft: Crafting Creative Objects by Parts
PartCraft can generate objects by parts! Perfect for crafting new types of animal, robot, and human hybrids.
Still-Moving: Customized Video Generation without Customized Video Data
Still-Moving can customize video models with the spatial prior of a customized text-to-image model and a motion prior of a text-to-video model. This enables personalized, stylized, and conditional video generation.
Video
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
Live2Diff is the first approach to bring uni-directional attention modeling to video diffusion models for live video stream processing. It achieves 16 FPS on an RTX 4090 GPU.
Generalizable Implicit Motion Modeling for Video Frame Interpolation
GIMM is a new video frame interpolation method that uses generalizable implicit motion modeling to predict motion between frames.
Audio
Read, Watch and Scream! Sound Generation from Text and Video
ReWaS can generate sound effects from text and video. The method is able to estimate the structural information of audio from the video while receiving key content cues from a user prompt.
MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss
MuseBarControl enables fine-grained control over individual bars in symbolic music generation. This makes it possible to specify the number of notes, pitch range, and harmony for each bar, as well as the overall structure of the composition.
Also interesting
@bennash has created a new website that generates a prompt for an image, similar to Midjourney's /describe or CLIP Interrogator.
LivePortrait also works with video. @toyxyz3 ran some tests with it.
Made by @Arata_Fukoe with ChatGPT, Suno AI, DreamMachine, Gen-3, Kling, Midjourney and Stable Diffusion.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
β dreamingtulpa