AI Art Weekly #77
Hello there, my fellow dreamers, and welcome to issue #77 of AI Art Weekly!
Another wild week in the AI world is behind us. Grok 1.5 is on the horizon, more fluid deepfakes are coming, the robots are evolving, and loved ones who have passed away have been brought back to life.
On top of that, I've gone through another 150+ papers for you this week. Happy Easter to those who celebrate it!
In this issue:
- 3D generation: ThemeStation, DreamPolisher, TC4D, MonoHair, GaussianCube
- Texture generation: Garment3DGen, Make-It-Vivid
- Human Pose & Motion: TRAM, AiOS
- Image generation and editing: FlashFace, PAID, NeuroPictor, ObjectDrop, Inclusion Matching, Attribute Control
- Video generation and editing: Champ, TRIP, AniPortrait, StreamingT2V, Spectral Motion Alignment
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do.
Cover Challenge
For next week's cover I'm looking for submissions based on four reference characters! The reward is again $50 and a rare role in our Discord community that lets you vote in the finals. The rulebook can be found here and images can be submitted here.
News & Papers
3D generation
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Want a diverse set of trees, cars, or chairs for your 3D environment? ThemeStation can generate multiple variations from one or more similar 3D reference assets. The method also allows for editing 3D assets using text prompts.
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
DreamPolisher is yet another text-to-3D method. This one uses Gaussian Splats and ControlNet to generate high-quality and view-consistent 3D objects from text only.
TC4D: Trajectory-Conditioned Text-to-4D Generation
TC4D can animate 3D scenes generated from text along arbitrary trajectories. I can see this being useful for generating 3D effects for movies or games.
MonoHair: High-Fidelity Hair Modeling from a Monocular Video
MonoHair can reconstruct high-fidelity 3D hair from a single video. Very impressive results and the method is able to handle a wide range of hair types and styles.
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
GaussianCube is an image-to-3D model that can generate high-quality 3D objects from multi-view images. It also uses 3D Gaussian Splatting: it converts the unstructured splat representation into a structured voxel grid via optimal transport, and then trains a 3D diffusion model on that grid to generate new objects.
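The structuring step can be sketched in a few lines. This is a toy numpy illustration with made-up data: the real method solves a proper optimal transport problem over full Gaussian parameters, whereas here a greedy nearest-pair matching assigns each Gaussian center to one voxel cell.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 27 Gaussian centers fitted to some object (random here).
centers = rng.uniform(-1.0, 1.0, size=(27, 3))

# A 3x3x3 voxel grid covering the same volume.
axis = np.linspace(-1.0, 1.0, 3)
voxels = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# Pairwise squared distances between Gaussian centers and voxel centers.
cost = ((centers[:, None, :] - voxels[None, :, :]) ** 2).sum(axis=-1)

# Greedy one-to-one matching: repeatedly take the cheapest remaining
# (gaussian, voxel) pair. This is a simplification of the paper's
# optimal transport formulation, for illustration only.
assignment = np.full(27, -1)
free_g = set(range(27))
free_v = set(range(27))
for g, v in zip(*np.unravel_index(np.argsort(cost, axis=None), cost.shape)):
    if g in free_g and v in free_v:
        assignment[g] = v
        free_g.remove(g)
        free_v.remove(v)

# The "cube": a voxel grid holding one Gaussian's parameters per cell,
# which a standard 3D diffusion model can then be trained on.
grid = np.zeros((27, 3))
grid[assignment] = centers
```

The payoff of this trick is that the irregular splat set becomes a fixed-shape tensor, so off-the-shelf 3D diffusion architectures apply directly.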
Texture generation
Garment3DGen: 3D Garment Stylization and Texture Generation
Garment3DGen can stylize the geometry and textures of garments from 2D images and 3D meshes! The results can be fitted on top of parametric bodies and simulated. Could be used for hand-garment interaction in VR or to turn sketches into 3D garments.
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Make-It-Vivid generates high-quality texture maps for 3D biped cartoon characters from text instructions, making it possible to dress and animate characters based on prompts.
Human Pose & Motion
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Ditch the expensive motion capture suits and cameras: TRAM can reconstruct one or multiple humans from monocular videos.
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
TRAM isn't the only motion capture method this week, though. AiOS is another one that can estimate the human body, hands, and facial expressions in a single stage, resulting in more accurate and complete 3D reconstructions of people.
Image generation and editing
FlashFace
Another LoRA contender for identity preservation enters the game: FlashFace. Based on one or a few reference face images and a text prompt, the method can change the age or gender of a person, turn virtual characters into real people, make real people into artworks, and swap faces while retaining facial details to a high degree.
PAID: (Prompt-guided) Attention Interpolation of Text-to-Image
PAID is a method that enables smooth, high-consistency image interpolation for diffusion models. GANs have been king in that field so far, but PAID shows promising results for diffusion models too.
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
It's been a while since we heard anything about brain-to-image methods. NeuroPictor is a new method for fMRI-to-image reconstruction. With Neuralink on the horizon, it might not be long until we can visualize our dreams.
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Google's ObjectDrop enables photorealistic object removal and insertion while accounting for their effects on the scene. Buuut, it's from Google. So, you know, it will probably never get into the hands of us normies :(
Inclusion Matching for Animation Paint Bucket Colorization
Paint bucket colorization just got so much easier. Inclusion Matching can colorize line art in animations automatically: painters colorize just one frame, after which the algorithm propagates the colors to all subsequent frames.
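The propagation idea can be illustrated with a toy overlap-based matcher. Note that the paper actually learns a segment-level inclusion matching network; the segment maps, palette, and `propagate` helper below are all made up for illustration.

```python
import numpy as np

# Toy segmentation maps (one segment ID per pixel) for two consecutive frames.
frame1 = np.array([[1, 1, 2],
                   [1, 1, 2],
                   [3, 3, 2]])
frame2 = np.array([[1, 1, 1],   # segment 1 grew slightly between frames
                   [1, 1, 2],
                   [3, 3, 2]])

# Colors assigned by the painter on the first frame only.
palette = {1: "red", 2: "blue", 3: "green"}

def propagate(prev_seg, next_seg, colors):
    """Color each segment of the next frame by maximum pixel overlap with a
    colored segment of the previous frame (a crude stand-in for the paper's
    learned inclusion matching)."""
    out = {}
    for seg in np.unique(next_seg):
        mask = next_seg == seg
        best = max(colors, key=lambda s: int(np.sum(mask & (prev_seg == s))))
        out[int(seg)] = colors[best]
    return out

colors2 = propagate(frame1, frame2, palette)  # segment -> color for frame 2
```

In the toy example above, all three segments inherit their frame-1 colors even though segment 1 changed shape, which is exactly the behavior the automatic propagation is after.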
Attribute Control: Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Attribute Control enables fine-grained control over attributes of specific subjects in text-to-image models. This lets you modify attributes like age, width, makeup, smile and more for each subject independently.
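The "semantic directions" idea boils down to sliding a subject's embedding along a learned direction by a continuous strength. A minimal numpy sketch, where the embedding and the "age" direction are random stand-ins for vectors the method would actually identify in the text encoder's token embedding space:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical subject token embedding and a learned "age" direction
# (both random here; the real ones live in the T2I text encoder).
subject_emb = rng.normal(size=64)
age_direction = rng.normal(size=64)
age_direction /= np.linalg.norm(age_direction)

def apply_attribute(embedding, direction, strength):
    """Continuously modulate one attribute by moving the subject's
    embedding along a semantic direction."""
    return embedding + strength * direction

older = apply_attribute(subject_emb, age_direction, 2.0)
younger = apply_attribute(subject_emb, age_direction, -2.0)
```

Because each attribute is its own direction and each subject its own embedding, the edits compose: you can age one subject while adding a smile to another without the changes bleeding between them.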
Video generation and editing
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
It's been a while since the Animate Anyone drama. Champ is the next iteration of the idea: generating videos of anyone from a single image and a bit of motion guidance.
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
TRIP is a new approach to image-to-video generation with better temporal coherence.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
AniPortrait can generate high-quality portrait animations driven by audio and a reference portrait image. It also supports face reenactment from a reference video.
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
StreamingT2V enables long text-to-video generations featuring rich motion dynamics without any stagnation. It ensures temporal consistency throughout the video, aligns closely with the descriptive text, and maintains high frame-level image quality. Videos can be up to 1200 frames, spanning 2 minutes, and can be extended for even longer durations.
Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
Spectral Motion Alignment is a framework that can capture complex and long-range motion patterns within videos and transfer them to video-to-video frameworks like MotionDirector, VMC, Tune-A-Video, and ControlVideo.
Also interesting
- Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
- EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
- InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
- Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
- latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
- DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
OpenAI released 7 new Sora experiments this week. A little glimpse of what Sora looks like in the hands of artists.
@ArgletonLane has been exploring using AI to peel back layers of reality on anonymized Google Street View captures, revealing the surprising identities of the Argletonians. Such a cool idea.
@MaxEinhorn did a pretty cool ViggleAI experiment mixing original footage with a made-up character. Workflow included.
Another cool example by @blizaine on how Generative AI can bring 3D objects into augmented reality.
AdaSR-TalkingHead is another method for talking-head video generation. This one can generate videos from one source image and a driving motion video.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa