Hello there, my fellow dreamers, and welcome to issue #67 of AI Art Weekly! 👋
It won’t be much longer until the robots take over. All the more reason to find a creative outlet! Let’s jump into this week’s issue:
- VideoCrafter2 released!
- UniVG - yet another video generation system
- ROVI can inpaint videos with natural language instructions
- InstantID is a LoRA alternative that doesn’t require training
- RoHM is motion tracking on steroids
- MotionShop can replace people in videos with 3D avatars
- STMC can generate 3D motion from text with multi-track timeline control
- GARField can extract “objects” from NeRFs
- TextureDreamer can transfer textures from images onto 3D meshes
- Real3D-Portrait can generate 3D talking portraits
- and more!
Cover Challenge 🎨
News & Papers
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
It’s been almost a year since the release of VideoCrafter1. VideoCrafter2 is its next iteration, with improved visual quality, motion, and concept composition. You can give it a try on HuggingFace.
UniVG is yet another video generation system. Its highlight is that it can take image inputs as guidance and then steer the generation further with additional text prompts. I haven’t seen other video models do this yet.
ROVI: Towards Language-Driven Video Inpainting via Multimodal Large Language Models
The interface of the future will be hands-free, so our AI assistants need a way to help us edit videos without us having to mark areas. ROVI uses natural language instructions to do just that. It helps you remove objects from videos or fill in missing parts by simply describing what you want to remove or what should be there instead. Results aren’t very good yet, but this is a glimpse into the future.
InstantID: Zero-shot Identity-Preserving Generation in Seconds
InstantID can generate customized images with various poses or styles from a single reference image. Results are comparable to LoRAs, except that InstantID doesn’t require any training! It supports stylized, realistic, non-portrait, novel-view, interpolation, and even segmented multi-ID generations. Wild!
RoHM: Robust Human Motion Reconstruction via Diffusion
Let’s talk 3D! RoHM can reconstruct complete, plausible 3D human motions from monocular videos with support for recognizing occluded joints! So, basically motion tracking on steroids but without the need for an expensive setup.
MotionShop: Replacing Characters in Videos with 3D Avatars
Speaking of motion tracking, MotionShop is a pipeline that can replace people in a video with pre-selected 3D avatars. The process consists of several steps, including character detection, segmentation and tracking, inpainting, pose estimation, animation retargeting, light estimation, rendering, and compositing.
STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation
Motion tracking is one thing, generating motion from text another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
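To make the multi-track idea a bit more concrete, here is a minimal sketch of how such a timeline could be represented. This is not STMC’s actual input format or API: the `Interval` class, the track names, the prompts, and the `active_prompts` helper are all hypothetical and only illustrate prompts with durations that overlap across tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One prompt placed on the timeline (hypothetical structure, not STMC's format)."""
    prompt: str   # text description of the motion
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical timeline: each track holds its own sequence of intervals,
# and intervals on different tracks are allowed to overlap in time.
timeline = {
    "track_1": [Interval("walk forward", 0.0, 4.0), Interval("sit down", 4.0, 7.0)],
    "track_2": [Interval("wave with the right hand", 2.0, 5.0)],
}


def active_prompts(timeline, t):
    """Return every prompt whose interval covers time t (start inclusive, end exclusive)."""
    return [
        interval.prompt
        for track in timeline.values()
        for interval in track
        if interval.start <= t < interval.end
    ]


print(active_prompts(timeline, 3.0))
```

At second 3 both “walk forward” and “wave with the right hand” are active, which is exactly the kind of overlap a single text prompt can’t express.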
GARField: Group Anything with Radiance Fields
In 3D NeRF scenes, objects aren’t typical 3D models, but are akin to “pixels” scattered in space, without clear connections to each other. GARField is a method that can discern and group these “pixels” in NeRF scenes, extracting them as individual assets.
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
TextureDreamer is able to transfer photorealistic, high-fidelity, and geometry-aware textures from 3-5 images to arbitrary 3D meshes. The results look crazy good.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Real3D-Portrait is a one-shot 3D talking portrait generation method. This one is able to generate realistic videos with natural torso movement and switchable backgrounds.
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
- EgoGen: An Egocentric Synthetic Data Generator
- FPDM: Fixed Point Diffusion Models
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
- Edit One for All: Interactive Batch Image Editing
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!