AI Art Weekly #67
Hello there, my fellow dreamers, and welcome to issue #67 of AI Art Weekly! 👋
Not much longer until the robots are going to take over. All the more reason to find a creative outlet! Let’s jump into this week’s issue:
- VideoCrafter2 released!
- UniVG - yet another video generation system
- ROVI can inpaint videos with natural language instructions
- InstantID is a LoRA alternative that doesn’t require training
- RoHM is motion tracking on steroids
- MotionShop can replace people in videos with 3D avatars
- STMC can generate 3D motion from text with multi-track timeline control
- GARField can extract “objects” from NeRFs
- TextureDreamer can transfer textures from images onto 3D meshes
- Real3D-Portrait can generate 3D talking portraits
- and more!
Want me to keep up with AI for you? Well, that requires a lot of coffee. If you like what I do, please consider buying me a cup so I can stay awake and keep doing what I do 🙏
Cover Challenge 🎨
The “AI braces” challenge was a complete success. From a pool of 103 innovative submissions, 35 outstanding finalists were curated into a group drop on OBJKT. Not all have been minted yet, but some already sold even before the announcement. Grab one while they are still available. Thank you to all the beautiful artists involved for making this possible 🙌
The next challenge is a classic cover challenge again. I’m looking for submissions inspired by Jung’s archetypes anima & animus. The reward is again $50 and a rare role in our Discord community which lets you vote in the finals. Rulebook can be found here and images can be submitted here.
News & Papers
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
It’s been almost a year since the release of VideoCrafter1, and VideoCrafter2 is its next iteration, with improved visual quality, motion, and concept composition. You can give it a try on HuggingFace.
UniVG
UniVG is yet another video generation system. The highlight of UniVG is its ability to use image inputs for guidance while steering the generation with additional text prompts. I haven’t seen other video models do this yet.
ROVI: Towards Language-Driven Video Inpainting via Multimodal Large Language Models
The interface of the future will be hands-free, so our AI assistants need a way to help us edit videos without us having to mark areas. ROVI uses natural language instructions to do just that. It helps you remove objects from videos or fill in missing parts by simply describing what you want to remove or what should be there instead. Results aren’t very good yet, but this is a glimpse into the future.
InstantID: Zero-shot Identity-Preserving Generation in Seconds
InstantID can generate customized images with various poses or styles from a single reference image. Results are comparable to LoRAs, except that InstantID doesn’t require any training! It supports stylized, realistic, non-portrait, novel-view, interpolation, and even segmented multi-ID generations. Wild!
RoHM: Robust Human Motion Reconstruction via Diffusion
Let’s talk 3D! RoHM can reconstruct complete, plausible 3D human motions from monocular videos with support for recognizing occluded joints! So, basically motion tracking on steroids but without the need for an expensive setup.
MotionShop: Replacing Characters in Videos with 3D Avatars
Speaking of motion tracking, MotionShop is a pipeline that can replace people in a video with pre-selected 3D avatars. The process consists of several steps including character detection, segmentation and tracking, inpainting, pose estimation, animation retargeting, light estimation, rendering, and compositing.
STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation
Motion tracking is one thing, generating motion from text another. STMC is a method that can generate 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.
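To make that concrete, here’s a minimal sketch of what such a multi-track timeline could look like as plain data (the field names are hypothetical and just illustrate the idea, not STMC’s actual input format):

```python
# Hypothetical multi-track timeline: each track carries text prompts
# with start/end times in seconds; overlapping intervals are where a
# method like STMC has to blend the described motions together.
timeline = [
    {"track": "lower body", "prompts": [
        {"text": "walk in a circle", "start": 0.0, "end": 6.0},
    ]},
    {"track": "upper body", "prompts": [
        {"text": "wave with the right hand", "start": 2.0, "end": 4.0},
        {"text": "raise both arms", "start": 4.0, "end": 6.0},
    ]},
]
```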
GARField: Group Anything with Radiance Fields
In 3D NeRF scenes, objects aren’t typical 3D models, but are akin to “pixels” scattered in space, without clear connections to each other. GARField is a method that can discern and group these “pixels” in NeRF scenes, extracting them as individual assets.
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
TextureDreamer is able to transfer photorealistic, high-fidelity, and geometry-aware textures from 3-5 images to arbitrary 3D meshes. The results look crazy good.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Real3D-Portrait is a one-shot 3D talking portrait generation method. This one is able to generate realistic videos with natural torso movement and switchable backgrounds.
Also interesting
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
- EgoGen: An Egocentric Synthetic Data Generator
- FPDM: Fixed Point Diffusion Models
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
- Edit One for All: Interactive Batch Image Editing
@__JAAF__ “misused” the --tile feature in Midjourney v6 to create seamless high-quality 3D scenes that can be explored in VR.
@ButchersBrain created a trailer with Stable Diffusion Video for a non-existing movie called “CLARK”.
@lifeofc created his first cinematic AI music video, recorded on an iPhone in his son’s closet and transformed with RunwayML’s Gen-1 tool. Pretty cool.
With FLOWERS, @NewMediaPioneer imagines new lifeforms and explores impossible beings that blur the lines between the vegetal and animal kingdoms.
@Oranguerillatan created a beautiful and weird short music video clip using ComfyUI, ChatGPT and Suno AI.
Tools & Tutorials
These are some of the most interesting resources I’ve come across this week.
Bespoke Automata is a GUI and deployment pipeline for making complex AI agents locally and offline.
A HuggingFace demo for the MotionCtrl (issue 62) Stable Video Diffusion implementation.
@camenduru put together a Google Colab notebook for Meta’s new music/audio model MAGNeT (issue 66).
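If you’d rather skip the notebook, here’s a minimal sketch of running MAGNeT locally via the audiocraft library (assuming its MAGNeT API follows the same pattern as MusicGen’s; double-check the exact model names in the repo):

```python
# Sketch of text-to-audio generation with MAGNeT via audiocraft.
# Assumes an audiocraft version with MAGNeT support is installed.
from audiocraft.models import MAGNeT
from audiocraft.data.audio import audio_write

model = MAGNeT.get_pretrained('facebook/magnet-small-10secs')
wavs = model.generate(['80s synthwave with punchy drums'])  # one clip per prompt

for i, wav in enumerate(wavs):
    # Write each clip as a loudness-normalized WAV next to the script.
    audio_write(f'magnet_{i}', wav.cpu(), model.sample_rate, strategy='loudness')
```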
While Animate Anyone (issue 61) still hasn’t seen the light of day, developer MooreThreads released an unofficial implementation of the paper.
And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:
- Sharing it 🙏❤️
- Following me on Twitter: @dreamingtulpa
- Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
- Buying a physical art print to hang on your wall
Reply to this email if you have any feedback or ideas for this newsletter.
Thanks for reading and talk to you next week!
– dreamingtulpa