AI Art Weekly #67

Hello there, my fellow dreamers, and welcome to issue #67 of AI Art Weekly! 👋

Not much longer until the robots are going to take over. All the more reason to find a creative outlet! Let’s jump into this week’s issue:

  • VideoCrafter2 released!
  • UniVG - yet another video generation system
  • ROVI can inpaint videos with natural language instructions
  • InstantID, a LoRA alternative that doesn’t require training
  • RoHM is motion tracking on steroids
  • MotionShop can replace people in videos with 3D avatars
  • STMC can generate 3D motion from text with multi-track timeline control
  • GARField can extract “objects” from NeRFs
  • TextureDreamer can transfer textures from images onto 3D meshes
  • Real3D-Portrait can generate 3D talking portraits
  • and more!

Cover Challenge 🎨


News & Papers

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

It’s been almost a year since the release of VideoCrafter1, and VideoCrafter2 is its next iteration, with improved visual quality, motion, and concept composition. You can give it a try on HuggingFace.

VideoCrafter2 comparison with v1

UniVG

UniVG is yet another video generation system. Its highlight is the ability to take image inputs as guidance and then modify and steer the generation with additional text prompts. I haven’t seen other video models do this yet.

UniVG example

ROVI: Towards Language-Driven Video Inpainting via Multimodal Large Language Models

The interface of the future will be hands-free, so our AI assistants need a way to help us edit videos without us having to mark areas. ROVI uses natural language instructions to do just that: it removes objects from videos or fills in missing parts, based purely on a description of what you want removed or what should be there instead. The results aren’t very good yet, but this is a glimpse into the future.

ROVI examples

InstantID: Zero-shot Identity-Preserving Generation in Seconds

InstantID can generate customized images in various poses or styles from a single reference image. The results are comparable to LoRAs, except that InstantID doesn’t require any training! It supports stylized, realistic, and non-portrait generations, as well as novel-view synthesis, identity interpolation, and even segmented multi-ID generation. Wild!

InstantID multi-ID segmentation example
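For those who want to try it, here’s roughly what zero-shot identity generation looks like in code. This sketch follows the usage example in the InstantID GitHub repo from memory, so treat the pipeline class, checkpoint paths, and base model as assumptions and check the repo for the exact setup:

```python
# Sketch based on the InstantID repo (github.com/InstantX/InstantID).
# Paths, class names, and the base model are assumptions from its README.
import cv2
import numpy as np
import torch
from diffusers.models import ControlNetModel
from diffusers.utils import load_image
from insightface.app import FaceAnalysis
from pipeline_stable_diffusion_xl_instantid import (  # repo-local module
    StableDiffusionXLInstantIDPipeline, draw_kps,
)

# Face detector/embedder -- the identity comes from one reference photo.
app = FaceAnalysis(name="antelopev2", root="./",
                   providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

# IdentityNet (a ControlNet conditioned on face keypoints) + IP-Adapter weights.
controlnet = ControlNetModel.from_pretrained("./checkpoints/ControlNetModel",
                                             torch_dtype=torch.float16)
pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "wangqixun/YamerMIX_v8", controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter_instantid("./checkpoints/ip-adapter.bin")

# One reference image is all the "training" there is.
face_image = load_image("./reference.jpg")
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))[0]  # assumes one face
face_emb = face_info["embedding"]
face_kps = draw_kps(face_image, face_info["kps"])

image = pipe(
    "analog film photo of a person in a cyberpunk city",
    image_embeds=face_emb, image=face_kps,
    controlnet_conditioning_scale=0.8,
).images[0]
image.save("instantid_out.png")
```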

RoHM: Robust Human Motion Reconstruction via Diffusion

Let’s talk 3D! RoHM can reconstruct complete, plausible 3D human motions from monocular videos, even when joints are occluded! So, basically motion tracking on steroids, but without the need for an expensive setup.

RoHM example

MotionShop: Replacing Characters in Videos with 3D Avatars

Speaking of motion tracking, MotionShop is a pipeline that can replace people in a video with pre-selected 3D avatars. The process consists of several steps: character detection, segmentation and tracking, inpainting, pose estimation, animation retargeting, light estimation, rendering, and compositing.

MotionShop example
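To make the pipeline a bit more tangible, here’s a hypothetical sketch of how those stages chain together. Every function in it is a stand-in stub of my own, not MotionShop’s actual code:

```python
# Hypothetical sketch of the MotionShop pipeline stages described above.
# Every function below is a stand-in stub, NOT MotionShop's actual API.

def detect_and_track(video):          # character detection, segmentation, tracking
    return [{"frame": i, "mask": None} for i, _ in enumerate(video)]

def inpaint(video, tracks):           # erase the original person from each frame
    return video                      # -> "clean plate" frames

def estimate_pose(video, tracks):     # per-frame 3D pose of the tracked person
    return [{"frame": t["frame"], "pose": [0.0] * 72} for t in tracks]

def retarget(poses, avatar):          # map the estimated motion onto the avatar rig
    return [{"frame": p["frame"], "avatar_pose": p["pose"]} for p in poses]

def estimate_lighting(video):         # scene light estimation so the avatar blends in
    return {"ambient": 0.5, "direction": (0.0, -1.0, 0.0)}

def render_and_composite(clean_plate, animation, lighting):
    return clean_plate                # render the avatar, composite it back in

def motionshop(video, avatar):
    tracks = detect_and_track(video)
    clean_plate = inpaint(video, tracks)
    poses = estimate_pose(video, tracks)
    animation = retarget(poses, avatar)
    lighting = estimate_lighting(video)
    return render_and_composite(clean_plate, animation, lighting)

if __name__ == "__main__":
    frames = [f"frame_{i}" for i in range(8)]   # placeholder "video"
    print(len(motionshop(frames, avatar="robot_avatar")))  # 8 output frames
```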

STMC: Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

Motion tracking is one thing; generating motion from text is another. STMC is a method that generates 3D human motion from text with multi-track timeline control. This means that instead of a single text prompt, users can specify a timeline of multiple prompts with defined durations and overlaps to create more complex and precise animations.

STMC example
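Here’s a quick sketch of what such a multi-track timeline could look like as a data structure. This is my own illustration of the idea, not STMC’s actual input format:

```python
# Illustrative multi-track timeline for text-driven motion generation.
# My own sketch of the concept, not STMC's actual input format.
from dataclasses import dataclass

@dataclass
class Span:
    prompt: str
    start: float  # seconds
    end: float

# Separate tracks can overlap in time, e.g. body parts acting simultaneously.
timeline = {
    "legs": [Span("walk in a circle", 0.0, 4.0),
             Span("stand still", 4.0, 6.0)],
    "arms": [Span("wave with the right hand", 1.0, 3.0),
             Span("raise both arms", 3.5, 6.0)],
}

def active_prompts(timeline, t):
    """All prompts that apply at time t across tracks -- these overlaps
    are what the model has to blend into one coherent motion."""
    return [s.prompt for spans in timeline.values()
            for s in spans if s.start <= t < s.end]

print(active_prompts(timeline, 2.0))
# ['walk in a circle', 'wave with the right hand']
```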

GARField: Group Anything with Radiance Fields

In 3D NeRF scenes, objects aren’t typical 3D models, but are akin to “pixels” scattered in space, without clear connections to each other. GARField is a method that can discern and group these “pixels” in NeRF scenes, extracting them as individual assets.

GARField example

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

TextureDreamer is able to transfer photorealistic, high-fidelity, and geometry-aware textures from 3-5 images to arbitrary 3D meshes. The results look crazy good.

TextureDreamer examples

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

Real3D-Portrait is a one-shot 3D talking portrait generation method. This one is able to generate realistic videos with natural torso movement and switchable backgrounds.

Real3D-Portrait example

Also interesting

  • MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
  • EgoGen: An Egocentric Synthetic Data Generator
  • FPDM: Fixed Point Diffusion Models
  • SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
  • Edit One for All: Interactive Batch Image Editing

Tools & Tutorials

These are some of the most interesting resources I’ve come across this week.

“Visions” by me, available on objkt

And that, my fellow dreamers, concludes yet another AI Art Weekly issue. Please consider supporting this newsletter by:

  • Sharing it 🙏❤️
  • Following me on Twitter: @dreamingtulpa
  • Buying me a coffee (I could seriously use it, putting these issues together takes me 8-12 hours every Friday 😅)
  • Buying a physical art print to hang on your wall

Reply to this email if you have any feedback or ideas for this newsletter.

Thanks for reading and talk to you next week!

– dreamingtulpa
