AI Toolbox
A curated collection of 915 free, cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.

Speaking of video, more research is being conducted on motion control. Peekaboo lets you precisely control the position, size and trajectory of an object through bounding boxes.
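Below is a minimal sketch of how a bounding-box trajectory could be rasterized into per-frame binary masks, the kind of spatial control signal a masked-attention approach like Peekaboo consumes. The linear interpolation scheme and tensor shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def boxes_to_masks(start_box, end_box, num_frames, height, width):
    """Linearly interpolate a box (x0, y0, x1, y1) across frames and
    rasterize it into binary masks of shape (num_frames, height, width)."""
    masks = torch.zeros(num_frames, height, width)
    start = torch.tensor(start_box, dtype=torch.float32)
    end = torch.tensor(end_box, dtype=torch.float32)
    for t in range(num_frames):
        alpha = t / max(num_frames - 1, 1)
        x0, y0, x1, y1 = (1 - alpha) * start + alpha * end
        masks[t, int(y0):int(y1), int(x0):int(x1)] = 1.0
    return masks

# Example: an object moving from the top-left to the bottom-right of a 64x64 latent grid.
masks = boxes_to_masks((2, 2, 18, 18), (44, 44, 60, 60), num_frames=16, height=64, width=64)
print(masks.shape)  # torch.Size([16, 64, 64])
```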
in2IN is a motion generation model that factors in both the overall interaction’s textual description and individual action descriptions of each person involved. This enhances motion diversity and enables better control over each person’s actions while preserving interaction coherence.
Ctrl-Adapter is a new framework that can be used to add diverse controls to any image or video diffusion model, enabling things like video control with sparse frames, multi-condition control, and video editing.
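To illustrate the adapter idea, here is a rough sketch in which features from a pre-trained image ControlNet are projected by a small trainable module and added as residuals to a frozen backbone's features. The dimensions and adapter design are assumptions for illustration, not Ctrl-Adapter's actual architecture.

```python
import torch
import torch.nn as nn

class ControlAdapter(nn.Module):
    def __init__(self, control_dim, backbone_dim):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(control_dim, backbone_dim, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(backbone_dim, backbone_dim, kernel_size=1),
        )

    def forward(self, backbone_feat, control_feat):
        # only the adapter is trained; the backbone and ControlNet stay frozen
        return backbone_feat + self.proj(control_feat)

adapter = ControlAdapter(control_dim=320, backbone_dim=640)
fused = adapter(torch.randn(1, 640, 32, 32), torch.randn(1, 320, 32, 32))
print(fused.shape)  # torch.Size([1, 640, 32, 32])
```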
Video2Game can turn real-world videos into interactive game environments. It uses a neural radiance field (NeRF) module for capturing scenes, a mesh module for faster rendering, and a physics module for realistic object interactions.
LoopGaussian can convert multi-view images of a stationary scene into authentic 3D cinemagraphs, which can then be rendered from novel viewpoints to obtain natural, seamlessly loopable videos.
ControlNet++ can improve image generation by ensuring that generated images match the given controls, like segmentation masks and depth maps. It shows better performance than its predecessor, ControlNet, with improvements of 7.9% in mIoU, 13.4% in SSIM, and 7.6% in RMSE.
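The core idea is a cycle-consistency signal: the generated image is run through an off-the-shelf discriminative model and its prediction is compared against the conditioning input. A minimal sketch, assuming a generic segmentation network and cross-entropy as a stand-in loss rather than the paper's exact training setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_loss(generated_images, condition_masks, seg_model):
    """Penalize disagreement between the segmentation predicted on the
    generated image and the mask that was used as the control signal."""
    logits = seg_model(generated_images)             # (B, num_classes, H, W)
    return F.cross_entropy(logits, condition_masks)  # masks hold class ids

# Toy usage with a dummy 1x1-conv "segmenter" standing in for a real network.
seg_model = nn.Conv2d(3, 21, kernel_size=1)
images = torch.randn(2, 3, 64, 64)
masks = torch.randint(0, 21, (2, 64, 64))
print(consistency_loss(images, masks, seg_model))
```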
PanFusion can generate 360-degree panorama images from a text prompt. The model is able to integrate additional constraints like room layout for customized panorama outputs.
MindBridge can reconstruct images from fMRI brain signals using a single model that works for different people. It achieves high accuracy even with limited data, making it effective for new subjects.
GoodDrag can improve the stability and image quality of drag editing with diffusion models. It reduces distortions by alternating between drag and denoising operations and introduces a new dataset, Drag100, for better quality assessment.
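Here is a rough sketch of that alternating schedule; `drag_step` and `denoise_step` are placeholders for the actual diffusion-feature optimization and reverse-diffusion operations, kept trivial so the example stays self-contained.

```python
import torch

def drag_step(latent, handles, targets, lr=0.1):
    # Placeholder: the real method optimizes the latent so that diffusion
    # features at the handle points move toward the target points; here we
    # simply move the handle points to keep the sketch runnable.
    handles = handles + lr * (targets - handles)
    return latent, handles

def denoise_step(latent, t):
    # Placeholder for one reverse-diffusion step of the underlying model.
    return latent

def alternating_drag_denoise(latent, handles, targets, stages=10, drag_steps=3):
    for t in range(stages):
        for _ in range(drag_steps):
            latent, handles = drag_step(latent, handles, targets)
        latent = denoise_step(latent, t)  # periodic denoising corrects drag distortions
    return latent

latent = torch.randn(1, 4, 64, 64)
handles = torch.tensor([[20.0, 20.0]])
targets = torch.tensor([[40.0, 30.0]])
out = alternating_drag_denoise(latent, handles, targets)
```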
InstantMesh can generate high-quality 3D meshes from a single image in under 10 seconds. It uses advanced methods like multiview diffusion and sparse-view reconstruction, and it significantly outperforms other tools in both quality and speed.
Speaking of reconstruction: Key2Mesh is yet another model that takes on 3D human mesh reconstruction, this time using 2D human pose keypoints as input instead of visual data, since image datasets with 3D labels are scarce.
Sparse Global Matching for Video Frame Interpolation with Large Motion tackles large motion in frame interpolation by supplementing a local flow estimator with a sparse set of globally matched correspondences.
MCC-Hand-Object (MCC-HO) can reconstruct 3D shapes of hand-held objects from a single RGB image and a 3D hand model. It uses Retrieval-Augmented Reconstruction (RAR) with GPT-4(V) to match 3D models to the object’s shape, achieving top performance on various datasets.
ZeST can change the material of an object in an image to match a material example image. It can also perform multiple material edits in a single image and perform implicit lighting-aware edits on the rendering of a textured mesh.
Imagine Colorization leverages pre-trained diffusion models to colorize images while remaining controllable and user-interactive.
SpatialTracker can track 2D pixels in 3D space, even when objects are occluded or rotated. It uses depth estimators and a triplane representation to achieve top performance in difficult situations.
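For context, here is a minimal sketch of triplane feature sampling, the representation used to lift pixels into 3D: a 3D point is projected onto the XY, XZ and YZ planes, and bilinearly sampled features from the three planes are summed. The tensor shapes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes, points):
    """planes: (3, C, H, W) feature planes; points: (N, 3) in [-1, 1]^3."""
    xy, xz, yz = points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]
    feats = 0
    for plane, coords in zip(planes, (xy, xz, yz)):
        grid = coords.view(1, -1, 1, 2)                       # (1, N, 1, 2)
        sampled = F.grid_sample(plane[None], grid, align_corners=True)
        feats = feats + sampled.view(plane.shape[0], -1).T    # (N, C)
    return feats

planes = torch.randn(3, 32, 64, 64)
points = torch.rand(100, 3) * 2 - 1
print(sample_triplane(planes, points).shape)  # torch.Size([100, 32])
```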
MuDI can generate high-quality images of multiple subjects without mixing their identities. It has a 2x higher success rate for personalizing images and is preferred by over 70% of users in evaluations.
NeRF2Physics can predict the physical properties (mass, friction, hardness, thermal conductivity and Young’s modulus) of objects from a collection of images. This makes it possible to simulate the physical behavior of digital twins in a 3D scene.
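As a toy illustration of how per-point property estimates turn into object-level quantities, the sketch below integrates a predicted material density over an object's volume to get a mass estimate; the property predictor here is a simple stand-in, not the paper's language-grounded material model.

```python
import torch

def estimate_mass(points_inside, density_fn, object_volume_m3):
    """Monte Carlo estimate: average predicted density (kg/m^3) times volume."""
    densities = density_fn(points_inside)                # (N,) per-point densities
    cell_volume = object_volume_m3 / points_inside.shape[0]
    return (densities * cell_volume).sum()

# Example with a constant 500 kg/m^3 "material" over a 0.02 m^3 object -> ~10 kg.
points = torch.rand(10_000, 3)
print(estimate_mass(points, lambda p: torch.full((p.shape[0],), 500.0), 0.02))
```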
InstructHumans can edit existing 3D human textures using text prompts. It maintains avatar consistency pretty well and enables easy animation.
LCM-Lookahead is another attempted LoRA killer with an LCM-based approach for identity transfer in text-to-image generations.