AI Toolbox
A curated collection of 915 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

MVOC is a training-free multi-object video composition method built on diffusion models. It can composite multiple video objects into a single video while maintaining motion and identity consistency.
Conditional Image Leakage can be used to generate videos with more dynamic and natural motion from image prompts.
Image Conductor can generate video assets from a single image with precise control over camera transitions and object movements.
Mora enables generalist video generation through a multi-agent framework. It supports text-to-video generation, video editing, and digital world simulation, achieving performance similar to the Sora model.
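To make the multi-agent idea concrete, here is a minimal sketch of how specialist agents could be chained into a generalist video pipeline; the agent names and interfaces below are hypothetical, not Mora's actual API.

```python
from typing import Callable

# Each agent consumes and produces one artifact (here just a path or text).
Agent = Callable[[str], str]

def video_pipeline(prompt: str,
                   text_to_image: Agent,
                   image_to_video: Agent,
                   video_editor: Agent) -> str:
    """Chain specialist agents: prompt -> keyframe -> clip -> refined clip."""
    keyframe = text_to_image(prompt)   # agent 1: draft a starting frame
    clip = image_to_video(keyframe)    # agent 2: animate the frame
    return video_editor(clip)          # agent 3: post-process the clip
```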
iCD can be used for zero-shot text-guided image editing with diffusion models. It encodes real images into the latent space in only 3-4 inference steps, after which the image can be edited with a text prompt.
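A minimal invert-then-edit sketch of that workflow, assuming a distilled noise predictor `model(x, sigma, emb)` and an ascending list `sigmas` of a few noise levels; everything here is an assumption, not iCD's actual interface.

```python
import torch

@torch.no_grad()
def invert(model, latent, src_emb, sigmas):
    """Walk a real-image latent up a short noise schedule with Euler steps."""
    x = latent
    for lo, hi in zip(sigmas[:-1], sigmas[1:]):
        x = x + (hi - lo) * model(x, lo, src_emb)   # add predicted noise
    return x

@torch.no_grad()
def edit(model, noise, tgt_emb, sigmas):
    """Regenerate from the inverted noise under the edited text prompt."""
    x = noise
    rev = list(sigmas)[::-1]
    for hi, lo in zip(rev[:-1], rev[1:]):
        x = x + (lo - hi) * model(x, hi, tgt_emb)   # remove predicted noise
    return x
```

With 3-4 entries in `sigmas`, both loops stay within the few-step budget the method advertises.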
EvTexture is a video super-resolution upscaling method that uses event signals for texture enhancement, recovering more accurate textures and finer high-resolution detail.
Sketch2Scene can create interactive 3D game scenes from simple sketches and text descriptions. It uses a diffusion model with ControlNet and procedural generation to produce high-quality, playable 3D environments that match the user's intent.
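As a rough sketch of that pipeline, the stage functions below are hypothetical stand-ins for the paper's components (ControlNet-guided image generation, layout extraction, and procedural assembly), not its real API.

```python
def sketch_to_scene(sketch, text_prompt,
                    generate_concept, extract_layout, assemble_scene):
    """Sketch + text -> concept image -> terrain/layout -> playable scene."""
    concept = generate_concept(image=sketch, prompt=text_prompt)  # diffusion + ControlNet
    heightmap, object_layout = extract_layout(concept)            # decompose the image
    return assemble_scene(heightmap, object_layout)               # procedural 3D build
```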
Make It Count can generate images containing the exact number of objects specified in the prompt while keeping a natural layout. It relies on the diffusion model itself to count and separate objects during the denoising process.
Glyph-ByT5-v2 is a new SDXL model that can generate high-quality visual layouts with text in 10 different languages.
MeshAnything can convert 3D assets in any 3D representation into meshes. This can be used to enhance various 3D asset production methods and significantly improve storage, rendering, and simulation efficiencies.
GradeADreamer is yet another text-to-3D method. This one is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU.
HairFastGAN can transfer hairstyles from one image to another in near real-time. It handles different poses and colors well, achieving high quality in under a second on an Nvidia V100.
MM-Diffusion can generate high-quality audio-video pairs using a multi-modal diffusion model with two coupled denoising autoencoders.
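A hedged sketch of what one coupled denoising step could look like, with assumed interfaces: each sub-network denoises its own modality while attending to the other, which is what keeps the generated audio and video aligned.

```python
def coupled_denoise_step(video_net, audio_net, x_video, x_audio, t):
    """One joint step: each branch is conditioned on the other modality."""
    eps_video = video_net(x_video, t, cross=x_audio)  # video attends to audio
    eps_audio = audio_net(x_audio, t, cross=x_video)  # audio attends to video
    return eps_video, eps_audio
```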
DMD2 is an improved distillation method that can turn diffusion models into efficient one-step image generators.
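The core idea can be sketched as a distribution-matching objective; all names and signatures below are assumptions for illustration, not DMD2's actual code.

```python
import torch

def dmd_generator_loss(generator, real_score, fake_score, z, sigma):
    """Surrogate loss whose gradient w.r.t. x0 is the fake-vs-real score gap."""
    x0 = generator(z)                          # one-step sample from noise
    xt = x0 + sigma * torch.randn_like(x0)     # diffuse it to noise level sigma
    with torch.no_grad():
        gap = fake_score(xt, sigma) - real_score(xt, sigma)
    return (x0 * gap).sum()                    # d(loss)/d(x0) equals the gap
```

Minimizing this loss moves the one-step generator's samples toward regions the frozen teacher scores as more likely.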
EditWorld can simulate world dynamics and edit images based on instructions grounded in various world scenarios. The method can add, replace, delete, and move objects in images, change their attributes, and perform other operations.
RectifID is yet another personalization method for diffusion models, working from user-provided reference images of human faces, live subjects, and certain objects.
MagicPose4D can generate 3D objects from text or images and transfer precise motions and trajectories from objects and characters in a video or mesh sequence.
ReVideo can change video content in specific areas while keeping the motion intact. It allows users to customize motion paths and uses a three-stage training method for precise video editing.
Face Adapter is a new face swapping method that can generate facial detail and handle face shape changes with fine-grained control over attributes like identity, pose, and expression.
RemoCap can reconstruct 3D human bodies from motion sequences. It captures occluded body parts with greater fidelity, resulting in less model penetration and less distorted motion.