AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
Vevo can imitate voices without needing training data for the specific target speaker. It can change accent and emotion while keeping output quality high, using a self-supervised method that disentangles different speech features.
Argus3D can generate 3D meshes from images and text prompts, as well as unique textures for its generated shapes. Just imagine composing a 3D scene and filling it with objects by pointing at a space and describing in natural language what you want to place there.
AudioEditing comprises two techniques for editing audio. The first enables text-based editing, while the second discovers semantically meaningful editing directions without supervision.
Magic-Me can generate identity-specific videos from a few reference images while keeping the person’s features clear.
Continuous 3D Words is a control method that can modify attributes in images with a slider-based approach. This allows for finer control over attributes such as illumination, non-rigid shape changes (like wings), and camera orientation.
GALA3D is a text-to-3D method that can generate complex scenes with multiple objects and control their placement and interaction. The method uses large language models to generate initial layout descriptions and then optimizes the 3D scene with conditioned diffusion to make it more realistic.
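To make that layout stage concrete, here is a minimal Python sketch of the kind of per-object layout an LLM might propose before the diffusion-based refinement. The schema, prompt, and values are illustrative assumptions, not GALA3D's actual format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ObjectLayout:
    """One object in a coarse scene layout an LLM might propose
    (hypothetical schema; GALA3D's actual layout format may differ)."""
    name: str
    center: tuple[float, float, float]   # position in scene coordinates (meters)
    size: tuple[float, float, float]     # bounding-box extents
    rotation_deg: float                  # yaw around the up axis

LAYOUT_PROMPT = (
    "Propose a 3D layout for: 'a cozy living room with a sofa, a coffee table "
    "and a floor lamp'. Return JSON objects with name, center, size, rotation_deg."
)

# A plausible LLM response, hand-written here for illustration. Each box would
# then seed a per-object 3D generation that is refined with conditioned diffusion.
layout = [
    ObjectLayout("sofa",         (0.0, 0.0, -1.5), (2.0, 0.9, 0.9), 0.0),
    ObjectLayout("coffee table", (0.0, 0.0,  0.0), (1.0, 0.4, 0.6), 0.0),
    ObjectLayout("floor lamp",   (1.2, 0.0, -1.2), (0.4, 1.6, 0.4), 0.0),
]
print(json.dumps([asdict(o) for o in layout], indent=2))
```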
LGM can generate high-resolution 3D models from text prompts or single-view images. It uses a fast multi-view Gaussian representation, producing models in under 5 seconds while maintaining high quality.
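As a rough illustration of what a Gaussian-splat representation carries, here is a minimal container sketch. The field names and shapes are assumptions for illustration, not LGM's actual output format.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GaussianSplats:
    """Minimal container for a set of 3D Gaussians, the kind of
    representation a splat-based generator predicts (illustrative only)."""
    means: np.ndarray      # (N, 3) Gaussian centers
    scales: np.ndarray     # (N, 3) per-axis extents
    rotations: np.ndarray  # (N, 4) unit quaternions
    opacities: np.ndarray  # (N, 1) values in [0, 1]
    colors: np.ndarray     # (N, 3) RGB (or spherical-harmonic coefficients)

# A random set of 1024 Gaussians, just to show the shapes involved.
n = 1024
splats = GaussianSplats(
    means=np.random.randn(n, 3).astype(np.float32),
    scales=np.full((n, 3), 0.01, dtype=np.float32),
    rotations=np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)).astype(np.float32),
    opacities=np.ones((n, 1), dtype=np.float32),
    colors=np.random.rand(n, 3).astype(np.float32),
)
```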
ConsistI2V is an image-to-video method with enhanced visual consistency. Compared to other methods, it better maintains the subject, background, and style from the first frame and ensures a fluid, logical progression, while also supporting long video generation and camera motion control.
Direct-a-Video can individually or jointly control camera movement and object motion in text-to-video generations. This means you can generate a video and tell the model to move the camera from left to right, zoom in or out and move objects around in the scene.
Video-LaVIT is a multi-modal video-language method that can comprehend and generate image and video content and supports long video generation.
InterScene is a novel framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. Another step closer to completely dynamic game worlds and simulations. Check out the impressive demo below.
AToM is a text-to-mesh framework that can generate high-quality textured 3D meshes from text prompts in less than a second. The method is optimized across multiple prompts and can create diverse objects it wasn't trained on.
Last year we got real-time diffusion for images; this year we'll get it for video! AnimateLCM can generate high-fidelity videos with minimal steps. The model also supports image-to-video generation and adapters like ControlNet. It's not available yet, but once it hits, expect way more AI-generated video content.
SEELE can move objects around within an image. It does so by removing the object, inpainting the occluded portions of the background, and harmonizing the repositioned object's appearance with the surrounding area.
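As a rough sketch of that remove-inpaint-recomposite idea, the snippet below uses classical OpenCV inpainting and a simple pixel shift as stand-ins for SEELE's learned inpainting, completion, and harmonization models; the function and its signature are hypothetical.

```python
import cv2
import numpy as np

def reposition_object(image: np.ndarray, mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Toy remove -> inpaint -> recomposite pipeline.

    image: HxWx3 uint8 photo, mask: HxW uint8 (255 where the object is).
    Classical inpainting stands in for the learned models used in practice.
    """
    # 1. Remove the object and fill the revealed background.
    background = cv2.inpaint(image, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

    # 2. Shift the object (and its mask) to the target location.
    moved_obj = np.roll(image, shift=(dy, dx), axis=(0, 1))
    moved_mask = np.roll(mask, shift=(dy, dx), axis=(0, 1)).astype(bool)

    # 3. Composite at the new position; a learned model would additionally
    #    complete occluded parts and harmonize lighting and shadows here.
    out = background.copy()
    out[moved_mask] = moved_obj[moved_mask]
    return out
```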
Motion-I2V can generate videos from images with clear and controlled motion. It uses a two-stage process with a motion field predictor and temporal attention, allowing for precise control over how things move and enabling video-to-video translation without needing extra training.
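To make that two-stage idea concrete, here is a toy PyTorch skeleton: a first stage predicts per-frame motion fields from the input image, and a second stage decodes frames conditioned on them. The tiny conv layers and shapes are placeholders, not Motion-I2V's actual architecture.

```python
import torch
import torch.nn as nn

class TwoStageImageToVideo(nn.Module):
    """Toy skeleton of a two-stage image-to-video design:
    stage 1 predicts dense (dx, dy) motion fields for each frame,
    stage 2 renders frames conditioned on the image and its motion."""

    def __init__(self, num_frames: int = 16):
        super().__init__()
        self.num_frames = num_frames
        # Stage 1: predict a 2-channel motion field per frame (placeholder layer).
        self.motion_predictor = nn.Conv2d(3, 2 * num_frames, kernel_size=3, padding=1)
        # Stage 2: decode each frame from the image plus its predicted motion.
        self.frame_decoder = nn.Conv2d(3 + 2, 3, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        b, _, h, w = image.shape
        flows = self.motion_predictor(image).view(b, self.num_frames, 2, h, w)
        frames = [self.frame_decoder(torch.cat([image, flows[:, t]], dim=1))
                  for t in range(self.num_frames)]
        return torch.stack(frames, dim=1)  # (B, T, 3, H, W)

video = TwoStageImageToVideo()(torch.rand(1, 3, 64, 64))  # -> (1, 16, 3, 64, 64)
```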
StableIdentity is a method that can generate diverse customized images in various contexts from a single input image. The cool thing about this method is that it can combine the learned identity with ControlNet and even inject it into video (ModelScope) and 3D (LucidDreamer) generation.
pix2gestalt is able to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
GALA can take a single-layer clothed 3D human mesh and decompose it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create new clothed human avatars in any pose.
Depth Anything is a new monocular depth estimation method. The model is trained on 1.5M labeled images and 62M+ unlabeled images, which results in impressive generalization ability.
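If you just want depth maps from your own photos, a minimal sketch using the Hugging Face transformers depth-estimation pipeline is shown below; the checkpoint id is an assumption and may differ from the one you want to use.

```python
from transformers import pipeline
from PIL import Image

# Load a Depth Anything checkpoint through the generic depth-estimation pipeline.
# The model id below is one of the community checkpoints and may differ.
depth_estimator = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

image = Image.open("scene.jpg")      # any RGB photo
result = depth_estimator(image)

depth_map = result["depth"]          # PIL image with relative per-pixel depth
depth_map.save("scene_depth.png")
```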
Language-Driven Video Inpainting can guide the video inpainting process using natural language instructions, which removes the need for manual mask labeling.