AI Toolbox
A curated collection of 915 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.

UniTalker can create 3D facial animations from speech input! It outperforms other tools, making fewer mistakes in lip movements and holding up well even on data it hasn't seen before.
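To give a feel for how such a model is driven, here is a minimal sketch. The `unitalker` package and its `load_model`/`animate` names are hypothetical stand-ins for illustration, not the project's actual API; only the librosa call is real.

```python
# Hypothetical sketch: speech in, per-frame 3D face vertices out.
import librosa  # real library, used to load the speech clip

import unitalker  # hypothetical package name

# 16 kHz mono audio is the typical input for audio-driven animation models.
audio, sr = librosa.load("speech.wav", sr=16000, mono=True)

model = unitalker.load_model("unitalker-base")   # hypothetical loader
vertices = model.animate(audio, sample_rate=sr)  # (frames, n_vertices, 3)
print(vertices.shape)
```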
Shape of Motion can reconstruct 3D scenes from a single video. The method is able to capture the full 3D motion of a scene and can handle occlusions and disocclusions.
MusiConGen can generate music tracks with precise control over rhythm and chords. It allows users to define musical features through symbolic chord sequences, BPM, and text prompts.
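A minimal sketch of what driving those controls could look like; the `musicongen` package and every name in it are hypothetical stand-ins, since the actual project ships research code rather than this API.

```python
# Hypothetical sketch: text + symbolic chords + BPM in, audio out.
import musicongen  # hypothetical package name

model = musicongen.load("musicongen-base")  # hypothetical checkpoint name
audio = model.generate(
    text="laid-back lo-fi hip hop with warm electric piano",
    chords=["Am7", "Dm7", "G7", "Cmaj7"],  # symbolic chord sequence
    bpm=72,                                # rhythm/tempo condition
    duration=30,                           # seconds of audio to generate
)
audio.save("track.wav")
```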
IMAGDressing-v1 can generate human try-on images from input garments. Scenes can be controlled through text, and the method can be combined with IP-Adapter and ControlNet pose conditioning to boost the diversity and controllability of the generated images.
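Here is a hypothetical sketch of how that combination could fit together; the `imagdressing` package and its arguments are illustrative stand-ins for the real repository's scripts.

```python
# Hypothetical sketch: garment + text scene + optional identity/pose plug-ins.
import imagdressing  # hypothetical package name

pipe = imagdressing.Pipeline.from_pretrained("imagdressing-v1")  # hypothetical
image = pipe(
    garment="dress.png",                      # the input garment image
    prompt="a woman walking through a park",  # scene controlled via text
    ip_adapter_image="face.png",              # identity via IP-Adapter (optional)
    pose_image="pose.png",                    # pose via ControlNet (optional)
)
image.save("try_on.png")
```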
SparseCtrl is an image-to-video method with some cool new capabilities. With its RGB, depth, and sketch encoders and one or a few input images, it can animate images, interpolate between keyframes, extend videos, and guide video generation with only depth maps or a few sketches. I'm especially in love with how the scene transitions look.
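SparseCtrl has since been integrated into diffusers via the AnimateDiff pipelines; the sketch below follows that integration from memory, so checkpoint IDs and argument names may have drifted between releases.

```python
# SparseCtrl via diffusers: animate a single RGB keyframe pinned to frame 0.
import torch
from diffusers import (
    AnimateDiffSparseControlNetPipeline,
    MotionAdapter,
    SparseControlNetModel,
)
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16
)
controlnet = SparseControlNetModel.from_pretrained(
    "guoyww/animatediff-sparsectrl-rgb", torch_dtype=torch.float16
)
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=adapter,
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

keyframe = load_image("keyframe.png")
frames = pipe(
    prompt="a ship sailing through stormy seas, cinematic",
    conditioning_frames=[keyframe],
    controlnet_frame_indices=[0],  # which frames the sparse condition pins
    num_frames=16,
).frames[0]
export_to_gif(frames, "animation.gif")
```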
3DWire can generate 3D house wireframes from text! The wireframes can be easily segmented into distinct components, such as walls, roofs, and rooms, reflecting the semantic essence of the shape.
An Object is Worth 64x64 Pixels can generate 3D models by representing them as 64x64 pixel images! It creates realistic objects with plausible geometry and colors, matching the quality of more complex methods.
AccDiffusion can generate high-resolution images with less object repetition! That's something Stable Diffusion has been plagued by since its infancy.
Noise Calibration can improve video quality while keeping the original content structure. It uses a noise optimization strategy with pre-trained diffusion models to enhance visuals and ensure consistency between original and enhanced videos.
ST-AVSR can upscale videos to arbitrary resolutions while keeping details sharp and motion smooth. It uses features from a pre-trained VGG network to improve quality and speed, outperforming other methods.
Live2Diff can translate live video streams with a uni-directional temporal attention mechanism in video diffusion models. It keeps motion smooth by attending each frame only to its predecessors and reaches 16 frames per second on an RTX 4090 GPU, making it great for real-time use.
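Streaming is the interesting part: because the attention is causal, each incoming frame can be processed as it arrives. Here is a hypothetical sketch of such a loop; the `live2diff` package and its `StreamPipeline` are illustrative stand-ins, while the OpenCV calls are real.

```python
# Hypothetical sketch: causal, frame-by-frame video-to-video translation.
import cv2  # real library, used for webcam capture and display

import live2diff  # hypothetical package name

pipe = live2diff.StreamPipeline("live2diff-base", device="cuda")  # hypothetical
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Each frame attends only to past frames, so no future context is needed
    # and the loop can keep pace with a live stream.
    stylized = pipe.step(frame, prompt="watercolor painting")
    cv2.imshow("stylized", stylized)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```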
WildGaussians is a new 3D Gaussian Splatting method that can handle occlusions and appearance changes. It achieves real-time rendering speeds and handles in-the-wild data better than other methods.
Stable Audio Open can generate up to 47 seconds of stereo audio at 44.1kHz from text prompts. It uses a transformer-based diffusion model for high-quality sound, making it useful for artists and researchers.
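Stable Audio Open is available through diffusers, so generation takes only a few lines; this follows the documented pipeline, though parameters like step count and prompt are my own picks.

```python
# Text-to-audio with Stable Audio Open via the diffusers integration.
import soundfile as sf
import torch
from diffusers import StableAudioPipeline

pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")

audio = pipe(
    prompt="warm analog synth arpeggio, 120 BPM",
    num_inference_steps=100,
    audio_end_in_s=30.0,  # the model supports clips up to ~47 s
).audios[0]

# Output is (channels, samples); soundfile expects (samples, channels).
sf.write("synth.wav", audio.T.float().cpu().numpy(), pipe.vae.sampling_rate)
```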
ColorPeel can generate objects in images with precise, user-specified colors and shapes.
HumanRefiner can improve human hand and limb quality in images! The method detects and corrects abnormalities in both poses and limbs.
Tailor3D can create customized 3D assets from text or from single- and dual-sided images. The method also supports modifying the inputs through additional text prompts.
GeneFace can generate high-quality 3D talking face videos from any speech audio. It solves the head-torso separation problem and provides better lip synchronization and image quality than earlier methods.
Minutes to Seconds can efficiently fill in missing parts of images using a Denoising Diffusion Probabilistic Model (DDPM) that is about 60 times faster than other methods. It uses a Light-Weight Diffusion Model and smart sampling techniques to keep the image quality high.
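Minutes to Seconds isn't packaged as a ready-made pipeline as far as I know, so as a stand-in for the interface it accelerates, here is plain diffusion inpainting with a stock diffusers pipeline; the paper's contribution is doing this kind of fill roughly 60 times faster, not this exact code.

```python
# Standard diffusion inpainting (a stand-in interface, not the paper's model).
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("photo.png")  # the incomplete image
mask = load_image("mask.png")    # white pixels mark the region to fill
result = pipe(
    prompt="seamless continuation of the scene",
    image=image,
    mask_image=mask,
).images[0]
result.save("filled.png")
```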
PartCraft can generate customized and photorealistic virtual creatures by mixing visual parts from existing images. This tool allows users to create unique hybrids and make detailed changes, which is useful for digital asset creation and studying biodiversity.
LivePortrait can animate a single source image with motion from a driving video. The method generates high-quality videos at 60 fps and can retarget the motion to other characters.
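A hypothetical sketch of the source-plus-driving-video interface; the `liveportrait` package and its arguments are illustrative stand-ins, not the repository's actual entry points.

```python
# Hypothetical sketch: animate a still portrait with motion from a video.
import liveportrait  # hypothetical package name

pipe = liveportrait.Pipeline(device="cuda")  # hypothetical constructor
video = pipe.animate(
    source_image="portrait.jpg",      # the single still image to animate
    driving_video="performance.mp4",  # where the motion comes from
    retarget=True,                    # map the motion onto the new identity
)
video.save("animated.mp4")
```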