AI Toolbox
A curated collection of 959 free cutting-edge AI papers with code and tools for text, image, video, 3D and audio generation and manipulation.
Similar to ControlNet and Composer, IP-Adapter is a multi-modal guidance adapter that adds image-prompt support to Stable Diffusion and works with custom models trained from the same base model. The results look amazing.
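As a rough illustration, here is a minimal sketch of using IP-Adapter through its diffusers integration (the repository id, weight file name and adapter scale below are assumptions based on the public model card, not taken from the paper itself):

```python
# Minimal sketch: attaching IP-Adapter to a Stable Diffusion pipeline via
# diffusers. Repo/weight names are assumptions; check the IP-Adapter model card.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the image-prompt adapter on top of the frozen base model.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the image prompt steers the result

reference = load_image("style_reference.png")  # hypothetical local reference image
image = pipe(
    prompt="a cat sitting on a windowsill",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("ip_adapter_result.png")
```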
Semantics2Hands can retarget realistic hand motions between different avatars while keeping the details of the movements. It uses an anatomy-based semantic matrix and a semantics reconstruction network to achieve high-quality hand motion transfer.
PlankAssembly can turn 2D line drawings from three views into 3D CAD models. It effectively handles noisy or incomplete inputs and improves accuracy using shape programs.
AudioLDM 2 can generate high-quality audio in different forms, like text-to-audio and image-to-audio. It builds on a shared, self-supervised audio representation to achieve state-of-the-art performance on standard benchmarks.
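If you want to try it, AudioLDM 2 ships as a diffusers pipeline; a minimal text-to-audio sketch (checkpoint id assumed from the model card) looks roughly like this:

```python
# Minimal sketch of text-to-audio with the AudioLDM 2 diffusers pipeline.
# The checkpoint id is an assumption; see the AudioLDM 2 model card.
import torch
import scipy.io.wavfile
from diffusers import AudioLDM2Pipeline

pipe = AudioLDM2Pipeline.from_pretrained(
    "cvssp/audioldm2", torch_dtype=torch.float16
).to("cuda")

audio = pipe(
    prompt="gentle rain on a tin roof with distant thunder",
    num_inference_steps=200,
    audio_length_in_s=10.0,
).audios[0]

# The pipeline returns a 16 kHz waveform as a NumPy array.
scipy.io.wavfile.write("rain.wav", rate=16000, data=audio)
```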
AudioSep can separate audio events and musical instruments while enhancing speech using natural language queries. It performs well in open-domain audio source separation, significantly surpassing previous models.
3D Gaussian Splatting can create high-quality 3D scenes in real-time at 1080p resolution with over 30 frames per second. It uses 3D Gaussians for efficient scene representation and a fast rendering method, achieving competitive training times while maintaining great visual quality.
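At the heart of the renderer, each pixel is shaded by depth-sorting the Gaussians that cover it and alpha-compositing them front to back. Here is an illustrative sketch of that compositing step (plain Python, not the authors' CUDA rasterizer; names and values are placeholders):

```python
# Illustrative sketch of front-to-back alpha compositing of depth-sorted
# Gaussian splats for a single pixel. Not the paper's actual implementation.
import numpy as np

def composite_pixel(colors, alphas):
    """colors: (N, 3) RGB of Gaussians sorted near-to-far;
    alphas: (N,) opacity of each Gaussian evaluated at this pixel."""
    pixel = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c   # contribution weighted by remaining visibility
        transmittance *= (1.0 - a)       # light left after passing this splat
        if transmittance < 1e-4:         # early termination once the pixel is opaque
            break
    return pixel

colors = np.array([[1.0, 0.2, 0.2], [0.2, 0.2, 1.0], [0.9, 0.9, 0.9]])
alphas = np.array([0.6, 0.5, 0.8])
print(composite_pixel(colors, alphas))
```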
RIP expensive low-light cameras? It's amazing how AI is able to solve problems that so far were only possible to tackle with better hardware. In this example, the novel LED model is able to denoise low-light images after being trained on only 6 pairs of images. The results are impressive, but the team is not done yet: they're currently researching a method that works across a wide variety of scenarios trained on only 2 pairs.
LP-MusicCaps can generate high-quality music captions using large language models (LLMs).
DWPose is a whole-body pose estimator that uses a two-stage distillation approach to improve pose estimation accuracy.
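The distillation idea in a nutshell: a compact student network learns to reproduce the keypoint heatmaps of a larger teacher in addition to fitting the ground truth. A toy sketch of such a loss (loss weights and tensor shapes are placeholders, not DWPose's actual code):

```python
# Toy sketch of heatmap-level pose distillation: the student matches both the
# ground-truth heatmaps and the teacher's predictions. Not DWPose's real code.
import torch
import torch.nn.functional as F

def distillation_loss(student_heatmaps, teacher_heatmaps, gt_heatmaps, alpha=0.5):
    loss_gt = F.mse_loss(student_heatmaps, gt_heatmaps)       # supervised term
    loss_kd = F.mse_loss(student_heatmaps, teacher_heatmaps)  # mimic the teacher
    return (1 - alpha) * loss_gt + alpha * loss_kd

# Example with random tensors shaped (batch, keypoints, height, width):
s = torch.rand(2, 17, 64, 48)
t = torch.rand(2, 17, 64, 48)
g = torch.rand(2, 17, 64, 48)
print(distillation_loss(s, t, g).item())
```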
WavJourney is a system that uses large language models to generate audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. The demo results, while not perfect, sound great.
Interpolating between Images with Diffusion Models can generate smooth transitions between two images using latent diffusion models. It allows for high-quality results across different styles and subjects while using CLIP to select the best images for interpolation.
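The underlying trick is to interpolate in the diffusion model's latent space rather than in pixel space, then rank candidates with CLIP. A conceptual sketch of the spherical interpolation step (helper names are illustrative, not from the paper's codebase):

```python
# Conceptual sketch: spherical interpolation (slerp) between two image latents.
# In the paper's setup, frames generated from interpolated latents are then
# ranked with CLIP similarity; this only shows the slerp step.
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, t: float, eps: float = 1e-7):
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    cos_omega = torch.dot(z0_flat / z0_flat.norm(), z1_flat / z1_flat.norm())
    omega = torch.arccos(cos_omega.clamp(-1 + eps, 1 - eps))
    return (torch.sin((1 - t) * omega) * z0 + torch.sin(t * omega) * z1) / torch.sin(omega)

z_a, z_b = torch.randn(4, 64, 64), torch.randn(4, 64, 64)
midpoint = slerp(z_a, z_b, 0.5)  # latent halfway between the two images
```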
TokenFlow is a new video-to-video method for temporally coherent, text-guided video editing. We've seen a lot of them, but this one looks extremely good with almost no flickering and requires no fine-tuning whatsoever.
FABRIC can condition diffusion models on feedback images to improve image quality. This method lets users personalize content through multiple feedback rounds without any additional training.
AnimateDiff is a new framework that brings video generation to the Stable Diffusion pipeline, meaning you can generate videos with any existing Stable Diffusion model without having to fine-tune or train anything. Pretty amazing. @DigThatData put together a Google Colab notebook in case you want to give it a try.
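For reference, AnimateDiff also has a diffusers pipeline; a minimal sketch looks roughly like this (the motion-adapter checkpoint and pipeline names are assumptions based on the diffusers docs, and the Colab above uses the original repository instead):

```python
# Minimal sketch of AnimateDiff via diffusers: a pretrained motion adapter is
# plugged into an ordinary Stable Diffusion checkpoint to produce frames.
# Checkpoint ids are assumptions; see the AnimateDiff model cards.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

frames = pipe(
    prompt="a sailboat drifting across a calm lake at sunset",
    num_frames=16,
    num_inference_steps=25,
).frames[0]
export_to_gif(frames, "sailboat.gif")
```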
Text2Cinemagraph can create cinemagraphs from text descriptions, animating elements like flowing rivers and drifting clouds. It combines artistic images with realistic ones to accurately show motion, outperforming other methods in generating cinemagraphs for natural and artistic scenes.
CSD-Edit is a multi-modal editing approach that, unlike other methods, works well on images larger than the traditional 512x512 limit and can edit 4K or large panorama images. It also offers improved temporal consistency across video frames and improved view consistency when editing or generating 3D scenes.
Similar to ControlNet scribble for images, SketchMetaFace brings sketch guidance to the 3D realm and makes it possible to turn a sketch into a 3D face model. Pretty excited about progress like this, as it brings controllability to 3D generation and makes creating 3D content far more accessible.
NIS-SLAM can reconstruct high-fidelity surfaces and geometry from RGB-D frames. It also learns 3D consistent semantic representations during this process.
DreamDiffusion can generate high-quality images from brain EEG signals without needing to translate thoughts into text. It uses pre-trained text-to-image models and special techniques to handle noise and individual differences, making it a key step towards affordable thoughts-to-image technology.
MotionGPT can generate, caption, and predict human motion by treating it like a language. It achieves top performance in these tasks, making it useful for various motion-related applications.