AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
DiffSketcher is a tool that turns words into vectorized free-hand sketches. The method also lets you define the level of abstraction, allowing for more abstract or more concrete generations.
Diffusion with Forward Models is able to reconstruct 3D scenes from a single input image. Additionally, it can add small, short motions to images with people in them.
It’s said that our eyes hold the universe. According to the method discussed in the paper Seeing the World through Your Eyes, they at least hold a 3D scene: the method can reconstruct 3D scenes beyond the camera’s line of sight from portrait images containing eye reflections.
We’ve already seen a few attempts at bringing ControlNet to video, but getting temporal coherency right seems to be a tricky issue to solve. ControlVideo is the next attempt, and things start to look extremely promising.
Neuralangelo can reconstruct detailed 3D surfaces from RGB video captures. It uses multi-resolution 3D hash grids and neural surface rendering, achieving high fidelity without needing extra depth inputs.
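For a sense of what a multi-resolution hash grid looks like in practice, here is a minimal PyTorch sketch of an Instant-NGP-style encoder: each resolution level keeps a small learnable feature table that is indexed by hashing voxel corners and trilinearly interpolated. The level count, table size, and hashing scheme below are illustrative defaults, not Neuralangelo's exact configuration.

```python
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    # Minimal sketch of a multi-resolution hash grid encoder (Instant-NGP style);
    # hyperparameters and hashing details are illustrative, not Neuralangelo's exact setup.
    def __init__(self, num_levels=8, features_per_level=2,
                 log2_hashmap_size=15, base_res=16, max_res=512):
        super().__init__()
        self.num_levels = num_levels
        self.hashmap_size = 2 ** log2_hashmap_size
        # Geometric progression of grid resolutions from base_res to max_res.
        growth = (max_res / base_res) ** (1 / max(num_levels - 1, 1))
        self.resolutions = [int(base_res * growth ** i) for i in range(num_levels)]
        # One learnable feature table per level.
        self.tables = nn.ParameterList([
            nn.Parameter(torch.randn(self.hashmap_size, features_per_level) * 1e-4)
            for _ in range(num_levels)
        ])
        # Large primes used for spatial hashing of integer grid coordinates.
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def hash(self, coords):
        # coords: (..., 3) integer grid coordinates -> hash table indices.
        return (coords * self.primes).sum(-1) % self.hashmap_size

    def forward(self, x):
        # x: (N, 3) points in [0, 1]^3 -> (N, num_levels * features_per_level) features.
        feats = []
        for level, res in enumerate(self.resolutions):
            scaled = x * res
            floor = scaled.floor().long()
            frac = scaled - floor.float()
            # Trilinear interpolation over the 8 corners of the enclosing voxel.
            interp = 0
            for corner in range(8):
                offset = torch.tensor([(corner >> i) & 1 for i in range(3)],
                                      device=x.device)
                idx = self.hash(floor + offset)
                w = torch.where(offset.bool(), frac, 1 - frac).prod(-1, keepdim=True)
                interp = interp + w * self.tables[level][idx]
            feats.append(interp)
        return torch.cat(feats, dim=-1)
```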
VideoComposer can generate videos with control over how they look and move using text, sketches, and motion vectors. It improves video quality by keeping frames temporally consistent, allowing for flexible video creation and editing.
Cocktail is a pipeline for guided image generation. Compared to ControlNet, it only requires one generalized model for multiple modalities like edge, pose, and mask guidance.
Make-Your-Video can generate customized videos from text and depth information for better control over content. It uses a Latent Diffusion Model to improve video quality and reduce compute requirements.
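To see why working in latent space reduces compute, here is a hedged sketch of one latent-diffusion training step: frames are compressed by a VAE encoder, noise is added to the much smaller latents, and the denoiser learns to predict that noise. The `vae_encoder`, `denoiser`, and `cond` arguments are hypothetical stand-ins, not Make-Your-Video's actual modules.

```python
import torch
import torch.nn.functional as F

def latent_diffusion_step(vae_encoder, denoiser, frames, cond, num_timesteps=1000):
    # Hedged sketch of a generic latent-diffusion training step; modules and the
    # noise schedule below are illustrative, not the paper's exact implementation.
    with torch.no_grad():
        latents = vae_encoder(frames)          # e.g. 8x spatially downsampled latents
    noise = torch.randn_like(latents)
    t = torch.randint(0, num_timesteps, (latents.shape[0],), device=latents.device)
    # Simple linear-beta schedule; real implementations precompute alphas_cumprod once.
    alphas_cumprod = torch.cumprod(1 - torch.linspace(1e-4, 2e-2, num_timesteps), dim=0)
    a = alphas_cumprod.to(latents.device)[t].view(-1, *([1] * (latents.dim() - 1)))
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise
    pred = denoiser(noisy, t, cond)            # conditioned on e.g. text and depth
    return F.mse_loss(pred, noise)
```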
Now, motion capture is cool. But what if you want your 3D characters to move in new and unique ways? GenMM can generate a variety of movements from just one or a few example sequences. Unlike other methods, it doesn’t need exhaustive training and can create new motions for complex skeletons in a fraction of a second. It’s also a whiz at jobs you couldn’t do with motion matching alone, like motion completion, guided generation from keyframes, infinite looping, and motion reassembly.
Humans in 4D can track and reconstruct humans in 3D from a single video. It handles unusual poses and poor visibility well, using a transformer-based network called HMR 2.0 to improve action recognition.
There is a new text-to-image player called RAPHAEL in town. The model aims to generate highly artistic images that accurately portray text prompts encompassing multiple nouns, adjectives, and verbs. This is all great, but only if someone actually releases the model as open source, since the community is craving a model that can achieve Midjourney quality.
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers can enhance low-resolution license plate images. It uses attention and transformer modules to improve details and a special loss function based on Optical Character Recognition to achieve better image quality.
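Sub-pixel convolution is the standard trick behind this kind of upsampling: a convolution produces scale² times as many channels, and PixelShuffle rearranges them into a higher-resolution image. The block below is a minimal ESPCN-style sketch with illustrative channel sizes, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SubPixelUpsampler(nn.Module):
    # Minimal sub-pixel convolution block; channel counts and scale are illustrative.
    def __init__(self, in_channels=64, out_channels=3, scale=2):
        super().__init__()
        # The convolution emits scale^2 * out_channels feature maps,
        # which PixelShuffle rearranges into a (scale x) larger image.
        self.conv = nn.Conv2d(in_channels, out_channels * scale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

# Usage: upscale a batch of 64-channel feature maps from 24x12 to 48x24.
feats = torch.randn(1, 64, 24, 12)
print(SubPixelUpsampler()(feats).shape)  # torch.Size([1, 3, 48, 24])
```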
Break-A-Scene can extract multiple concepts from a single image using segmentation masks. It allows users to re-synthesize individual concepts or combinations in different contexts, enhancing scene generation with a two-phase customization process.
Voyager can explore the Minecraft world on its own and learn new skills. It uses an automatic curriculum to improve exploration and achieves 3.3 times more unique items and 15.3 times faster tech tree mastery compared to previous methods.
Sin3DM can generate high-quality variations of 3D objects from a single textured shape. It uses a diffusion model to learn how parts of the object fit together, enabling retargeting, outpainting, and local editing.
Control-A-Video can generate controllable text-to-video content using diffusion models. It allows for fine-tuned customization with edge and depth maps, ensuring high quality and consistency in the videos.
Text2NeRF can generate 3D scenes from text descriptions by combining neural radiance fields (NeRF) with a text-to-image diffusion model. It creates high-quality textures and detailed shapes without needing extra training data, achieving better photo-realism and multi-view consistency than other methods.
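The NeRF side of such a pipeline typically passes 3D coordinates through a frequency positional encoding before an MLP predicts density and color. Below is a minimal sketch of that standard encoding; the frequency count is a common default, not necessarily Text2NeRF's setting.

```python
import torch

def positional_encoding(x, num_freqs=10):
    # NeRF-style frequency encoding for 3D points.
    # x: (N, 3) coordinates -> (N, 3 + 2 * 3 * num_freqs) encoded features.
    freqs = 2.0 ** torch.arange(num_freqs)                            # (F,)
    angles = x[..., None] * freqs                                     # (N, 3, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)   # (N, 3, 2F)
    return torch.cat([x, enc.flatten(-2)], dim=-1)
```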
DragGAN can manipulate images by letting users drag points to change the pose, shape, and layout of objects. It produces realistic results even when parts of the image are hidden or deformed.
DragonDiffusion can edit images by moving, resizing, and changing the appearance of objects without needing to retrain the model. It lets users drag points on images for easy and precise editing.
FastComposer can generate personalized images of multiple unseen individuals in various styles and actions without fine-tuning. It is 300x-2500x faster than traditional methods and requires no extra storage for new subjects, using subject embeddings and localized attention to keep identities clear.
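To illustrate the general idea behind localized attention, here is a hedged sketch in which image patches inside a subject's segmentation mask may only attend to that subject's text tokens. The shapes and hard-masking scheme are illustrative and not FastComposer's exact formulation.

```python
import torch
import torch.nn.functional as F

def localized_cross_attention(img_tokens, txt_tokens, subject_masks, subject_token_ids):
    # Illustrative localized cross-attention (not FastComposer's exact mechanism).
    # img_tokens:        (N, d) flattened image patch features
    # txt_tokens:        (T, d) text token embeddings
    # subject_masks:     (S, N) boolean, True where patch n belongs to subject s
    # subject_token_ids: list of S lists of text-token indices for each subject
    d = img_tokens.shape[-1]
    scores = img_tokens @ txt_tokens.T / d ** 0.5                  # (N, T)
    # Start with all tokens allowed, then restrict patches inside subject regions.
    allowed = torch.ones_like(scores, dtype=torch.bool)
    for s, token_ids in enumerate(subject_token_ids):
        region = subject_masks[s]                                   # (N,)
        only_subject = torch.zeros(scores.shape[1], dtype=torch.bool,
                                   device=scores.device)
        only_subject[token_ids] = True
        # Patches in this subject's region may only attend to its text tokens.
        allowed[region] &= only_subject
    scores = scores.masked_fill(~allowed, float("-inf"))
    attn = F.softmax(scores, dim=-1)                                # (N, T)
    return attn @ txt_tokens                                        # (N, d)
```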