AI Toolbox
A curated collection of 959 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.
VectorFusion can generate SVG-exportable vector graphics from text prompts. It uses a text-conditioned diffusion model to create high-quality outputs in various styles, like pixel art and sketches, without needing large datasets of captioned SVGs.
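For the curious, here is a rough sketch of the score-distillation step this family of methods builds on: the current SVG is rasterized differentiably, noised, and the pretrained diffusion model's noise prediction is turned into a gradient on the path parameters. The `unet`/`scheduler` interfaces mimic the diffusers library, and the VAE encoding and timestep weighting of the real method are omitted, so treat this as illustration only.

```python
# Conceptual sketch of a Score Distillation Sampling (SDS) update on a
# differentiably rasterized SVG. Interfaces mimic diffusers; this is not VectorFusion's code.
import torch

def sds_step(raster, text_emb, unet, scheduler, w=1.0):
    """raster: differentiably rendered image that depends on the SVG path parameters."""
    t = torch.randint(50, 950, (1,), device=raster.device)          # random diffusion timestep
    noise = torch.randn_like(raster)
    noisy = scheduler.add_noise(raster, noise, t)                    # forward-diffuse the render
    with torch.no_grad():
        eps = unet(noisy, t, encoder_hidden_states=text_emb).sample  # text-conditioned noise prediction
    grad = w * (eps - noise)                                         # SDS gradient, no backprop through the U-Net
    raster.backward(gradient=grad)                                   # gradient flows back into the vector paths
```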
InstructPix2Pix can edit images based on written instructions. It allows users to add or remove objects, change colors, and transform styles quickly, using a conditional diffusion model trained on a large dataset.
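The model is easy to try via the Hugging Face diffusers port. A minimal sketch, assuming the timbrooks/instruct-pix2pix checkpoint and a CUDA GPU; the file names and parameter values are only examples:

```python
# Instruction-based image editing with the diffusers port of InstructPix2Pix.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("kitchen.jpg").convert("RGB")       # any input photo
edited = pipe(
    "make the walls bright yellow",                    # the written instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,                          # how closely to stick to the input image
    guidance_scale=7.5,                                # how strongly to follow the instruction
).images[0]
edited.save("kitchen_yellow.jpg")
```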
MinD-Vis can create realistic images from brain recordings using a method that combines Sparse Masked Brain Modeling and a Double-Conditioned Latent Diffusion Model. It achieves top performance in understanding thoughts and generating images, surpassing previous results by 66% in semantic mapping and 41% in image quality, while needing very few paired examples.
I Hear Your True Colors: Image Guided Audio Generation can generate audio that matches images using a two-stage Transformer model. It produces high-quality sound and introduces the ImageHear dataset for testing future image-to-audio models.
One-2-3-45 can generate a complete 360-degree 3D textured mesh from a single image in just 45 seconds. It uses a view-conditioned 2D diffusion model to create multiple images, resulting in better geometry and consistency than other methods.
MotionBERT can recover 3D human motion from noisy 2D observations. It excels in 3D pose estimation, action recognition, and motion prediction, achieving the lowest pose estimation error when trained from scratch.
EVA3D can generate high-quality 3D human models from 2D image collections. It uses a method called compositional NeRF for detailed shapes and textures, and it improves learning with pose-guided sampling.
VToonify can create high-quality artistic portrait videos from images. It allows for controllable style transfer on non-aligned faces and produces smooth, coherent videos with flexible controls on color and intensity.
AudioLM can generate high-quality audio by treating it like a language task. It produces coherent speech and piano music continuations while keeping the speaker’s voice and style consistent, even for new speakers.
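AudioLM itself isn't public, but the framing is easy to illustrate: quantize audio into discrete tokens and train a causal Transformer to predict the next one. A toy stand-in follows; the tokenizer, vocabulary, and model sizes are made-up placeholders, not Google's components.

```python
# Toy "audio as language" model: next-token prediction over discrete audio codes.
import torch
import torch.nn as nn

class TinyAudioLM(nn.Module):
    def __init__(self, vocab=1024, dim=256, layers=4, heads=4, ctx=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)                      # discrete audio codes -> vectors
        self.pos = nn.Embedding(ctx, dim)                          # learned positions
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, vocab)                          # next-token logits

    def forward(self, tokens):                                     # tokens: (batch, time) integer codes
        T = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.blocks(x, mask=causal))              # predict the next code at each step
```

The real system chains several such stages, predicting coarse semantic tokens before fine acoustic ones, which is what keeps long continuations coherent.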
Splatter Image can reconstruct a 3D object from a single image at 38 frames per second and render it at 588 frames per second.
ARF: Artistic Radiance Fields can transfer the style of a 2D image to a 3D scene by stylizing radiance fields. It captures style details while ensuring that different views of the scene look consistent, resulting in high-quality 3D content that closely matches the original style image.
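The heart of the method is a nearest-neighbor feature matching loss: every VGG feature of a rendered view is pulled toward its closest feature in the style image, and the gradient flows back into the radiance field. A simplified sketch, with feature extraction and weighting omitted:

```python
# Simplified nearest-neighbor feature matching (NNFM) style loss.
import torch
import torch.nn.functional as F

def nnfm_loss(render_feats, style_feats):
    """render_feats: (N, C) VGG features of a rendered view; style_feats: (M, C) of the style image."""
    render_feats = F.normalize(render_feats, dim=-1)
    style_feats = F.normalize(style_feats, dim=-1)
    sim = render_feats @ style_feats.t()          # (N, M) cosine similarities
    best = sim.max(dim=1).values                  # closest style feature for each rendered feature
    return (1.0 - best).mean()                    # minimize cosine distance to the nearest neighbor
```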
MCVD can generate videos and predict future and past frames using a masked conditional score-based diffusion model. It achieves high quality and diversity in generated frames, excelling in various video synthesis tasks.
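The trick behind the "masked conditional" part: during training, past and future conditioning frames are randomly blanked out, so a single denoiser learns unconditional generation, forward prediction, backward prediction, and interpolation at once. A rough sketch of that masking; the shapes and concatenation layout are assumptions, not the paper's exact setup.

```python
# MCVD-style masked conditioning: randomly drop past/future frames during training.
import torch

def build_condition(past, future, p_mask=0.5):
    """past, future: (batch, frames, C, H, W) clips used to condition the denoiser."""
    if torch.rand(()) < p_mask:
        past = torch.zeros_like(past)        # no past -> unconditional generation / backward prediction
    if torch.rand(()) < p_mask:
        future = torch.zeros_like(future)    # no future -> forward prediction
    return torch.cat([past, future], dim=1)  # conditioning stack fed to the score network
```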
Adobe is entering the image-to-3D game. LRM can create high-fidelity 3D object meshes from a single image in just 5 seconds. The model is trained on massive multi-view data containing around 1 million objects. The results are pretty impressive, and the method generalizes well to real-world pictures and images from generative models.
Even though Gaussian Splats have seen a lot of love, NeRFs haven't been abandoned. This week we got three different NeRF editing papers. The first two are about inpainting: InseRF and GO-NeRF are both methods for inserting generated 3D objects into existing NeRF scenes.
Temporal Residual Jacobians can transfer motion from one 3D mesh to another without needing rigging or shape keyframes. It uses two neural networks to predict changes, allowing for realistic motion transfer across different body shapes.
UnZipLoRA can break down an image into its subject and style. This makes it possible to create variations and apply styles to new subjects.
SDEdit can generate and edit photo-realistic images from user-guided inputs like hand-drawn strokes or coarse composites. It outperforms GAN-based methods, achieving higher scores for realism and overall satisfaction without any task-specific training.
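The idea is simple: partially noise the rough input (a stroke painting or a crudely edited photo) and let a pretrained diffusion model denoise it back onto the image manifold. That same noise-then-denoise loop is what the diffusers img2img pipeline exposes, so a hedged sketch looks like this; the checkpoint name and `strength` value are only examples.

```python
# SDEdit-style stroke-to-image: add partial noise to the guide, then denoise.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

guide = Image.open("stroke_painting.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a photo of a mountain lake at sunset",
    image=guide,
    strength=0.6,          # how far to noise the guide: higher = more freedom, less faithfulness
    guidance_scale=7.5,
).images[0]
result.save("mountain_lake.png")
```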
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries can retrieve high-quality sound effects from a single video frame without needing text metadata. It uses a combination of large language models and contrastive learning to match sound effects to video better than existing methods.
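The retrieval part boils down to a CLIP-style contrastive objective between frame embeddings and sound-effect embeddings. A bare-bones version of that loss; the encoders and dimensions are placeholders, not the paper's architecture.

```python
# Symmetric InfoNCE loss for matching video frames to sound effects in a shared space.
import torch
import torch.nn.functional as F

def frame_audio_contrastive_loss(frame_emb, audio_emb, temperature=0.07):
    """frame_emb, audio_emb: (batch, dim) embeddings of paired frames and sound effects."""
    frame_emb = F.normalize(frame_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)
    logits = frame_emb @ audio_emb.t() / temperature                   # (batch, batch) similarity matrix
    targets = torch.arange(logits.shape[0], device=logits.device)      # true pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

At query time, the sound-effect library is ranked by cosine similarity to the embedding of the query frame.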
GFPGAN can restore realistic facial details from low-quality images using a pretrained face GAN. It works well on both synthetic and real-world images, restoring faces in a single forward pass instead of the expensive per-image optimization older methods need.
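The open-source `gfpgan` package makes this a few lines. A minimal sketch following the repo's inference script; the checkpoint path and arguments are illustrative.

```python
# One-pass face restoration with the gfpgan package.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",   # pretrained face-restoration weights
    upscale=2,                     # also upscale the whole image 2x
    arch="clean",
    channel_multiplier=2,
    bg_upsampler=None,             # optionally plug in Real-ESRGAN for the background
)

img = cv2.imread("old_photo.jpg", cv2.IMREAD_COLOR)   # low-quality input (BGR)
_, _, restored = restorer.enhance(
    img, has_aligned=False, only_center_face=False, paste_back=True
)
cv2.imwrite("old_photo_restored.jpg", restored)
```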