AI Toolbox
A curated collection of 915 free, cutting-edge AI papers with code and tools for text, image, video, 3D, and audio generation and manipulation.





RigAnything can automatically rig 3D assets by generating joints, skeletons, and skinning weights without templates. It supports any input pose and rigs shapes 20 times faster than other methods, taking under 2 seconds per shape.
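Once joints and skinning weights exist, posing the asset comes down to standard linear blend skinning. The sketch below is a generic numpy illustration of that last step; the function name, toy joints, and weights are made up for the example and are not RigAnything's own code.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Deform rest-pose vertices with per-joint 4x4 transforms and skinning weights.

    vertices:         (V, 3) rest-pose positions
    weights:          (V, J) skinning weights, rows sum to 1
    joint_transforms: (J, 4, 4) rest-to-posed transform per joint
    """
    V = vertices.shape[0]
    homo = np.concatenate([vertices, np.ones((V, 1))], axis=1)        # (V, 4) homogeneous coords
    per_joint = np.einsum("jab,vb->jva", joint_transforms, homo)      # each vertex posed by each joint
    blended = np.einsum("vj,jva->va", weights, per_joint)             # blend by skinning weights
    return blended[:, :3]

# Toy example: two joints; the second joint's region rotates 90 degrees around z.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
rot_z = np.array([[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], float)
print(linear_blend_skinning(verts, w, np.stack([np.eye(4), rot_z])))
```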
STA-V2A can generate high-quality audio from videos by extracting important features and using text for guidance. It uses a Latent Diffusion Model for audio creation and a new metric called Audio-Audio Align to measure how well the audio matches the video timing.
TVG can create smooth transition videos between two images without needing training. It uses diffusion models and Gaussian Process Regression for high-quality results and adds controls for better timing.
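As a rough illustration of the Gaussian Process Regression part, the sketch below uses a GP posterior mean over a 1D time axis to infer intermediate latents between two endpoint latents. The RBF kernel, its length scale, and the flattened-latent setup are assumptions for the example, not TVG's exact formulation.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gpr_interpolate(z0, z1, num_frames, noise=1e-6):
    """Infer intermediate latents between two endpoint latents with GP regression.

    z0, z1: (D,) latent vectors for the first and last frame.
    Returns (num_frames, D) latents whose first/last rows stay close to z0/z1.
    """
    t_train = np.array([0.0, 1.0])                # endpoint timestamps
    t_query = np.linspace(0.0, 1.0, num_frames)   # timestamps of the in-between frames
    Z = np.stack([z0, z1])                        # (2, D) training targets
    K = rbf_kernel(t_train, t_train) + noise * np.eye(2)
    K_star = rbf_kernel(t_query, t_train)
    return K_star @ np.linalg.solve(K, Z)         # GP posterior mean: K_* K^{-1} Z

z0, z1 = np.random.randn(16), np.random.randn(16)
frames = gpr_interpolate(z0, z1, num_frames=8)
print(frames.shape)  # (8, 16)
```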
Iterative Object Count Optimization can improve object counting accuracy in text-to-image diffusion models.
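Conceptually, this kind of method runs an optimization loop at inference time: generate, count, and nudge the conditioning until the count matches a target. The toy PyTorch sketch below captures only that loop; `generate` and `count_objects` are placeholders, not the paper's actual generator or counting model.

```python
import torch

# Stubs standing in for the real pipeline: a generator conditioned on a text
# embedding and a differentiable object counter. Both are placeholders.
def generate(embedding):
    return torch.sigmoid(embedding).view(1, 1, 4, 4)   # toy "image"

def count_objects(image):
    return image.sum()                                  # toy differentiable "count"

def optimize_count(embedding, target_count, steps=100, lr=0.1):
    """Iteratively adjust the conditioning embedding until the counted objects match the target."""
    emb = embedding.clone().requires_grad_(True)
    opt = torch.optim.Adam([emb], lr=lr)
    for _ in range(steps):
        loss = (count_objects(generate(emb)) - target_count) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return emb.detach()

emb = optimize_count(torch.zeros(16), target_count=5.0)
print(count_objects(generate(emb)).item())  # should end up close to 5.0
```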
SparseCraft can reconstruct 3D shapes and appearances from just three colored images. It uses a Signed Distance Function (SDF) and a radiance field, achieving fast training times of under 10 minutes without needing pretrained models.
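For context on how an SDF and a radiance field combine, here is a generic, much-simplified volume-rendering sketch: the SDF is squashed into a density and alpha-composited along a ray like any radiance field. The sphere SDF, logistic squashing, and constant color function are illustrative stand-ins, not SparseCraft's formulation.

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=0.5):
    return np.linalg.norm(p - center, axis=-1) - radius

def render_ray(origin, direction, color_fn, n_samples=128, far=3.0, beta=50.0):
    """Volume-render one ray through an SDF-defined surface (generic sketch)."""
    t = np.linspace(0.0, far, n_samples)
    pts = origin + t[:, None] * direction                              # (N, 3) sample points
    density = beta / (1.0 + np.exp(beta * sphere_sdf(pts)))            # high density inside the surface
    alpha = 1.0 - np.exp(-density * (far / n_samples))                 # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))      # transmittance along the ray
    weights = trans * alpha
    return (weights[:, None] * color_fn(pts)).sum(axis=0)              # composited RGB

rgb = render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]),
                 color_fn=lambda p: np.ones((p.shape[0], 3)) * [1.0, 0.4, 0.2])
print(rgb)
```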
MagicFace can generate high-quality images of people in any style without needing training. It uses special attention methods for precise attribute alignment and feature injection, working for both single and multi-concept customization.
Generative Photomontage can combine parts of multiple AI-generated images using a brush tool. It lets users create new appearance combinations, correct shapes and artifacts, and improve prompt alignment, outperforming existing image-blending methods.
Filtered Guided Diffusion shows that image-to-image translation and editing don’t necessarily require additional training. FGD simply applies a filter to the input of each diffusion step, adapted to the output of the previous step, which makes the approach easy to implement.
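The sketch below shows the general filter-guidance pattern in the spirit of FGD and related training-free methods: after every denoising step, the sample's low frequencies are pulled toward the reference image's low frequencies while high frequencies are left to the model. The Gaussian low-pass, the noising of the reference, and the stub denoiser are assumptions for illustration, not FGD's exact filter or schedule.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lowpass(x, sigma=4.0):
    return gaussian_filter(x, sigma=sigma)

def filtered_guided_sampling(denoise_step, noise_like, reference, num_steps=50):
    """Training-free guidance sketch: after every denoising step, replace the sample's
    low-frequency content with the reference image's low frequencies."""
    x = noise_like(reference)                       # start from pure noise
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)                      # one ordinary reverse-diffusion step
        ref_t = reference + np.std(x) * np.random.randn(*reference.shape) * (t / num_steps)
        x = x - lowpass(x) + lowpass(ref_t)         # filter-based guidance toward the reference
    return x

# Stub denoiser that just shrinks the signal a little each step (illustration only).
result = filtered_guided_sampling(
    denoise_step=lambda x, t: 0.98 * x,
    noise_like=lambda ref: np.random.randn(*ref.shape),
    reference=np.ones((32, 32)),
)
print(result.shape)
```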
Matryoshka Diffusion Models can generate high-quality images and videos using a NestedUNet architecture that denoises inputs at multiple resolutions jointly. This allows for strong performance at resolutions up to 1024x1024 pixels and supports effective training without needing specific examples.
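A toy version of the multi-resolution idea: noise the same image at several scales and ask one model to predict the noise at every scale jointly. The pooling-based scales, simplified noising, and stub model below are illustrative only; the real NestedUNet shares features across resolutions and uses a proper noise schedule.

```python
import torch
import torch.nn.functional as F

def multires_diffusion_loss(model, image, timestep, scales=(1, 2, 4)):
    """Toy multi-resolution denoising objective: the same image is noised at several
    resolutions and the model must predict the noise at every scale."""
    loss = 0.0
    for s in scales:
        x = F.avg_pool2d(image, kernel_size=s) if s > 1 else image   # nested resolution
        noise = torch.randn_like(x)
        noisy = x + noise                                            # toy noising; real schedules scale both terms
        loss = loss + F.mse_loss(model(noisy, timestep, scale=s), noise)
    return loss

# Stub "model" predicting noise at each scale (placeholder only).
model = lambda x, t, scale: torch.zeros_like(x)
img = torch.randn(2, 3, 64, 64)
print(multires_diffusion_loss(model, img, timestep=10).item())
```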
DiffComplete can complete 3D shapes from incomplete scans using a diffusion-based method.
Puppet-Master can create realistic motion in videos from a single image using simple drag controls. It uses a fine-tuned video diffusion model and all-to-first attention method to make high-quality videos.
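The all-to-first attention is simple to sketch: every frame's queries attend to the keys and values of the first frame. The PyTorch snippet below shows just that mechanism with made-up tensor shapes; the actual model combines it with its other attention layers.

```python
import torch
import torch.nn.functional as F

def all_to_first_attention(q, k, v):
    """Every frame's queries attend to the keys/values of the first frame only.

    q, k, v: (frames, tokens, dim) per-frame token features.
    """
    frames, tokens, dim = q.shape
    k0 = k[:1].expand(frames, tokens, dim)   # broadcast first-frame keys to all frames
    v0 = v[:1].expand(frames, tokens, dim)   # broadcast first-frame values to all frames
    return F.scaled_dot_product_attention(q, k0, v0)

q = k = v = torch.randn(8, 256, 64)          # 8 frames, 256 tokens each, 64-dim features
out = all_to_first_attention(q, k, v)
print(out.shape)  # torch.Size([8, 256, 64])
```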
Generative Camera Dolly can regenerate a video from any chosen perspective. Still very early, but imagine being able to change any shot or angle in a video after it’s been recorded!
Sprite-Decompose can break down animated graphics into sprites using videos and box outlines.
MILS can generate captions for images, videos, and audio without any training. It achieves top performance in zero-shot captioning and improves text-to-image generation, allowing for creative uses across different media types.
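The training-free loop behind this style of method is easy to sketch: an off-the-shelf proposer generates candidate captions, a frozen scorer ranks them, and the best ones seed the next round. Both functions below are toy stubs (a string mangler and a length-based score), standing in for the LLM and the image-text scorer.

```python
import random

# Stubs: in a MILS-style pipeline the proposer would be an off-the-shelf LLM and the
# scorer a frozen image-text model (e.g. CLIP similarity); neither is trained.
def propose_candidates(best_so_far, n=8):
    return [f"{c} variant {random.randint(0, 99)}" for c in best_so_far for _ in range(n)]

def score(candidate):
    return -abs(len(candidate) - 40)   # toy stand-in for an image-text similarity score

def iterative_zero_shot_caption(seed_captions, rounds=5, keep=3):
    """Training-free caption search: propose candidates, score them with a frozen
    scorer, keep the best, and feed them back to the proposer for the next round."""
    best = list(seed_captions)
    for _ in range(rounds):
        candidates = best + propose_candidates(best)
        best = sorted(candidates, key=score, reverse=True)[:keep]
    return best[0]

print(iterative_zero_shot_caption(["a photo of", "an image showing"]))
```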
IPAdapter-Instruct can efficiently combine natural-image conditioning with “Instruct” prompts! It enables users to switch between various interpretations of the same image, such as style transfer and object extraction.
MeshAvatar can generate high-quality triangular human avatars from multi-view videos. The avatars can be edited, manipulated, and relit.
MeshAnything V2 can generate 3D meshes from point clouds, meshes, images, text and more.
Lumina-mGPT can create photorealistic images from text and handle different visual and language tasks! It uses a special transformer model, making it possible to control image generation, do segmentation, estimate depth, and answer visual questions in multiple steps.
Feature Splatting can manipulate both the appearance and the physical properties of objects in a 3D scene using text prompts.