Image AI Tools
Free image AI tools for generating and editing visuals and creating 3D assets for games, films, and other creative projects.
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model can edit images by filling in missing parts using a reference image and a sketch. This method improves editability and allows for detailed changes in various scenes.
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in an image. This allows object-level editing operations on real images, such as reference image-based appearance editing, free-form shape editing, adding objects, and creating variations.
LDMs (latent diffusion models) are high-resolution image generators that support inpainting, image generation from text or bounding-box layouts, and super-resolution.
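A minimal text-to-image sketch with a public LDM checkpoint via Hugging Face diffusers; the model id is the released CompVis LDM, while the prompt, step count, and output path are illustrative assumptions:

```python
# Text-to-image with a latent diffusion model via Hugging Face diffusers.
# Assumes `pip install diffusers transformers torch`; prompt and step count
# are illustrative.
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
pipe = pipe.to("cuda")  # remove this line to run on CPU (much slower)

image = pipe("a painting of a squirrel eating a burger",
             num_inference_steps=50).images[0]
image.save("ldm_squirrel.png")
```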
eDiff-I can generate high-resolution images from text prompts using different diffusion models for each stage. It also allows users to control image creation by selecting and moving words on a canvas.
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models can quickly personalize text-to-image models using just one image and only 5 training steps. This method reduces training time from minutes to seconds while maintaining quality through regularized weight-offsets.
Reduce, Reuse, Recycle enables compositional generation with energy-based diffusion models and MCMC samplers. It improves tasks like classifier-guided ImageNet modeling and text-to-image generation by introducing new samplers that enhance performance.
Entity-Level Text-Guided Image Manipulation can edit specific parts of an image based on text descriptions while keeping other areas unchanged. It uses a two-step process, first aligning text semantics with image regions and then applying the edits, which allows for flexible and precise editing.
MultiDiffusion can generate high-quality images using a pre-trained text-to-image diffusion model. It allows users to control aspects like image size and includes features for guiding images with segmentation masks and bounding boxes.
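diffusers ships a MultiDiffusion-based panorama pipeline that fuses overlapping diffusion windows, which is what makes non-square sizes work; a hedged sketch, where the model id, size, and prompt are assumptions:

```python
# MultiDiffusion panorama generation via diffusers' StableDiffusionPanoramaPipeline.
# The width/prompt values below are illustrative.
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# MultiDiffusion blends per-window denoising results, so a 4:1 canvas is fine.
image = pipe("a photo of the dolomites", height=512, width=2048).images[0]
image.save("panorama.png")
```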
ControlNet can add control to text-to-image diffusion models. It lets users manipulate image generation using methods like edge detection and depth maps, while working well with both small and large datasets.
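A sketch of edge-conditioned generation with the public Canny ControlNet checkpoint via diffusers; the input file, prompt, and Canny thresholds are illustrative assumptions:

```python
# Canny-edge ControlNet conditioning via diffusers.
# Assumes `pip install diffusers transformers opencv-python torch`.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Turn an input photo into an edge map that will steer the generation.
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # 1-channel -> 3-channel
control = Image.fromarray(edges)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

out = pipe("a futuristic city at night", image=control).images[0]
out.save("controlled.png")
```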
Neural Congealing can align similar content across multiple images using a self-supervised method. It uses pre-trained DINO-ViT features to create a shared semantic map, allowing for effective alignment even with different appearances and backgrounds.
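A sketch of extracting the pre-trained DINO-ViT features that Neural Congealing builds its shared semantic map from (backbone only, not the congealing method itself); the torch.hub ids come from the public facebookresearch/dino repo, and the image path and size are illustrative:

```python
# Extract per-patch DINO-ViT features for an image.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # Token features from the last block (CLS token at index 0); alignment
    # methods like Neural Congealing operate on features of this kind.
    feats = model.get_intermediate_layers(img, n=1)[0]
print(feats.shape)  # (1, 197, 384) for ViT-S/16 at 224x224
```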
Pix2Pix-Zero can edit images by changing them in real-time, like turning a cat into a dog, without needing extra text prompts or training. It keeps the original image’s structure and uses pre-trained text-to-image diffusion models for better editing results.
StyleGAN-T can generate high-quality images at 512x512 resolution in just 2 seconds using a single NVIDIA A100 GPU. It solves problems in text-to-image synthesis, like stable training on diverse datasets and strong text alignment.
CLIPascene can convert scene images into sketches with different levels of detail and simplicity. Users can create a range of sketches, from detailed to simple, allowing for personalized artistic expression.
VectorFusion can generate SVG-exportable vector graphics from text prompts. It uses a text-conditioned diffusion model to create high-quality outputs in various styles, like pixel art and sketches, without needing large datasets of captioned SVGs.
InstructPix2Pix can edit images based on written instructions. It allows users to add or remove objects, change colors, and transform styles quickly, using a conditional diffusion model trained on a large dataset.
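A sketch of instruction-based editing with the public timbrooks/instruct-pix2pix checkpoint via diffusers; the input image, instruction, and guidance values are illustrative assumptions:

```python
# Instruction-driven image editing via diffusers' InstructPix2Pix pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("portrait.png").convert("RGB")
edited = pipe(
    "make it look like a watercolor painting",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
).images[0]
edited.save("edited.png")
```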
MinD-Vis can create realistic images from brain recordings using a method that combines Sparse Masked Brain Modeling and a Double-Conditioned Latent Diffusion Model. It achieves top performance in understanding thoughts and generating images, surpassing previous results by 66% in semantic mapping and 41% in image quality, while needing very few paired examples.
UnZipLoRA can decompose a single image into its subject and its style, each captured separately. This makes it possible to create variations of the subject and apply the style to new subjects.
SDEdit can generate and edit photo-realistic images using user-guided inputs like hand-drawn strokes or text prompts. It outperforms GAN-based methods, achieving high scores in realism and overall satisfaction without needing specific training.
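The SDEdit idea, noising a guide image partway and then denoising it toward a prompt, is what diffusers' img2img pipeline implements, with `strength` controlling how much noise is added. A sketch where the model id, input file, prompt, and strength are illustrative assumptions:

```python
# SDEdit-style stroke-to-image via diffusers' img2img pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

sketch = Image.open("stroke_painting.png").convert("RGB").resize((512, 512))
result = pipe(
    "a photorealistic mountain landscape at sunset",
    image=sketch,
    strength=0.75,  # 0 = keep the guide image, 1 = ignore it entirely
).images[0]
result.save("sdedit_result.png")
```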
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries can retrieve high-quality sound effects from a single video frame without needing text metadata. It uses a combination of large language models and contrastive learning to match sound effects to video better than existing methods.