Image AI Tools
Free image AI tools for generating and editing visuals, creating 3D assets for games and films, and optimizing your creative projects.
Intrinsic Image Diffusion can generate detailed albedo, roughness, and metallic maps from a single indoor scene image.
DiffusionLight can estimate the lighting in a single input image and convert it into an HDR environment map. The technique is able to generate multiple chrome balls with varying exposures for HDR merging and can be used to seamlessly insert 3D objects into an existing photograph. Pretty cool.
ControlNet-XS can control text-to-image diffusion models like Stable Diffusion and Stable Diffusion-XL with only 1% of the parameters of the base model. It is about twice as fast as ControlNet and produces higher quality images with better control.
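For context, here is a minimal sketch of edge-conditioned generation with the baseline ControlNet in diffusers, the setup ControlNet-XS slims down; ControlNet-XS itself plugs into the pipeline the same way with a much smaller control network. The checkpoint IDs, prompt, and the precomputed `edges.png` file are illustrative assumptions, not part of the ControlNet-XS release.

```python
# Baseline ControlNet (canny edges) steering Stable Diffusion 1.5 via diffusers.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edges = load_image("edges.png")  # precomputed canny edge map (assumed local file)
image = pipe("a cozy reading nook, soft morning light", image=edges).images[0]
image.save("controlled.png")
```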
PhotoMaker can generate realistic human photos from input images and text prompts. It can change attributes like hair colour or add glasses, turn subjects of artworks like Van Gogh’s self-portrait into realistic photos, or mix the identities of multiple people.
DPM-Solver can generate high-quality samples from diffusion probabilistic models in just 10 to 20 function evaluations. It is 4 to 16 times faster than previous methods and works with both discrete-time and continuous-time models without extra training.
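In practice this is just a scheduler swap. A minimal sketch with diffusers, assuming a standard Stable Diffusion 1.5 checkpoint and an illustrative prompt:

```python
# Swap the pipeline's default scheduler for the DPM-Solver multistep scheduler
# and sample in ~20 function evaluations instead of the usual 50 steps.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("a watercolor fox in a forest", num_inference_steps=20).images[0]
image.save("fox.png")
```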
AmbiGen can generate ambigrams by optimizing letter shapes for clear reading from two angles. It improves word accuracy by over 11.6% and reduces edit distance by 41.9% on the 500 most common English words.
Readout Guidance can control text-to-image diffusion models using lightweight networks called readout heads. It enables pose, depth, and edge-guided generation with fewer parameters and training samples, allowing for easier manipulation and consistent identity generation.
X-Adapter can enable pretrained plugins like ControlNet and LoRA from Stable Diffusion 1.5 to work with the SDXL model without retraining. It adds trainable mapping layers for feature remapping and uses a null-text training strategy to improve compatibility and functionality.
Custom Diffusion can quickly fine-tune text-to-image diffusion models to generate new variations from just a few examples in about 6 minutes on 2 A100 GPUs. It allows for the combination of multiple concepts and requires only 75MB of storage for each additional model, which can be compressed to 5-15MB.
DiffusionMat is a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. The key innovation of the framework is a correction module that adjusts the output at each denoising step, ensuring that the final result is consistent with the input image’s structures.
Material Palette can extract a palette of PBR materials (albedo, normals, and roughness) from a single real-world image. Looks very useful for creating new materials for 3D scenes or even for generating textures for 2D art.
Concept Sliders is a method that allows for fine-grained control over textual and visual attributes in Stable Diffusion XL. By using simple text descriptions or a small set of paired images, artists can train concept sliders to represent the direction of desired attributes. At generation time, these sliders can be used to control the strength of the concept in the image, enabling nuanced tweaking.
It’s been a while since I last doomed the TikTok dancers. MagicDance is gonna doom them some more. This model can combine human motion with reference images to generate appearance-consistent videos with precise motion control. While the results still contain visible artifacts and jittering, give it a few months and I’m sure we won’t be able to tell the difference anymore.
The Chosen One can generate consistent characters in text-to-image diffusion models using just a text prompt. It improves character identity and prompt alignment, making it useful for story visualization, game development, and advertising.
Object-aware Inversion and Reassembly can edit multiple objects in an image by finding the best steps for each edit. It allows for precise changes in shapes, colors, and materials while keeping the rest of the image intact.
HyperHuman is a text-to-image model that focuses on generating hyper-realistic human images from a text prompt and a pose image. The results are pretty impressive, and the model can generate images in different styles at resolutions up to 1024x1024.
ScaleCrafter can generate ultra-high-resolution images up to 4096x4096 and videos at 2048x1152 using pre-trained diffusion models. It reduces problems like object repetition and allows for custom aspect ratios, achieving excellent texture detail.
Uni-paint can perform image inpainting using different methods like text, strokes, and examples. It uses a pretrained Stable Diffusion model, allowing it to adapt to new images without extra training.
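Uni-paint's code isn't shown here, but its text-guided mode builds on the same kind of masked generation you get from the stock Stable Diffusion inpainting pipeline in diffusers. A hedged sketch of that baseline, with assumed local files `room.png` and `room_mask.png` and an illustrative prompt (Uni-paint adds stroke- and example-based conditioning on top of this):

```python
# Text-conditioned inpainting with the stock Stable Diffusion inpainting pipeline.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init = load_image("room.png")        # image to edit (assumed local file)
mask = load_image("room_mask.png")   # white pixels mark the region to repaint

result = pipe(
    prompt="a potted monstera plant on a wooden stool",
    image=init,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```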
Latent Consistency Models can generate high-resolution images in just 2-4 steps, making text-to-image generation much faster than traditional methods. They require only 32 A100 GPU hours for training on a 768x768 resolution, which is efficient for high-quality results.
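A minimal sketch of few-step sampling with an LCM in diffusers, assuming the publicly released LCM Dreamshaper checkpoint and an illustrative prompt:

```python
# Generate with a Latent Consistency Model in only 4 inference steps.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

# LCMs trade the usual 25-50 denoising steps for 2-4.
image = pipe(
    "an isometric diorama of a tiny island village",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lcm.png")
```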
DA-CLIP is a method for restoring images. Apart from inpainting, it can restore images by dehazing, deblurring, denoising, deraining, and desnowing them, as well as removing unwanted shadows and raindrops or enhancing lighting in low-light images.