Controllable Image Generation
Free AI tools for controllable image generation, helping artists and designers create customizable, tailored visuals for their projects.
ControlAR adds spatial controls such as edge maps, depth maps, and segmentation masks to autoregressive image models like LlamaGen.
CtrLoRA can adapt a base ControlNet to a new image generation condition with just 1,000 data pairs and under one hour of training on a single GPU. It reduces learnable parameters by 90%, making it much easier to create new guidance conditions.
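The paper's training recipe isn't reproduced here, but the parameter-efficiency idea, attaching low-rank adapters to a ControlNet instead of training it fully, can be sketched with the peft library (the rank and target modules below are illustrative assumptions, not CtrLoRA's exact configuration):

```python
import torch
from diffusers import ControlNetModel
from peft import LoraConfig, get_peft_model

# Start from a pretrained base ControlNet (canny used here as a stand-in)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float32
)

# Attach low-rank adapters to the attention projections only;
# settings are illustrative, not CtrLoRA's actual setup
config = LoraConfig(r=16, lora_alpha=16, target_modules=["to_q", "to_k", "to_v", "to_out.0"])
controlnet = get_peft_model(controlnet, config)

# Only the LoRA weights remain trainable, a small fraction of the full model
controlnet.print_trainable_parameters()
```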
LinFusion can generate images at up to 16K resolution in just one minute on a single GPU. It improves performance across various Stable Diffusion versions and works with pre-trained components like ControlNet and IP-Adapter.
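LinFusion's efficiency comes from replacing quadratic self-attention with a generalized linear attention. The exact module differs, but the core linear-attention trick, computing the K-V summary first so cost grows linearly with token count, looks roughly like this sketch:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """O(n) attention: apply a positive feature map to Q and K, then
    contract K with V first so cost scales linearly in sequence length."""
    q = F.elu(q) + 1  # a common linear-attention feature map choice
    k = F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)  # d x e summary, linear in n
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

q = k = v = torch.randn(1, 4096, 64)  # 4096 tokens ~ a 64x64 latent
out = linear_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4096, 64])
```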
SEG improves image generation for SDXL by smoothing the self-attention energy landscape! It boosts quality without relying on a guidance scale: a query-blurring method adjusts the attention weights, giving better results with fewer side effects.
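The query-blurring idea can be sketched in a few lines: reshape the attention queries back to their 2D spatial layout and Gaussian-blur them before computing attention scores. Kernel size and sigma here are illustrative, not SEG's exact schedule:

```python
import torch
import torch.nn.functional as F

def blur_queries(q, h, w, kernel_size=9, sigma=3.0):
    """Gaussian-blur attention queries over their 2D spatial layout,
    smoothing the attention energy landscape (the idea behind SEG)."""
    b, n, d = q.shape  # n == h * w spatial tokens
    coords = torch.arange(kernel_size) - kernel_size // 2
    g = torch.exp(-coords.float() ** 2 / (2 * sigma ** 2))
    g = (g / g.sum()).to(q.dtype)
    # Depthwise 2D Gaussian kernel, one copy per channel
    kernel = (g[:, None] * g[None, :]).expand(d, 1, kernel_size, kernel_size).contiguous()
    q2d = q.permute(0, 2, 1).reshape(b, d, h, w)
    q2d = F.conv2d(q2d, kernel, padding=kernel_size // 2, groups=d)
    return q2d.reshape(b, d, n).permute(0, 2, 1)

q = torch.randn(1, 64 * 64, 80)  # queries for a 64x64 latent
q_smoothed = blur_queries(q, h=64, w=64)
```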
Artist stylizes images based on text prompts, preserving the original content while producing results of high aesthetic quality. No finetuning, no ControlNets; it just works with your pretrained Stable Diffusion model.
AccDiffusion can generate high-resolution images with less object repetition! Something Stable Diffusion has been plagued by since its infancy.
PartCraft can generate customized and photorealistic virtual creatures by mixing visual parts from existing images. This tool allows users to create unique hybrids and make detailed changes, which is useful for digital asset creation and studying biodiversity.
DMD2 is an improved distillation method that turns diffusion models into efficient one-step image generators.
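Inference then follows the standard distilled-model pattern in diffusers: a single denoising step with classifier-free guidance disabled. The sketch below assumes a DMD2-distilled UNet has already been loaded into the pipeline; how you obtain that checkpoint is up to you:

```python
import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

# Assumes pipe.unet has been swapped for a DMD2-distilled UNet beforehand;
# with the stock UNet this runs but produces poor one-step results
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# One step, no guidance: the distilled generator does all the work
image = pipe("a cat wearing a spacesuit", num_inference_steps=1, guidance_scale=0).images[0]
```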
Parts2Whole can generate customized human portraits from multiple reference images covering pose and various aspects of appearance. Conditioned on selected parts from different people, it lets you compose images with specific combinations of facial features, hair, clothes, and so on.
Desigen can generate high-quality design templates, including background images and layout elements. It uses advanced diffusion models for better controllability and, evaluated on over 40,000 advertisement banners, achieves results comparable to those of human designers.
Multi-LoRA Composition integrates multiple Low-Rank Adaptations (LoRAs) to create highly customized and detailed images. The approach combines multiple elements in a single image without fine-tuning and without losing detail or image quality.
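The paper proposes decoding-time composition methods (LoRA Switch and LoRA Composite); as a simpler baseline, diffusers' built-in adapter API can already stack several LoRAs with per-adapter weights. Paths and adapter names below are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Placeholder LoRA paths; substitute your own character/style LoRAs
pipe.load_lora_weights("path/to/character_lora", adapter_name="character")
pipe.load_lora_weights("path/to/style_lora", adapter_name="style")

# Activate both adapters with individual strengths
pipe.set_adapters(["character", "style"], adapter_weights=[0.8, 0.6])

image = pipe("a portrait of the character in the style", num_inference_steps=30).images[0]
```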
AmbiGen can generate ambigrams by optimizing letter shapes so they read clearly both right-side up and upside down. On the 500 most common English words, it improves word accuracy by over 11.6% and reduces edit distance by 41.9%.
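The underlying objective is easy to picture: render a letterform, score its legibility as one word, rotate it 180 degrees, and score it as the other. A toy PyTorch sketch, where render and reader are hypothetical stand-ins for a differentiable rasterizer and an OCR-style scorer, not AmbiGen's actual components:

```python
import torch

def ambigram_loss(params, render, reader, word_a, word_b):
    """Dual-legibility objective: the same glyphs must read as word_a
    right-side up and as word_b upside down."""
    img = render(params)                            # (1, 1, H, W) glyph image
    loss_up = reader(img, word_a)                   # legibility right-side up
    flipped = torch.rot90(img, k=2, dims=(-2, -1))  # rotate 180 degrees
    loss_down = reader(flipped, word_b)             # legibility upside down
    return loss_up + loss_down
```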
It’s been a while since I last doomed the TikTok dancers. MagicDance is gonna doom them some more. This model can combine human motion with reference images to precisely generate appearance-consistent videos. While the results still contain visible artifacts and jittering, give it a few months and I’m sure we can’t tell the difference no more.
Break-A-Scene can extract multiple concepts from a single image using segmentation masks. It allows users to re-synthesize individual concepts or combinations in different contexts, enhancing scene generation with a two-phase customization process.
MultiDiffusion can generate high-quality images from a pre-trained text-to-image diffusion model without further training. It gives users control over aspects like image size and aspect ratio, and supports guiding generation with segmentation masks and bounding boxes.
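MultiDiffusion's panorama mode ships with diffusers as StableDiffusionPanoramaPipeline, which fuses overlapping diffusion windows into one wide canvas:

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# A 4:1 panorama, stitched from overlapping denoising windows
image = pipe("a photo of the dolomites", height=512, width=2048).images[0]
image.save("panorama.png")
```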
ControlNet can add control to text-to-image diffusion models. It lets users steer image generation with conditions like edge maps and depth maps, and it trains robustly on both small and large datasets.
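A typical diffusers setup conditions generation on Canny edges extracted from a reference image (the model IDs below are the commonly used SD 1.5 checkpoints):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract Canny edges as the control signal
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The generated image follows the edge structure of the input
result = pipe("a futuristic city at night", image=control, num_inference_steps=30).images[0]
```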
StyleGAN-T can generate high-quality images at 512x512 resolution in just 2 seconds on a single NVIDIA A100 GPU. It tackles the key challenges of GAN-based text-to-image synthesis: stable training on large, diverse datasets and strong text alignment.