Text-to-Image
Free text-to-image AI tools for creating visuals from text prompts, perfect for artists and designers in need of unique imagery.
Text2Cinemagraph can create cinemagraphs from text descriptions, animating elements like flowing rivers and drifting clouds. It combines artistic images with realistic ones to accurately show motion, outperforming other methods in generating cinemagraphs for natural and artistic scenes.
DiffSketcher is a tool that turns words into vectorized free-hand sketches. It also lets you set the level of abstraction, yielding sketches that are more abstract or more concrete.
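A minimal, hypothetical sketch of what abstraction control could look like; the `diffsketcher` package, `DiffSketcher` class, and `num_paths` parameter are all assumed names for illustration, not the project's actual API:

```python
# Hypothetical wrapper API, not DiffSketcher's real interface.
# Fewer Bezier strokes -> more abstract; more strokes -> more detail.
from diffsketcher import DiffSketcher  # assumed package name

sketcher = DiffSketcher(device="cuda")

abstract = sketcher.generate("a horse galloping", num_paths=16)   # loose, abstract
detailed = sketcher.generate("a horse galloping", num_paths=128)  # concrete, detailed

abstract.save("horse_abstract.svg")  # vector output, editable in e.g. Inkscape
```

Roughly, the method optimizes a fixed budget of strokes against the text prompt, so a smaller budget forces a more abstract drawing.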
Cocktail is a pipeline for guided image generation. Unlike ControlNet, it requires only one generalized model to handle multiple modalities, such as edge, pose, and mask guidance.
There is a new text-to-image player called RAPHAEL in town. The model aims to generate highly artistic images that accurately portray text prompts encompassing multiple nouns, adjectives, and verbs. This is all great, but only if someone actually releases the model as open source; the community is craving a model that can match Midjourney quality.
FastComposer can generate personalized images of multiple unseen individuals in various styles and actions without fine-tuning. It is 300x-2500x faster than fine-tuning-based methods and requires no extra storage for new subjects, using subject embeddings and localized cross-attention to keep identities from blending together.
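A conceptual sketch of the subject-embedding idea, with assumed shapes and names (this is not FastComposer's actual code): an image-derived feature is fused into the text token that names the subject, so personalization happens at inference time with no fine-tuning.

```python
import torch
import torch.nn as nn

class SubjectFusion(nn.Module):
    """Fuse an image-derived subject feature into one text token embedding.
    Dimensions are assumed for illustration (e.g. CLIP-like encoders)."""

    def __init__(self, text_dim=768, image_dim=1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim + image_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, text_tokens, subject_feat, subject_idx):
        # text_tokens: (seq_len, text_dim); subject_feat: (image_dim,)
        fused = self.proj(torch.cat([text_tokens[subject_idx], subject_feat]))
        text_tokens = text_tokens.clone()
        text_tokens[subject_idx] = fused  # overwrite e.g. the "man" token
        return text_tokens
```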
What if you could generate images of an untrained concept by providing a few images, without having to fine-tune a model first? InstantBooth from Adobe might be the answer. The approach builds on pre-trained text-to-image models to enable instant text-guided image personalization without fine-tuning. Compared to methods like DreamBooth and Textual Inversion, InstantBooth generates competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation, while being 100 times faster. When open source?
Expressive Text-to-Image Generation with Rich Text can create detailed images from text by using rich-text formatting like font style, size, and color. This gives finer control over styles and colors, making it easier to generate complex scenes than with plain-text prompts.
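To illustrate the idea, here is an assumed encoding of a rich-text prompt as attributed spans; the project's actual JSON schema may differ:

```python
# Illustrative only: spans of the prompt carry attributes (color, font style)
# that steer per-region color and style during generation.
rich_prompt = [
    {"text": "a cozy living room with a ", "attributes": {}},
    {"text": "sofa", "attributes": {"color": "#1e90ff"}},           # paint the sofa blue
    {"text": " and a painting in the style of ", "attributes": {}},
    {"text": "Van Gogh", "attributes": {"font": "italic"}},         # style cue for that span
]

# Stripping the attributes recovers what a vanilla model would see.
plain_prompt = "".join(span["text"] for span in rich_prompt)
```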
Inst-Inpaint can remove objects from images using natural-language instructions, which saves time by removing the need for binary masks. It is trained on a new dataset, GQA-Inpaint, which significantly improves the quality and accuracy of instruction-based inpainting.
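A hypothetical usage sketch; the module, class, and method names below are assumed for illustration rather than taken from the project's released code:

```python
# Hypothetical API: the point is that a plain-language instruction replaces
# the binary mask that classic inpainting pipelines require.
from PIL import Image
from inst_inpaint import InstInpaint  # assumed package/module name

model = InstInpaint.from_pretrained("gqa-inpaint")  # assumed checkpoint id
image = Image.open("street.png")

result = model.remove(image, "Remove the red car on the left")  # no mask needed
result.save("street_no_car.png")
```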
eDiff-I can generate high-resolution images from text prompts using an ensemble of diffusion models, each specialized for a different stage of the generation process. It also lets users control the layout by selecting words and painting them onto a canvas.
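A simplified sketch of the "paint with words" idea: bias the cross-attention scores so that pixels inside a user-painted region attend more strongly to the selected word. Shapes and the bias strength here are illustrative, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def paint_with_words_attention(q, k, v, region_mask, token_idx, strength=1.0):
    # q: (pixels, d) image queries; k, v: (tokens, d) text keys/values
    # region_mask: (pixels,) in {0, 1}, marking the user-painted area
    scores = q @ k.T / q.shape[-1] ** 0.5          # (pixels, tokens)
    bias = torch.zeros_like(scores)
    bias[:, token_idx] = strength * region_mask     # boost the painted word
    return F.softmax(scores + bias, dim=-1) @ v     # (pixels, d)
```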
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models can quickly personalize text-to-image models using just one image and only 5 training steps. This reduces training time from minutes to seconds while maintaining quality through regularized weight offsets.
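A conceptual sketch of the regularized weight-offset idea, with assumed names and shapes: personalization learns a small additive offset per layer rather than retraining the layer, and an L2 penalty keeps that offset from drifting far from the base model:

```python
import torch

# dW for one attention layer, initialized at zero (shape assumed for illustration)
offsets = [torch.zeros(320, 320, requires_grad=True)]

def personalization_loss(diffusion_loss, weight_offsets, reg_lambda=0.01):
    # Penalize the offsets so W + dW stays close to the base model; this is
    # what makes a handful of training steps sufficient without losing quality.
    reg = sum(dw.pow(2).sum() for dw in weight_offsets)
    return diffusion_loss + reg_lambda * reg

# At inference, each tuned layer applies (W + dW) in place of its base weight W.
```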
Reduce, Reuse, Recycle can enable compositional generation using energy-based diffusion models and MCMC samplers. It improves tasks like classifier-guided ImageNet modeling and text-to-image generation by introducing new samplers that enhance performance.
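The core mechanism, as a minimal sketch: treating each diffusion model as an energy-based model means energies, and therefore scores, add, so a product of two models ("A AND B") can be sampled with MCMC over the summed score. `score_a` and `score_b` stand in for the learned score functions; an unadjusted Langevin step is shown, while the paper studies stronger samplers such as HMC:

```python
import torch

def composed_langevin_step(x, score_a, score_b, step_size=0.01):
    # Product of experts: energies add, so scores add.
    score = score_a(x) + score_b(x)
    noise = torch.randn_like(x)
    # One unadjusted Langevin (ULA) update toward the composed distribution.
    return x + step_size * score + (2 * step_size) ** 0.5 * noise
```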
ControlNet can add control to text-to-image diffusion models. It lets users manipulate image generation using methods like edge detection and depth maps, while working well with both small and large datasets.
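ControlNet is available through the Hugging Face diffusers library; a minimal Canny-edge example is sketched below (the model IDs are common public checkpoints as of writing and may have newer successors):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract Canny edges from a reference image to serve as the control signal.
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel image
canny_image = Image.fromarray(edges)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe("a futuristic city at dusk", image=canny_image).images[0]
result.save("controlled.png")
```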
StyleGAN-T can generate high-quality images at 512x512 resolution in just 2 seconds using a single NVIDIA A100 GPU. It solves problems in text-to-image synthesis, like stable training on diverse datasets and strong text alignment.
VectorFusion can generate SVG-exportable vector graphics from text prompts. It uses a text-conditioned diffusion model to create high-quality outputs in various styles, like pixel art and sketches, without needing large datasets of captioned SVGs.
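A conceptual sketch of the optimization loop, with `rasterize`, `add_noise`, and `unet_eps` as placeholders for a differentiable rasterizer (e.g. diffvg), the diffusion forward process, and a pretrained denoiser; this is score distillation sampling (SDS) over vector path parameters, not VectorFusion's literal code:

```python
import torch

def sds_step(paths, prompt_embedding, rasterize, add_noise, unet_eps, optimizer):
    # Render the current SVG paths into pixels, keeping the graph differentiable.
    img = rasterize(paths)                          # (1, 3, H, W)
    t = torch.randint(20, 980, (1,))                # random diffusion timestep
    noise = torch.randn_like(img)
    noisy = add_noise(img, noise, t)                # forward diffusion process
    eps_pred = unet_eps(noisy, t, prompt_embedding)
    grad = (eps_pred - noise).detach()              # SDS gradient, no unet backprop
    optimizer.zero_grad()
    img.backward(gradient=grad)                     # flows into paths via rasterizer
    optimizer.step()                                # nudge strokes toward the prompt
```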