AI Toolbox · Image

Text-to-Image

Free text-to-image AI tools for creating visuals from text prompts, perfect for artists and designers in need of unique imagery.

Image AI Tools

3D Editing 3D Object Generation 3D Scene Generation Brain-to-Image Controllable Image Generation Image Captioning Image Classification Image Colorization Image Depth Estimation Image Editing Image Editing Controllable Image Generation Image Style Transfer Image Generation Image Inpainting Image Inpainting Image Editing Image Object Detection Image Relighting Image Restoration Image Segmentation Image Style Transfer Image-to-3D Image-to-Depth Image-to-Image Image-to-Sketch Image-to-Text Image-to-Video Image Upscaling Personalized Image Generation Text-to-Image Text-to-Image Personalized Image Generation Video Captioning Video Editing Virtual Image Try-On

InstantStyle

InstantStyle can separate style and content from images in text-to-image generation without tuning. It improves visual style by using features from reference images while keeping text control and preventing style leaks.

03.04.24 · Code · Text-to-Image

CosmicMan

CosmicMan can generate high-quality, photo-realistic human images that match text descriptions closely. It uses a unique method called Annotate Anyone and a training framework called Decomposed-Attention-Refocusing (Daring) to improve the connection between text and images.

01.04.24 · Project Page · Code · Text-to-Image

Getting it Right

Following spatial instructions in text-to-image prompts is hard! SPRIGHT-T2I can finally do it though, resulting in more coherent and accurate compositions.

01.04.24 · Project Page · Code · Text-to-Image

AID

PAID is a method that enables smooth high consistency image interpolation for diffusion models. GANs have been the king in that field so far, but this method shows promising results for diffusion models.

26.03.24 · Project Page · Code · Text-to-Image

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Attribute Control enables fine-grained control over attributes of specific subjects in text-to-image models. This lets you modify attributes like age, width, makeup, smile and more for each subject independently.

25.03.24 · Project Page · Code · Text-to-Image

FouriScale

FouriScale can generate high-resolution images from pre-trained diffusion models with various aspect ratios and achieve an astonishing capacity of arbitrary-size, high-resolution, and high-quality generation.

19.03.24 · Code · Text-to-Image · Image Restoration · Image Upscaling

You Only Sample Once

You Only Sample Once can quickly create high-quality images from text in one step. It combines diffusion processes with GANs, allows fine-tuning of pre-trained models, and works well at higher resolutions without extra training.

19.03.24 · Code · Text-to-Image

DEADiff

DEADiff can synthesize images that combine the style of a reference image with text prompts. It uses a Q-Former mechanism to separate style and meaning.

11.03.24 · Project Page · Code · Text-to-Image · Image Style Transfer

ELLA

ELLA is a lightweight approach to equip existing CLIP-based diffusion models with LLMs to improve prompt-understanding and enables long dense text comprehension for text-to-image models.

08.03.24 · Project Page · Code · Text-to-Image · Image Editing

PixArt-Σ

The PixArt model family got a new addition with PixArt-Σ. The model is capable of directly generating images at 4K resolution. Compared to its predecessor, PixArt-α, it offers images of higher fidelity and improved alignment with text prompts.

07.03.24 · Project Page · Code · Text-to-Image

Visual Style Prompting with Swapping Self-Attention

Visual Style Prompting can generate images with a specific style from a reference image. Compared to other methods like IP-Adapter and LoRAs, Visual Style Prompting is better at retainining the style of the referenced image while avoiding style leakage from text prompts.

20.02.24 · Project Page · Code · Text-to-Image · Image Style Transfer

Learning Continuous 3D Words for Text-to-Image Generation

Continuous 3D Words is a control method that can modify attributes in images with a slider based approach. This allows for more control over illumination, non-rigid shape changes (like wings), and camera orientation for instance.

13.02.24 · Project Page · Code · Text-to-Image · Image Editing

FlexGen

FlexGen can generate high-quality, multi-view images from a single-view image or text prompt. It lets users change unseen areas and adjust material properties like metallic and roughness, improving control over the final image.

12.01.24 · Project Page · Code · Text-to-Image · Image-to-Image · Controllable Image Generation

PIA

PIA is a method that can animate images generated by custom Stable Diffusion checkpoints with realistic motions based on a text prompt.

21.12.23 · Project Page · Code · Text-to-Image · Image-to-Video

ControlNet-XS

ControlNet-XS can control text-to-image diffusion models like Stable Diffusion and Stable Diffusion-XL with only 1% of the parameters of the base model. It is about twice as fast as ControlNet and produces higher quality images with better control.

11.12.23 · Project Page · Code · Text-to-Image · Image Editing

Readout Guidance

Readout Guidance can control text-to-image diffusion models using lightweight networks called readout heads. It enables pose, depth, and edge-guided generation with fewer parameters and training samples, allowing for easier manipulation and consistent identity generation.

04.12.23 · Project Page · Code · Text-to-Image · Image Editing

X-Adapter

X-Adapter can enable pretrained plugins like ControlNet and LoRA from Stable Diffusion 1.5 to work with the SDXL model without retraining. It adds trainable mapping layers for feature remapping and uses a null-text training strategy to improve compatibility and functionality.

04.12.23 · Project Page · Code · Text-to-Image

Multi-Concept Customization of Text-to-Image Diffusion

Custom Diffusion can quickly fine-tune text-to-image diffusion models to generate new variations from just a few examples in about 6 minutes on 2 A100 GPUs. It allows for the combination of multiple concepts and requires only 75MB of storage for each additional model, which can be compressed to 5-15MB.

03.12.23 · Project Page · Code · Text-to-Image · Personalized Image Generation

The Chosen One

[The Chosen One] can generate consistent characters in text-to-image diffusion models using just a text prompt. It improves character identity and prompt alignment, making it useful for story visualization, game development, and advertising.

16.11.23 · Project Page · Code · Text-to-Image

Latent Consistency Models

Latent Consistency Models can generate high-resolution images in just 2-4 steps, making text-to-image generation much faster than traditional methods. They require only 32 A100 GPU hours for training on a 768x768 resolution, which is efficient for high-quality results.

06.10.23 · Project Page · Code · Text-to-Image