Image AI Tools
Free image AI tools for generating and editing visuals and creating 3D assets for games, films, and more — everything you need to streamline your creative projects.
ReNoise can be used to reconstruct an input image that can be edited using text prompts.
FouriScale can generate high-resolution images from pre-trained diffusion models at various aspect ratios, achieving arbitrary-size, high-resolution, and high-quality generation with astonishing capability.
You Only Sample Once can quickly create high-quality images from text in one step. It combines diffusion processes with GANs, allows fine-tuning of pre-trained models, and works well at higher resolutions without extra training.
StyleSketch is a method for extracting high-resolution stylized sketches from a face image. Pretty cool!
Desigen can generate high-quality design templates, including background images and layout elements. It uses advanced diffusion models for better control and has been tested on over 40,000 advertisement banners, achieving results similar to human designers.
DEADiff can synthesize images that combine the style of a reference image with text prompts. It uses a Q-Former mechanism to disentangle style from semantic content.
ELLA is a lightweight approach that equips existing CLIP-based diffusion models with LLMs to improve prompt understanding and enable comprehension of long, dense text for text-to-image models.
The PixArt model family got a new addition with PixArt-Σ. The model is capable of directly generating images at 4K resolution. Compared to its predecessor, PixArt-α, it offers images of higher fidelity and improved alignment with text prompts.
ResAdapter can generate images with any resolution and aspect ratio for diffusion models. It works with various personalized models and processes images efficiently, using only 0.5M parameters while keeping the original style.
While LCM and Turbo have unlocked near real-time image diffusion, the quality is still a bit lacking. TCD on the other hand manages to generate images with both clarity and detailed intricacy without compromising on speed.
OHTA can create detailed and usable hand avatars from just one image. It allows for text-to-avatar conversion and editing of hand textures and shapes, using data-driven hand priors to improve accuracy with limited input.
Multi-LoRA Composition focuses on the integration of multiple Low-Rank Adaptations (LoRAs) to create highly customized and detailed images. The approach is able to generate images with multiple elements without fine-tuning and without losing detail or image quality.
Visual Style Prompting can generate images with a specific style from a reference image. Compared to other methods like IP-Adapter and LoRAs, Visual Style Prompting is better at retaining the style of the referenced image while avoiding style leakage from text prompts.
Continuous 3D Words is a control method that can modify attributes in images with a slider-based approach. This allows for finer control over, for instance, illumination, non-rigid shape changes (like wings), and camera orientation.
SEELE can reposition objects within an image. It does so by removing the object, inpainting occluded portions, and harmonizing the appearance of the repositioned object with the surrounding areas.
StableIdentity is a method that can generate diverse customized images in various contexts from a single input image. The cool thing about this method is that it can combine the learned identity with ControlNet and even inject it into video (ModelScope) and 3D (LucidDreamer) generation.
pix2gestalt is able to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
Depth Anything is a new monocular depth estimation method. The model is trained on 1.5M labeled images and 62M+ unlabeled images, which results in impressive generalization ability.
InstantID is an ID embedding-based method that can be used to personalize images in various styles using just a single facial image, while ensuring high fidelity.
PIA is a method that can animate images generated by custom Stable Diffusion checkpoints with realistic motions based on a text prompt.