Image AI Tools
Free image AI tools for generating and editing visuals and creating 3D assets for games, films, and other creative projects.
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model can edit images by filling in missing parts using a reference image and a sketch. This method improves editability and allows for detailed changes in various scenes.
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in an image. This allows object-level editing operations on real images, such as reference image-based appearance editing, free-form shape editing, adding objects, and creating variations.
LDMs (latent diffusion models) are high-resolution image generators that support inpainting, image generation from text or bounding-box layouts, and super-resolution.
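A minimal text-to-image sketch with a public LDM checkpoint via Hugging Face diffusers; the model id is the released CompVis LDM, while the prompt, step count, and output path are illustrative assumptions:

```python
# Text-to-image with a latent diffusion model via Hugging Face diffusers.
# Assumes `pip install diffusers transformers torch`; prompt and step count
# are illustrative.
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
pipe = pipe.to("cuda")  # remove this line to run on CPU (much slower)

image = pipe("a painting of a squirrel eating a burger",
             num_inference_steps=50).images[0]
image.save("ldm_squirrel.png")
```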
eDiff-I can generate high-resolution images from text prompts using different diffusion models for each stage. It also allows users to control image creation by selecting and moving words on a canvas.
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models can quickly personalize text-to-image models using just one image and only 5 training steps. This method reduces training time from minutes to seconds while maintaining quality through regularized weight-offsets.
Reduce, Reuse, Recycle enables compositional generation with energy-based diffusion models and MCMC samplers. It improves tasks like classifier-guided ImageNet modeling and text-to-image generation by introducing new samplers that enhance performance.
Entity-Level Text-Guided Image Manipulation can edit specific parts of an image based on text descriptions while keeping other areas unchanged. It uses a two-step process, first aligning text semantics with image regions and then applying the edits, which allows for flexible and precise editing.
MultiDiffusion can generate high-quality images using a pre-trained text-to-image diffusion model. It allows users to control aspects like image size and includes features for guiding images with segmentation masks and bounding boxes.
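diffusers ships a MultiDiffusion-based panorama pipeline that fuses overlapping diffusion windows, which is what makes non-square sizes work; a hedged sketch, where the model id, size, and prompt are assumptions:

```python
# MultiDiffusion panorama generation via diffusers' StableDiffusionPanoramaPipeline.
# The width/prompt values below are illustrative.
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# MultiDiffusion blends per-window denoising results, so a 4:1 canvas is fine.
image = pipe("a photo of the dolomites", height=512, width=2048).images[0]
image.save("panorama.png")
```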
ControlNet can add control to text-to-image diffusion models. It lets users manipulate image generation using methods like edge detection and depth maps, while working well with both small and large datasets.
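A sketch of edge-conditioned generation with the public Canny ControlNet checkpoint via diffusers; the input file, prompt, and Canny thresholds are illustrative assumptions:

```python
# Canny-edge ControlNet conditioning via diffusers.
# Assumes `pip install diffusers transformers opencv-python torch`.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Turn an input photo into an edge map that will steer the generation.
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # 1-channel -> 3-channel
control = Image.fromarray(edges)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

out = pipe("a futuristic city at night", image=control).images[0]
out.save("controlled.png")
```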
Neural Congealing can align similar content across multiple images using a self-supervised method. It uses pre-trained DINO-ViT features to create a shared semantic map, allowing for effective alignment even with different appearances and backgrounds.
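A sketch of extracting the pre-trained DINO-ViT features that Neural Congealing builds its shared semantic map from (backbone only, not the congealing method itself); the torch.hub ids come from the public facebookresearch/dino repo, and the image path and size are illustrative:

```python
# Extract per-patch DINO-ViT features for an image.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # Token features from the last block (CLS token at index 0); alignment
    # methods like Neural Congealing operate on features of this kind.
    feats = model.get_intermediate_layers(img, n=1)[0]
print(feats.shape)  # (1, 197, 384) for ViT-S/16 at 224x224
```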
Pix2Pix-Zero can edit images by changing them in real-time, like turning a cat into a dog, without needing extra text prompts or training. It keeps the original image’s structure and uses pre-trained text-to-image diffusion models for better editing results.
StyleGAN-T can generate high-quality images at 512x512 resolution in just 2 seconds using a single NVIDIA A100 GPU. It solves problems in text-to-image synthesis, like stable training on diverse datasets and strong text alignment.
CLIPascene can convert scene images into sketches with different levels of detail and simplicity. Users can create a range of sketches, from detailed to simple, allowing for personalized artistic expression.
VectorFusion can generate SVG-exportable vector graphics from text prompts. It uses a text-conditioned diffusion model to create high-quality outputs in various styles, like pixel art and sketches, without needing large datasets of captioned SVGs.
InstructPix2Pix can edit images based on written instructions. It allows users to add or remove objects, change colors, and transform styles quickly, using a conditional diffusion model trained on a large dataset.
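A sketch of instruction-based editing with the public timbrooks/instruct-pix2pix checkpoint via diffusers; the input image, instruction, and guidance values are illustrative assumptions:

```python
# Instruction-driven image editing via diffusers' InstructPix2Pix pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("portrait.png").convert("RGB")
edited = pipe(
    "make it look like a watercolor painting",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
).images[0]
edited.save("edited.png")
```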
MinD-Vis can create realistic images from brain recordings using a method that combines Sparse Masked Brain Modeling and a Double-Conditioned Latent Diffusion Model. It achieves top performance in understanding thoughts and generating images, surpassing previous results by 66% in semantic mapping and 41% in image quality, while needing very few paired examples.
UnZipLoRA can decompose a single image into its subject and its style, each captured separately. This makes it possible to create variations of the subject and apply the style to new subjects.
SDEdit can generate and edit photo-realistic images using user-guided inputs like hand-drawn strokes or text prompts. It outperforms GAN-based methods, achieving high scores in realism and overall satisfaction without needing specific training.
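The SDEdit idea, noising a guide image partway and then denoising it toward a prompt, is what diffusers' img2img pipeline implements, with `strength` controlling how much noise is added. A sketch where the model id, input file, prompt, and strength are illustrative assumptions:

```python
# SDEdit-style stroke-to-image via diffusers' img2img pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

sketch = Image.open("stroke_painting.png").convert("RGB").resize((512, 512))
result = pipe(
    "a photorealistic mountain landscape at sunset",
    image=sketch,
    strength=0.75,  # 0 = keep the guide image, 1 = ignore it entirely
).images[0]
result.save("sdedit_result.png")
```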
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries can retrieve high-quality sound effects from a single video frame without needing text metadata. It uses a combination of large language models and contrastive learning to match sound effects to video better than existing methods.