Image AI Tools
Free image AI tools for generating and editing visuals and creating 3D assets for games, films, and more, to streamline your creative projects.
DWPose is a pose estimator that uses a two-stage distillation approach to improve pose estimation accuracy.
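The two-stage idea boils down to training a compact student to mimic a large teacher's keypoint heatmaps alongside the ground truth. Here is a minimal PyTorch sketch of such a heatmap distillation loss; the loss weighting and tensor shapes are illustrative assumptions, not DWPose's actual architecture or settings:

```python
import torch
import torch.nn.functional as F

def heatmap_distillation_loss(student_heatmaps: torch.Tensor,
                              teacher_heatmaps: torch.Tensor,
                              gt_heatmaps: torch.Tensor,
                              alpha: float = 0.5) -> torch.Tensor:
    """Blend ground-truth supervision with soft targets from a teacher.

    All tensors are (batch, num_keypoints, H, W). `alpha` balances the
    two terms; the value here is illustrative, not DWPose's setting.
    """
    loss_gt = F.mse_loss(student_heatmaps, gt_heatmaps)
    loss_distill = F.mse_loss(student_heatmaps, teacher_heatmaps.detach())
    return alpha * loss_gt + (1.0 - alpha) * loss_distill

# Toy usage with random tensors standing in for model outputs.
student = torch.rand(2, 17, 64, 48, requires_grad=True)
teacher = torch.rand(2, 17, 64, 48)
gt = torch.rand(2, 17, 64, 48)
loss = heatmap_distillation_loss(student, teacher, gt)
loss.backward()
```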
Interpolating between Images with Diffusion Models can generate smooth transitions between two images using latent diffusion models. It produces high-quality results across different styles and subjects, using CLIP to select the best images for interpolation.
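The key trick is to interpolate in the diffusion model's latent space rather than in pixel space, typically with spherical interpolation (slerp) so intermediate latents stay on a plausible manifold. A minimal sketch of slerp over latents; the latent shape is illustrative, and the CLIP-based ranking of decoded candidates described above is applied afterwards:

```python
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, t: float) -> torch.Tensor:
    """Spherical interpolation between two latent tensors."""
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    omega = torch.acos(torch.clamp(
        torch.dot(z0_flat / z0_flat.norm(), z1_flat / z1_flat.norm()),
        -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:  # nearly parallel latents: fall back to lerp
        return (1.0 - t) * z0 + t * z1
    return (torch.sin((1.0 - t) * omega) / so) * z0 + \
           (torch.sin(t * omega) / so) * z1

# A trajectory of latents between two encoded images (shape is a stand-in
# for a Stable Diffusion 512x512 latent).
z_a, z_b = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
frames = [slerp(z_a, z_b, t) for t in torch.linspace(0, 1, 9)]
```

Each latent is then decoded to an image, and CLIP similarity is used to keep the best candidates, per the method's selection step.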
FABRIC can condition diffusion models on feedback images to improve image quality. This method allows users to personalize content through multiple feedback rounds without needing training.
AnimateDiff is a new framework that brings video generation to the Stable Diffusion pipeline. This means you can generate videos with any existing Stable Diffusion model without having to fine-tune or train anything. Pretty amazing. @DigThatData put together a Google Colab notebook in case you want to give it a try.
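If you would rather skip Colab, the AnimateDiff integration that later landed in Hugging Face diffusers exposes the same idea: a pretrained motion adapter plugged into an ordinary Stable Diffusion checkpoint. A minimal sketch, assuming a recent diffusers release; the model IDs and prompt are illustrative:

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

# Motion adapter from the AnimateDiff authors; any SD 1.5-based
# checkpoint can serve as the base model.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear")

frames = pipe("a corgi running on the beach, best quality",
              num_frames=16, num_inference_steps=25).frames[0]
export_to_gif(frames, "corgi.gif")
```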
Text2Cinemagraph can create cinemagraphs from text descriptions, animating elements like flowing rivers and drifting clouds. It combines artistic images with realistic ones to accurately show motion, outperforming other methods in generating cinemagraphs for natural and artistic scenes.
CSD-Edit is a multi-modality editing approach that, unlike other methods, works well on images beyond the traditional 512x512 limitation and can edit 4K or large panorama images. It also offers improved temporal consistency across video frames and improved view consistency when editing or generating 3D scenes.
DreamDiffusion can generate high-quality images from brain EEG signals without needing to translate thoughts into text. It uses pre-trained text-to-image models and special techniques to handle noise and individual differences, making it a key step towards affordable thoughts-to-image technology.
DiffSketcher is a tool that can turn words into vectorized free-hand sketches. The method also lets you define the level of abstraction, allowing for more abstract or more concrete generations.
Diffusion with Forward Models is able to reconstruct 3D scenes from a single input image. Additionally, it can add small, short motions to images with people in them.
Cocktail is a pipeline for guided image generation. Compared to ControlNet, it requires only one generalized model for multiple modalities such as edge, pose, and mask guidance.
There is a new text-to-image player called RAPHAEL in town. The model aims to generate highly artistic images that accurately portray the text prompts, encompassing multiple nouns, adjectives, and verbs. This is all great, but only if someone actually releases the model for open-source consumption, as the community is craving a model that can achieve Midjourney quality.
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers can enhance low-resolution license plate images. It uses attention and transformer modules to improve details and a special loss function based on Optical Character Recognition to achieve better image quality.
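The sub-pixel convolution part is a standard upsampling trick: a convolution produces r² times the channels, which PixelShuffle then rearranges into an image r times larger in each dimension. A minimal PyTorch sketch of such an upsampling block; the channel counts and activation are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Upscale feature maps by `scale` using sub-pixel convolution."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * scale ** 2,
                              kernel_size=3, padding=1)
        # (B, C*r^2, H, W) -> (B, C, r*H, r*W)
        self.shuffle = nn.PixelShuffle(scale)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.shuffle(self.conv(x)))

# A 32x16 low-resolution plate feature map upscaled to 64x32.
x = torch.randn(1, 64, 16, 32)
print(SubPixelUpsample(64, scale=2)(x).shape)  # torch.Size([1, 64, 32, 64])
```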
Break-A-Scene can extract multiple concepts from a single image using segmentation masks. It allows users to re-synthesize individual concepts or combinations in different contexts, enhancing scene generation with a two-phase customization process.
DragGAN can manipulate images by letting users drag points to change the pose, shape, and layout of objects. It produces realistic results even when parts of the image are hidden or deformed.
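Under the hood, DragGAN alternates two steps: motion supervision, which nudges the generator's features so the patch around each handle point moves one small step toward its target, and point tracking, which re-locates the handle in the updated features. A heavily simplified sketch of a motion-supervision loss on a generic feature map; the feature extraction, neighborhood radius, and sampling scheme are stand-ins, not DragGAN's actual implementation:

```python
import torch
import torch.nn.functional as F

def motion_supervision_loss(feat: torch.Tensor,
                            handle: torch.Tensor,
                            target: torch.Tensor,
                            radius: int = 3) -> torch.Tensor:
    """Encourage features near `handle` to move one step toward `target`.

    feat:   (1, C, H, W) generator feature map, differentiable w.r.t. the latent
    handle: (2,) current handle point (x, y) in pixel coordinates
    target: (2,) user-specified target point (x, y)
    """
    direction = target - handle
    direction = direction / (direction.norm() + 1e-8)  # unit step toward target
    _, _, h, w = feat.shape

    def grid_at(points: torch.Tensor) -> torch.Tensor:
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        norm = torch.stack([points[..., 0] / (w - 1),
                            points[..., 1] / (h - 1)], dim=-1)
        return (norm * 2 - 1).view(1, 1, -1, 2)

    # Sample a small neighborhood around the handle and its shifted copy.
    offsets = torch.stack(torch.meshgrid(
        torch.arange(-radius, radius + 1.0),
        torch.arange(-radius, radius + 1.0), indexing="ij"), dim=-1).view(-1, 2)
    pts = handle + offsets
    f_now = F.grid_sample(feat, grid_at(pts), align_corners=True)
    f_next = F.grid_sample(feat, grid_at(pts + direction), align_corners=True)
    # Pulling feat(q + d) toward the detached feat(q) drags image content
    # at the handle one step along d when optimizing the latent.
    return F.l1_loss(f_next, f_now.detach())
```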
FastComposer can generate personalized images of multiple unseen individuals in various styles and actions without fine-tuning. It is 300x-2500x faster than traditional methods and requires no extra storage for new subjects, using subject embeddings and localized attention to keep identities clear.
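The subject-embedding idea can be pictured as augmenting the text token that names a subject (e.g. "man") with an image-encoder embedding of that person's reference photo, so no per-subject fine-tuning is needed. A hypothetical sketch; the projection MLP, dimensions, and additive fusion rule are placeholders for illustration, not FastComposer's actual modules:

```python
import torch
import torch.nn as nn

class SubjectFusion(nn.Module):
    """Fuse a reference-image embedding into one text-token embedding."""
    def __init__(self, image_dim: int = 1024, text_dim: int = 768):
        super().__init__()
        # Hypothetical MLP projecting image features into text-embedding space.
        self.proj = nn.Sequential(nn.Linear(image_dim, text_dim),
                                  nn.GELU(), nn.Linear(text_dim, text_dim))

    def forward(self, text_embs: torch.Tensor, image_emb: torch.Tensor,
                subject_index: int) -> torch.Tensor:
        """text_embs: (seq_len, text_dim); image_emb: (image_dim,)."""
        fused = text_embs.clone()
        fused[subject_index] = text_embs[subject_index] + self.proj(image_emb)
        return fused

# One prompt of 77 tokens; token 4 names the subject being personalized.
text_embs = torch.randn(77, 768)
image_emb = torch.randn(1024)
out = SubjectFusion()(text_embs, image_emb, subject_index=4)
```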
Ray Conditioning is a lightweight and geometry-free technique for multi-view image generation. You have that perfect portrait shot of a face, but the angle is not right? No problem: just use that shot as an input image and generate the portrait from another angle. Done.
Improved Diffusion-based Image Colorization via Piggybacked Models can colorize grayscale images using knowledge from pre-trained Text-to-Image diffusion models. It allows for conditional colorization with user hints and text prompts, achieving high-quality results.
DiFaReli can relight single-view face images by managing lighting effects like shadows and global illumination. It uses a conditional diffusion model to separate lighting information, achieving photorealistic results without needing 3D data.
Expressive Text-to-Image Generation with Rich Text can create detailed images from text by using rich text formatting like font style, size, and color. This method allows for better control over styles and colors, making it easier to generate complex scenes compared to regular text.
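One way to picture the approach: the rich-text document is flattened into a plain prompt for the overall layout, while each formatted span becomes a region-specific instruction (font color mapping to a target color, font size to concept weight, font style to an art style). A toy sketch of that parsing step; the span representation and attribute names are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    color: str | None = None   # e.g. "#dc143c" -> color guidance for this region
    size: float = 1.0          # larger font -> stronger weight for this concept
    style: str | None = None   # e.g. "Ukiyo-e" applied only to this span

def to_prompts(spans: list[Span]) -> tuple[str, list[dict]]:
    """Flatten rich text into a plain prompt plus per-region instructions."""
    plain = " ".join(s.text for s in spans)
    regions = [{"prompt": s.text, "color": s.color,
                "weight": s.size, "style": s.style}
               for s in spans if s.color or s.style or s.size != 1.0]
    return plain, regions

plain, regions = to_prompts([
    Span("a church by a"),
    Span("lake", style="Ukiyo-e"),
    Span("under a"),
    Span("crimson sky", color="#dc143c", size=1.4),
])
print(plain)    # "a church by a lake under a crimson sky"
print(regions)  # the two formatted spans with their style/color/weight hints
```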
Inst-Inpaint can remove objects from images using natural language instructions, which saves time by not needing binary masks. It uses a new dataset called GQA-Inpaint, improving the quality and accuracy of image inpainting significantly.