Image Segmentation
Free image segmentation AI tools for accurately identifying and isolating objects in images, streamlining your creative projects and visual content creation.
ZIM can generate precise matte masks from segmentation labels, enabling zero-shot image matting.
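For context, a segmentation label usually has to be widened into a trimap before a matting network can predict soft alpha around hair and fine edges, which is exactly the hand-tuned step that ZIM-style zero-shot matting aims to skip. Here is a minimal sketch of that classic mask-to-trimap preparation, assuming OpenCV and a 0/255 binary mask; the function name and band width are illustrative, and this is not ZIM's actual API:

```python
# Generic mask-to-trimap preparation (illustrative, not ZIM's API).
import cv2
import numpy as np

def mask_to_trimap(mask: np.ndarray, band: int = 10) -> np.ndarray:
    """mask: uint8 image with 0 = background, 255 = foreground."""
    kernel = np.ones((band, band), np.uint8)
    sure_fg = cv2.erode(mask, kernel)             # confident foreground core
    unknown = cv2.dilate(mask, kernel) - sure_fg  # uncertain band along the boundary
    trimap = np.zeros_like(mask)
    trimap[unknown > 0] = 128                     # unknown region to be matted
    trimap[sure_fg > 0] = 255                     # known foreground
    return trimap

# Example: trimap = mask_to_trimap(cv2.imread("person_mask.png", cv2.IMREAD_GRAYSCALE))
# A trimap-based matting model would then fill in per-pixel alpha inside the 128 band.
```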
ControlAR adds spatial controls like edges, depth maps, and segmentation masks to autoregressive image models like LlamaGen.
SEMat can improve interactive image matting! It rethinks both the network design and the training objectives to achieve better transparency, detail, and accuracy than methods like MAM and SmartMat.
Text2Place can place any human or object realistically into diverse backgrounds. This enables scene hallucination (generating a compatible scene for a given human pose), text-based editing of the subject, and placing multiple people into one scene.
Sprite-Decompose can break down animated graphics into sprites using videos and bounding box annotations.
Adobe’s Magic Fixup lets you edit images with a cut-and-paste approach and then fixes up the edit automatically. I can see this being super useful for generating animation frames for tools like AnimateDiff, but it’s not clear yet if or when this will land in Photoshop.
PartGLEE can locate and identify objects and their parts in images. The method uses a unified framework that enables detection, segmentation, and grounding at any granularity.
MIGC++ is a plug-and-play controller that gives Stable Diffusion precise position control while keeping attributes like color, shape, material, texture, and style correct. It can also control the number of instances and improve the interaction between instances.
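To make the "position control plus attributes" idea concrete, here is a hypothetical sketch of what an instance-level layout could look like; the layout format, the migc_pipeline name, and its arguments are all made up for illustration and are not the actual MIGC++ interface:

```python
# Hypothetical instance layout for position- and attribute-controlled generation.
# Everything below is illustrative; consult the MIGC++ repo for the real interface.
layout = [
    {"prompt": "a red ceramic mug",    "box": (0.10, 0.55, 0.35, 0.90)},  # (x0, y0, x1, y1), normalized
    {"prompt": "a green glass bottle", "box": (0.55, 0.30, 0.80, 0.90)},
]

# image = migc_pipeline(                      # hypothetical call
#     prompt="a product photo on a wooden table",
#     instances=layout,                       # per-instance position + attribute text
#     num_inference_steps=30,
# ).images[0]
```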
MaGGIe can efficiently predict high-quality human instance mattes from coarse binary masks for both image and video input. The method outputs all instance mattes simultaneously while keeping memory and latency in check, making it suitable for real-time applications.
IntrinsicAnything can recover object materials from arbitrary images and enables single-view image relighting.
pix2gestalt is able to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
Break-A-Scene can extract multiple concepts from a single image using segmentation masks. It allows users to re-synthesize individual concepts or combinations in different contexts, enhancing scene generation with a two-phase customization process.
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in an image. This allows various object-level editing operations on real images, such as reference-based appearance editing, free-form shape editing, adding objects, and object variations.
MultiDiffusion can generate high-quality images with a pre-trained text-to-image diffusion model without any extra training. It lets users control aspects like image size and aspect ratio and guide generation with segmentation masks and bounding boxes.
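The size/aspect-ratio use case is the easiest one to try, because Hugging Face diffusers ships a MultiDiffusion-based panorama pipeline. A minimal sketch, assuming diffusers, a CUDA GPU, and the Stable Diffusion 2 base checkpoint; the mask/box guidance features are in the authors' own code rather than this particular pipeline:

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

# MultiDiffusion panorama generation with a stock Stable Diffusion 2 checkpoint.
model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_ckpt, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# The width far exceeds what the base model was trained on; MultiDiffusion fuses
# overlapping diffusion windows so the result stays coherent.
image = pipe("a photo of the dolomites", height=512, width=2048).images[0]
image.save("panorama.png")
```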
Neural Congealing can align similar content across multiple images using a self-supervised method. It uses pre-trained DINO-ViT features to create a shared semantic map, allowing for effective alignment even with different appearances and backgrounds.
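If you want to poke at the ingredient doing most of the work, the pre-trained DINO-ViT features are easy to extract on their own. A rough sketch, assuming torch/torchvision and the official DINO torch.hub entry point; which layers Neural Congealing actually uses and how it aggregates them into the shared map is simplified away here:

```python
import torch
from PIL import Image
from torchvision import transforms

# Load a pre-trained DINO ViT-S/16 from the official hub entry point.
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
dino.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = preprocess(Image.open("input.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
with torch.no_grad():
    tokens = dino.get_intermediate_layers(img, n=1)[0]  # (1, 1 + 14*14, 384): CLS + patch tokens

patch_features = tokens[:, 1:, :]  # per-patch descriptors, the raw semantic signal
print(patch_features.shape)
```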
The method from "Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries" can retrieve high-quality sound effects from a single video frame without needing text metadata. It combines large language models and contrastive learning to match sound effects to video better than existing methods.