MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
- URL: http://arxiv.org/abs/2505.09265v1
- Date: Wed, 14 May 2025 10:25:26 GMT
- Title: MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning
- Authors: Bin-Bin Gao
- Abstract summary: We present a novel paradigm that unifies anomaly segmentation into change segmentation. We propose a one-prompt meta-learning framework for Universal Anomaly Segmentation (MetaUAS). Our method effectively and efficiently segments any anomalies with only one normal image prompt.
- Score: 4.887838886202545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero- and few-shot visual anomaly segmentation relies on powerful vision-language models that detect unseen anomalies using manually designed textual prompts. However, visual representations are inherently independent of language. In this paper, we explore the potential of a pure visual foundation model as an alternative to widely used vision-language models for universal visual anomaly segmentation. We present a novel paradigm that unifies anomaly segmentation into change segmentation. This paradigm enables us to leverage large-scale synthetic image pairs, featuring object-level and local region changes, derived from existing image datasets, which are independent of target anomaly datasets. We propose a one-prompt Meta-learning framework for Universal Anomaly Segmentation (MetaUAS) that is trained on this synthetic dataset and then generalizes well to segment any novel or unseen visual anomalies in the real world. To handle geometrical variations between prompt and query images, we propose a soft feature alignment module that bridges paired-image change perception and single-image semantic segmentation. This is the first work to achieve universal anomaly segmentation using a pure vision model without relying on special anomaly detection datasets and pre-trained visual-language models. Our method effectively and efficiently segments any anomalies with only one normal image prompt, and is training-free, requiring no guidance from language. Our MetaUAS significantly outperforms previous zero-shot, few-shot, and even full-shot anomaly segmentation methods. The code and pre-trained models are available at https://github.com/gaobb/MetaUAS.
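For intuition, the paradigm can be pictured as a change-segmentation forward pass over a (prompt, query) pair of images. The following is a minimal, hypothetical PyTorch sketch of that idea, not the released MetaUAS code: the encoder and decoder are assumed to be supplied, and the soft feature alignment is shown as a simple similarity-weighted warping of prompt features onto the query's spatial layout.

```python
# Hypothetical sketch of one-prompt anomaly segmentation as change segmentation.
# Module names and shapes are illustrative; see https://github.com/gaobb/MetaUAS
# for the actual implementation.
import torch
import torch.nn as nn


class SoftFeatureAlignment(nn.Module):
    """Softly aligns prompt features to the query's spatial layout
    via feature-similarity weighting (an assumed, simplified form)."""
    def forward(self, query_feat, prompt_feat):
        b, c, h, w = query_feat.shape
        q = query_feat.flatten(2).transpose(1, 2)          # (B, HW, C)
        p = prompt_feat.flatten(2).transpose(1, 2)         # (B, HW, C)
        attn = torch.softmax(q @ p.transpose(1, 2) / c ** 0.5, dim=-1)
        aligned = (attn @ p).transpose(1, 2).reshape(b, c, h, w)
        return aligned


class OnePromptChangeSegmenter(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder        # shared pure-vision backbone
        self.align = SoftFeatureAlignment()
        self.decoder = decoder        # change-segmentation head (takes 2C channels)

    def forward(self, query_img, prompt_img):
        fq = self.encoder(query_img)                 # query features
        fp = self.encoder(prompt_img)                # normal-prompt features
        fp_aligned = self.align(fq, fp)              # handle geometric variation
        change = torch.cat([fq, fp_aligned], dim=1)  # paired-image change cue
        return torch.sigmoid(self.decoder(change))   # per-pixel anomaly map
```

At test time, a single normal image from the target category serves as the prompt; no anomaly data and no textual prompts are involved.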
Related papers
- Towards Universal Text-driven CT Image Segmentation [4.76971404389011]
We propose OpenVocabCT, a vision-language model pretrained on large-scale 3D CT images for universal text-driven segmentation. We decompose the diagnostic reports into fine-grained, organ-level descriptions using large language models for multi-granular contrastive learning.
arXiv Detail & Related papers (2025-03-08T03:02:57Z) - UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data.
We exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images.
We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
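The unlearnable-example idea described above boils down to adding a small, budgeted perturbation to each image so that models trained on the perturbed data learn little that generalizes. A minimal, hypothetical sketch of applying such an L-infinity-bounded perturbation (the universal noise generator that UnSeg actually trains is not reproduced here) might look like:

```python
# Hypothetical sketch: applying L-infinity-bounded "unlearnable" noise to images.
# The noise generator is assumed to exist; UnSeg trains a universal generator
# (see the paper), which this sketch does not reproduce.
import torch


def make_unlearnable(images: torch.Tensor, noise_generator, eps: float = 8 / 255):
    """images: (B, C, H, W) in [0, 1]. Returns perturbed copies."""
    delta = noise_generator(images)               # raw perturbation
    delta = torch.clamp(delta, -eps, eps)         # enforce the perturbation budget
    return torch.clamp(images + delta, 0.0, 1.0)  # keep a valid pixel range
```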
arXiv Detail & Related papers (2024-10-13T16:34:46Z) - Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
In-context segmentation aims to segment objects using given reference images. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. This work approaches the problem from a fresh perspective - unlocking the capability of the latent diffusion model for in-context segmentation.
arXiv Detail & Related papers (2024-03-14T17:52:31Z) - MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation [17.59676962334776]
Noisy labels, inevitably present in pseudo segmentation labels generated from weak object-level annotations, severely hamper model optimization for semantic segmentation.
Inspired by recent advances in meta learning, we argue that rather than struggling to tolerate noise hidden behind clean labels passively, a more feasible solution would be to find out the noisy regions actively.
We present a novel meta learning based semantic segmentation method, MetaSeg, that comprises a primary content-aware meta-net (CAM-Net) to serve as a noise indicator for an arbitrary segmentation model counterpart.
arXiv Detail & Related papers (2024-01-22T07:31:52Z) - Aligning and Prompting Everything All at Once for Universal Visual Perception [79.96124061108728]
APE is a universal visual perception model for aligning and prompting everything all at once in an image to perform diverse tasks.
APE advances the convergence of detection and grounding by reformulating language-guided grounding as open-vocabulary detection.
Experiments on over 160 datasets demonstrate that APE outperforms state-of-the-art models.
arXiv Detail & Related papers (2023-12-04T18:59:50Z) - Grounding Everything: Emerging Localization Properties in Vision-Language Transformers [51.260510447308306]
We show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning.
We propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path.
We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation.
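For context, the self-self attention idea behind GEM replaces the usual query-key attention with attention of each projection against itself (q-q, k-k, v-v), applied to the values. The sketch below is a simplified, hypothetical single-head illustration; the actual GEM module adds normalization, iteration, and ensembling details not shown here.

```python
# Hypothetical single-head sketch of self-self attention in the spirit of GEM.
import torch
import torch.nn.functional as F


def self_self_attention(x, w_q, w_k, w_v):
    """x: (B, N, C) token features; w_*: (C, C) projection weights."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = x.shape[-1] ** 0.5
    out = 0
    for proj in (q, k, v):  # q-q, k-k, v-v attention paths
        attn = F.softmax(proj @ proj.transpose(-2, -1) / d, dim=-1)
        out = out + attn @ v
    return out / 3          # average the three self-self paths
```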
arXiv Detail & Related papers (2023-12-01T19:06:12Z) - Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models. ZeroSeg works by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z) - Segment Any Anomaly without Training via Hybrid Prompt Regularization [15.38935129648466]
We present Segment Any Anomaly+ (SAA+), a novel framework for zero-shot anomaly segmentation with hybrid prompt regularization.
Our proposed SAA+ model achieves state-of-the-art performance on several anomaly segmentation benchmarks, including VisA, MVTec-AD, MTD, and KSDD2.
arXiv Detail & Related papers (2023-05-18T05:52:06Z) - Improving Data-Efficient Fossil Segmentation via Model Editing [4.683612295430956]
We present a two-part paradigm to improve fossil segmentation with few labeled images.
We apply domain-informed image perturbations to expose the Mask R-CNN's inability to distinguish between different classes of fossils.
We extend an existing model-editing method for correcting systematic mistakes in image classification to image segmentation with no additional labeled data needed.
arXiv Detail & Related papers (2022-10-08T02:12:38Z) - Instance Segmentation of Unlabeled Modalities via Cyclic Segmentation GAN [27.936725483892076]
We propose a novel Cyclic Generative Adversarial Network (CySGAN) that conducts image translation and instance segmentation jointly.
We benchmark our approach on the task of 3D neuronal nuclei segmentation with annotated electron microscopy (EM) images and unlabeled expansion microscopy (ExM) data.
arXiv Detail & Related papers (2022-04-06T20:46:39Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)