Zero-Shot Anomaly Detection with Pre-trained Segmentation Models
- URL: http://arxiv.org/abs/2306.09269v1
- Date: Thu, 15 Jun 2023 16:43:07 GMT
- Title: Zero-Shot Anomaly Detection with Pre-trained Segmentation Models
- Authors: Matthew Baugh, James Batten, Johanna P. Müller, Bernhard Kainz
- Abstract summary: This report outlines our submission to the zero-shot track of the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge.
Building on the performance of the WINCLIP framework, we aim to enhance the system's localization capabilities by integrating zero-shot segmentation models.
Our pipeline requires no external data or information, allowing it to be directly applied to new datasets.
- Score: 2.9322869014189985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report outlines our submission to the zero-shot track of the
Visual Anomaly and Novelty Detection (VAND) 2023 Challenge. Building on the
performance of the WINCLIP framework, we aim to enhance the system's
localization capabilities by integrating zero-shot segmentation models. In
addition, we perform foreground instance segmentation, which enables the model
to focus on the relevant parts of the image and thus better identify small or
subtle deviations. Our pipeline requires no external data or information,
allowing it to be directly applied to new datasets. Our team (Variance
Vigilance Vanguard) ranked third in the zero-shot track of the VAND challenge,
achieving an average F1-max score of 81.5/24.2 at the sample/pixel level on the
VisA dataset.
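The report does not include code, but the combination it describes, a WINCLIP-style anomaly map restricted to a zero-shot foreground mask, comes down to a small masking step. The sketch below is a minimal illustration under stated assumptions: the function names, the zero background score, and the max-pooled sample score are ours, with the anomaly map assumed to come from WINCLIP and the mask from a zero-shot segmenter such as SAM.
```python
# Minimal sketch, assuming the per-pixel anomaly map comes from a WINCLIP-style
# scorer and the foreground mask from a zero-shot segmenter (e.g. SAM); the
# function names and the max-pooled sample score are illustrative assumptions.
import numpy as np

def refine_anomaly_map(anomaly_map: np.ndarray,
                       foreground_mask: np.ndarray,
                       background_score: float = 0.0) -> np.ndarray:
    """Suppress anomaly scores outside the segmented foreground."""
    return np.where(foreground_mask, anomaly_map, background_score)

def sample_score(refined_map: np.ndarray) -> float:
    """Image-level anomaly score taken as the maximum refined pixel score."""
    return float(refined_map.max())

# Toy example: a 4x4 anomaly map with a spurious high response in the background.
amap = np.array([[0.1, 0.1, 0.9, 0.1],
                 [0.1, 0.8, 0.2, 0.1],
                 [0.1, 0.2, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.1]])
fg = np.zeros((4, 4), dtype=bool)
fg[1:3, 1:3] = True                      # the object occupies the centre
refined = refine_anomaly_map(amap, fg)
print(sample_score(refined))             # 0.8; the 0.9 background response is gone
```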
Related papers
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt at semi-supervised visual grounding by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data, implying that the key to "zero-shot" generalization under large-scale training paradigms remains to be found (a toy numerical illustration follows this entry).
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
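The claim admits a compact numerical illustration: if zero-shot accuracy grows roughly linearly in the logarithm of a concept's pretraining frequency, then every fixed gain in accuracy costs a multiplicative increase in data. The constants below are invented for the sketch and are not taken from the paper.
```python
import math

# Toy log-linear scaling curve: acc = a * log10(freq) + b.
# The constants a and b are invented for illustration only.
a, b = 0.12, 0.05
for freq in [1e3, 1e4, 1e5, 1e6]:
    acc = a * math.log10(freq) + b
    print(f"{freq:10.0e} concept examples -> accuracy ~ {acc:.2f}")
# Each additional +0.12 accuracy requires 10x more pretraining examples,
# i.e. exponentially more data for a linear improvement.
```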
- Small, Versatile and Mighty: A Range-View Perception Framework [13.85089181673372]
We propose a novel multi-task framework for 3D detection from LiDAR data.
Our framework integrates semantic segmentation and panoptic segmentation tasks for the LiDAR point cloud.
Among range-view-based methods, our model achieves new state-of-the-art detection performance on the Open dataset.
arXiv Detail & Related papers (2024-03-01T07:02:42Z)
- ZeroG: Investigating Cross-dataset Zero-shot Transferability in Graphs [36.749959232724514]
ZeroG is a new framework tailored to enable cross-dataset generalization.
We address the inherent challenges such as feature misalignment, mismatched label spaces, and negative transfer.
We propose a prompt-based subgraph sampling module that enriches the semantic and structural information of extracted subgraphs.
arXiv Detail & Related papers (2024-02-17T09:52:43Z)
- Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models [4.157013247909771]
We aim to develop a cost-effective labeling approach that obtains pseudo-labels for semantic segmentation and object instance detection in indoor environments.
To this end, we leverage recent state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer).
We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset (a toy sketch of such label fusion follows below).
arXiv Detail & Related papers (2023-11-17T21:58:26Z)
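The fusion this entry gestures at can be pictured as a small box-matching routine in which class-agnostic masks (as SAM would produce) inherit the label of the best-overlapping detection (as Detic would produce). This is a hedged sketch of one plausible fusion rule, not the paper's actual method; the IoU threshold and helper names are our assumptions.
```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def label_masks(mask_boxes, det_boxes, det_labels, iou_thresh=0.5):
    """Give each class-agnostic mask box the label of its best-overlapping
    detection box; masks without a sufficient match stay unlabeled (None)."""
    labels = []
    for mb in mask_boxes:
        ious = [box_iou(mb, db) for db in det_boxes]
        best = int(np.argmax(ious)) if ious else -1
        matched = best >= 0 and ious[best] >= iou_thresh
        labels.append(det_labels[best] if matched else None)
    return labels

# Toy example: two mask boxes, one detection labeled "chair".
masks = [(10, 10, 50, 50), (60, 60, 90, 90)]
dets = [(12, 8, 52, 48)]
print(label_masks(masks, dets, ["chair"]))  # ['chair', None]
```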
- Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- Lidar Panoptic Segmentation and Tracking without Bells and Whistles [48.078270195629415]
We propose a detection-centric network for lidar segmentation and tracking.
One of the core components of our network is the object instance detection branch.
We evaluate our method on several 3D/4D LPS benchmarks and observe that our model establishes a new state-of-the-art among open-sourced models.
arXiv Detail & Related papers (2023-10-19T04:44:43Z)
- Zero-Shot Refinement of Buildings' Segmentation Models using SAM [6.110856077714895]
We present a novel approach that adapts foundation models to address existing models' generalization drawbacks.
Among several models, our focus centers on the Segment Anything Model (SAM).
SAM does not offer recognition abilities and thus fails to classify and tag localized objects.
This novel approach augments SAM with recognition abilities, a first of its kind (one plausible instantiation is sketched below).
arXiv Detail & Related papers (2023-10-03T07:19:59Z)
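A common recipe for adding recognition to SAM is to score its class-agnostic region proposals with an open-vocabulary classifier. The summary does not say this paper does exactly that, so the sketch below, which pairs SAM proposals with CLIP over a hypothetical prompt set and uses placeholder checkpoint and image paths, should be read as one plausible instantiation rather than the authors' pipeline.
```python
import clip
import numpy as np
import torch
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

device = "cuda" if torch.cuda.is_available() else "cpu"
# Placeholder checkpoint path; any SAM variant would do for this sketch.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth").to(device)
mask_gen = SamAutomaticMaskGenerator(sam)
clip_model, preprocess = clip.load("ViT-B/32", device=device)

prompts = ["a building", "a road", "vegetation"]   # hypothetical label set
with torch.no_grad():
    text_emb = clip_model.encode_text(clip.tokenize(prompts).to(device))
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

image = np.array(Image.open("scene.jpg").convert("RGB"))  # placeholder image
for m in mask_gen.generate(image):
    x, y, w, h = m["bbox"]                # SAM reports boxes as (x, y, w, h)
    crop = Image.fromarray(image[y:y + h, x:x + w])
    with torch.no_grad():
        img_emb = clip_model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    label = prompts[int((img_emb @ text_emb.T).argmax())]
    print(label, m["bbox"])               # each localized region now has a tag
```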
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages an existing pretrained vision-language (VL) model to train semantic segmentation models.
ZeroSeg achieves this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.