Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection
in Semantic Segmentation
- URL: http://arxiv.org/abs/2211.14512v3
- Date: Mon, 21 Aug 2023 07:14:04 GMT
- Title: Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection
in Semantic Segmentation
- Authors: Yuyuan Liu, Choubo Ding, Yu Tian, Guansong Pang, Vasileios
Belagiannis, Ian Reid and Gustavo Carneiro
- Abstract summary: This paper introduces a new residual pattern learning (RPL) module that assists the segmentation model in detecting OoD pixels without affecting the inlier segmentation performance.
We also present a novel context-robust contrastive learning (CoroCL) that enforces RPL to robustly detect OoD pixels among various contexts.
Our approach improves on the previous state-of-the-art by around 10% FPR and 7% AuPRC on the Fishyscapes, Segment-Me-If-You-Can, and RoadAnomaly datasets.
- Score: 38.784135463275305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation models classify pixels into a set of known
(``in-distribution'') visual classes. When deployed in an open world, the
reliability of these models depends on their ability not only to classify
in-distribution pixels but also to detect out-of-distribution (OoD) pixels.
Historically, the poor OoD detection performance of these models has motivated
the design of methods based on model re-training using synthetic training
images that include OoD visual objects. Although successful, these re-trained
methods have two issues: 1) their in-distribution segmentation accuracy may
drop during re-training, and 2) their OoD detection accuracy does not
generalise well to new contexts (e.g., country surroundings) outside the
training set (e.g., city surroundings). In this paper, we mitigate these issues
with: (i) a new residual pattern learning (RPL) module that assists the
segmentation model in detecting OoD pixels without affecting the inlier
segmentation performance; and (ii) a novel context-robust contrastive learning
(CoroCL) that enforces RPL to robustly detect OoD pixels among various
contexts. Our approach improves on the previous state-of-the-art by around
10\% FPR and 7\% AuPRC on the Fishyscapes, Segment-Me-If-You-Can, and
RoadAnomaly datasets. Our code is available at: https://github.com/yyliu01/RPL.
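The abstract describes the mechanism only at a high level. As a rough illustration of the general idea (a frozen inlier segmentation network plus a small trainable residual branch that perturbs intermediate features for OoD scoring, while the original inlier logits stay untouched), here is a minimal PyTorch sketch. The class name, layer sizes, and the energy-style anomaly score are illustrative assumptions, not the paper's exact architecture; the official code linked above contains the real implementation.

```python
import torch
import torch.nn as nn

class ResidualOoDHead(nn.Module):
    """Hypothetical residual branch attached to a frozen segmentation network.

    The frozen network keeps producing its original inlier logits, so closed-set
    segmentation is untouched; only this branch is trained for OoD scoring.
    """

    def __init__(self, feat_channels: int, num_classes: int):
        super().__init__()
        self.residual = nn.Sequential(               # small trainable branch
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )
        self.classifier = nn.Conv2d(feat_channels, num_classes, 1)

    def forward(self, frozen_feats: torch.Tensor, frozen_logits: torch.Tensor):
        # Perturb the frozen features with the learned residual pattern and
        # re-classify; the frozen prediction itself is never modified.
        ood_logits = self.classifier(frozen_feats + self.residual(frozen_feats))
        # Energy-style anomaly score: high value -> likely OoD pixel.
        ood_score = -torch.logsumexp(ood_logits, dim=1)
        return frozen_logits, ood_score
```

Training such a branch with a context-robust contrastive objective (CoroCL in the paper) would then pull inlier embeddings together and push outlier embeddings away across different scene contexts, which is what the abstract credits for the generalisation to new surroundings.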
Related papers
- Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks? [4.9260675787714]
Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under partial occlusion.
We contribute the Image Recognition Under Occlusion (IRUO) dataset, based on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558).
We find that modern CNN-based models show improved recognition accuracy on occluded images compared to earlier CNN-based models, and ViT-based models are more accurate than CNN-based models on occluded images.
arXiv Detail & Related papers (2024-09-16T23:21:22Z) - Bayesian Active Learning for Semantic Segmentation [9.617769135242973]
We introduce a Bayesian active learning framework based on sparse pixel-level annotation.
BalEnt captures the information between the model's predicted marginalized probability distribution and the pixel labels.
We train our proposed active learning framework on the Cityscapes, CamVid, ADE20K, and VOC2012 benchmark datasets.
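The summary does not spell out the BalEnt acquisition function. As a generic stand-in for Bayesian pixel-level acquisition, the sketch below computes a BALD-style mutual-information map from Monte Carlo forward passes; the function name, inputs, and formula are assumptions for illustration, not the paper's definition of BalEnt.

```python
import torch

def pixel_mutual_information(mc_probs: torch.Tensor) -> torch.Tensor:
    """BALD-style acquisition per pixel (illustrative, not BalEnt itself).

    mc_probs: (T, C, H, W) softmax outputs from T stochastic forward passes
              (e.g. MC dropout). Returns an (H, W) acquisition map.
    """
    eps = 1e-8
    mean_p = mc_probs.mean(dim=0)                                  # (C, H, W)
    entropy_of_mean = -(mean_p * (mean_p + eps).log()).sum(dim=0)  # predictive entropy
    mean_entropy = -(mc_probs * (mc_probs + eps).log()).sum(dim=1).mean(dim=0)
    return entropy_of_mean - mean_entropy  # high value -> informative pixel to label
```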
arXiv Detail & Related papers (2024-08-03T07:26:10Z) - LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations [4.680881326162484]
Contrastive instance discrimination methods outperform supervised learning in downstream tasks such as image classification and object detection.
A common augmentation technique in contrastive learning is random cropping followed by resizing.
We introduce LeOCLR, a framework that employs a novel instance discrimination approach and an adapted loss function.
arXiv Detail & Related papers (2024-03-11T15:33:32Z) - Placing Objects in Context via Inpainting for Out-of-distribution Segmentation [59.00092709848619]
Placing Objects in Context (POC) is a pipeline to realistically add objects to an image.
POC can be used to extend any dataset with an arbitrary number of objects.
We present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods.
arXiv Detail & Related papers (2024-02-26T08:32:41Z) - Improving Pixel-Level Contrastive Learning by Leveraging Exogenous Depth
Information [7.561849435043042]
Self-supervised representation learning based on Contrastive Learning (CL) has been the subject of much attention in recent years.
In this paper, we focus on depth information, which can be estimated with a depth network or measured from available data.
We show that using this depth information in the contrastive loss leads to improved results and that the learned representations better follow the shapes of objects.
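The summary only says that depth enters the contrastive loss. One plausible way to illustrate this is to treat pixel pairs at similar depths as soft positives; the Gaussian weighting below is an assumption for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def depth_weighted_contrastive(z: torch.Tensor, depth: torch.Tensor,
                               tau: float = 0.1, sigma: float = 0.5) -> torch.Tensor:
    """z: (N, D) pixel embeddings; depth: (N,) depth values of the same pixels.

    Pairs at similar depths act as soft positives via a Gaussian weight on the
    depth difference (an illustrative choice).
    """
    z = F.normalize(z, dim=1)
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = (z @ z.t() / tau).masked_fill(eye, -1e9)   # drop self-similarity
    w = torch.exp(-(depth[:, None] - depth[None, :]) ** 2 / (2 * sigma ** 2))
    w = w.masked_fill(eye, 0.0)                      # no self positives
    log_prob = F.log_softmax(sim, dim=1)
    return -(w * log_prob).sum(1).div(w.sum(1).clamp_min(1e-8)).mean()
```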
arXiv Detail & Related papers (2022-11-18T11:45:39Z) - Decoupled Multi-task Learning with Cyclical Self-Regulation for Face
Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) method for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z) - ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language
Knowledge Distillation [5.424015823818208]
A dataset such as COCO is extensively annotated across many images but covers only a sparse set of categories, and annotating all object classes across a diverse domain is expensive and challenging.
We develop a Vision-Language distillation method that aligns both image and text embeddings from a zero-shot pre-trained model such as CLIP to a modified semantic prediction head from a one-stage detector like YOLOv5.
During inference, our model can be adapted to detect any number of object classes without additional training.
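The alignment step can be illustrated with a short sketch: detector head embeddings are pushed toward precomputed CLIP image and text embeddings of the matched boxes and class names. The tensor inputs, loss choice, and weighting below are assumptions for illustration; the summary does not give the paper's exact matching and loss details.

```python
import torch
import torch.nn.functional as F

def vl_distillation_loss(det_embed: torch.Tensor,
                         clip_image_embed: torch.Tensor,
                         clip_text_embed: torch.Tensor,
                         alpha: float = 0.5) -> torch.Tensor:
    """det_embed:        (N, D) embeddings from the detector's semantic head
    clip_image_embed: (N, D) CLIP image embeddings of the matched box crops
    clip_text_embed:  (N, D) CLIP text embeddings of the matched class names
    All inputs are assumed precomputed; the detector head is trained so that
    its embeddings land in CLIP space.
    """
    det = F.normalize(det_embed, dim=1)
    img = F.normalize(clip_image_embed, dim=1)
    txt = F.normalize(clip_text_embed, dim=1)
    # Pull detector embeddings toward both CLIP views (cosine distance).
    loss_img = (1 - (det * img).sum(dim=1)).mean()
    loss_txt = (1 - (det * txt).sum(dim=1)).mean()
    return alpha * loss_img + (1 - alpha) * loss_txt
```

At test time, class scores can then come from cosine similarity between detector embeddings and text embeddings of arbitrary class-name prompts, which is what lets the detector cover new classes without retraining.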
arXiv Detail & Related papers (2021-09-24T16:46:36Z) - Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes.
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
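That core idea can be sketched as a supervised InfoNCE loss over sampled pixel embeddings, where same-class pixels (possibly from different images) are positives. The sampling strategy and memory-bank details of the paper are omitted; this is a minimal illustrative version.

```python
import torch
import torch.nn.functional as F

def supervised_pixel_contrast(z: torch.Tensor, labels: torch.Tensor,
                              tau: float = 0.1) -> torch.Tensor:
    """z: (N, D) pixel embeddings sampled across one or more images;
    labels: (N,) their ground-truth classes. Same-class pixels are positives."""
    z = F.normalize(z, dim=1)
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = (z @ z.t() / tau).masked_fill(eye, -1e9)
    pos = (labels[:, None] == labels[None, :]) & ~eye
    log_prob = F.log_softmax(sim, dim=1)
    # Average log-likelihood of positives for each anchor that has at least one.
    pos_counts = pos.sum(dim=1)
    loss = -(log_prob * pos.float()).sum(dim=1) / pos_counts.clamp_min(1)
    return loss[pos_counts > 0].mean()
```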
arXiv Detail & Related papers (2021-01-28T11:35:32Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
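The summary says input images become graph nodes; a tiny sketch of that idea is one round of message passing over image-level features, with adjacency derived from feature similarity. The pooling, adjacency, and layer choices here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupMining(nn.Module):
    """Hypothetical one-layer GNN over a group of images (nodes = images)."""

    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (G, D) image-level features of a group of G images.
        norm = F.normalize(feats, dim=1)
        adj = F.softmax(norm @ norm.t(), dim=1)        # similarity-based adjacency
        return F.relu(feats + adj @ self.msg(feats))   # message passing + residual
```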
arXiv Detail & Related papers (2020-12-09T12:40:13Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computational overhead (only 1% slower).
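A pixel-level pairwise contrastive loss between two views can be sketched as follows: extract dense feature grids from both augmented views, match each location in view one to its most similar location in view two, and apply InfoNCE over the matched pairs. The matching rule and the lack of a negative queue below are simplifications for illustration, not necessarily DenseCL's exact recipe.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(f1: torch.Tensor, f2: torch.Tensor,
                           tau: float = 0.2) -> torch.Tensor:
    """f1, f2: (HW, D) dense projections of two augmented views of one image."""
    f1 = F.normalize(f1, dim=1)
    f2 = F.normalize(f2, dim=1)
    with torch.no_grad():
        # Correspondence: nearest view-2 location for each view-1 location.
        # (DenseCL derives correspondences from backbone features; reusing the
        #  projections here is a simplification for the sketch.)
        match = (f1 @ f2.t()).argmax(dim=1)
    sim = f1 @ f2.t() / tau        # (HW, HW) cross-view similarities as logits
    return F.cross_entropy(sim, match)
```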
arXiv Detail & Related papers (2020-11-18T08:42:32Z)