Reliability in Semantic Segmentation: Are We on the Right Track?
- URL: http://arxiv.org/abs/2303.11298v1
- Date: Mon, 20 Mar 2023 17:38:24 GMT
- Title: Reliability in Semantic Segmentation: Are We on the Right Track?
- Authors: Pau de Jorge, Riccardo Volpi, Philip Torr, Gregory Rogez
- Abstract summary: We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers.
We find that while recent models are significantly more robust, they are not overall more reliable in terms of uncertainty estimation.
This is the first study on modern segmentation models focused on both robustness and uncertainty estimation.
- Score: 15.0189654919665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the increasing popularity of transformers in computer vision, in
recent times there has been a rapid development of novel architectures. While
in-domain performance follows a constant, upward trend, properties like
robustness or uncertainty estimation are less explored, leaving doubts about
advances in model reliability. Studies along these axes exist, but they are
mainly limited to classification models. In contrast, we carry out a study on
semantic segmentation, a relevant task for many real-world applications where
model reliability is paramount. We analyze a broad variety of models, spanning
from older ResNet-based architectures to novel transformers and assess their
reliability based on four metrics: robustness, calibration, misclassification
detection and out-of-distribution (OOD) detection. We find that while recent
models are significantly more robust, they are not overall more reliable in
terms of uncertainty estimation. We further explore methods that can come to
the rescue and show that improving calibration can also help with other
uncertainty metrics such as misclassification or OOD detection. This is the
first study on modern segmentation models focused on both robustness and
uncertainty estimation and we hope it will help practitioners and researchers
interested in this fundamental vision task. Code available at
https://github.com/naver/relis.
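The four reliability axes above map onto standard per-pixel scores. The sketch below is not taken from the linked repository; it is a minimal, self-contained illustration (with synthetic softmax outputs and a 19-class setup as in Cityscapes) of two of the axes: calibration via expected calibration error (ECE) and misclassification detection via the AUROC of the max-softmax confidence.
```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard ECE computed over per-pixel max-softmax confidences."""
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

def misclassification_auroc(confidences, correct):
    """AUROC for separating correct from incorrect predictions by confidence
    (rank-based Mann-Whitney formulation; ties are ignored for simplicity)."""
    order = np.argsort(confidences)
    ranks = np.empty(len(confidences))
    ranks[order] = np.arange(1, len(confidences) + 1)
    n_pos, n_neg = correct.sum(), (~correct).sum()
    u = ranks[correct].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy stand-in for per-pixel softmax outputs of a segmentation model
# (19 classes as in Cityscapes); replace with real model predictions.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(19), size=10_000)
labels = rng.integers(0, 19, size=10_000)
pred, conf = probs.argmax(axis=1), probs.max(axis=1)
correct = pred == labels

print("ECE:", expected_calibration_error(conf, correct))
print("Misclassification-detection AUROC:", misclassification_auroc(conf, correct))
```
OOD detection follows the same ranking recipe, scoring in-distribution versus out-of-distribution pixels with, for example, the negative max-softmax confidence.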
Related papers
- The BRAVO Semantic Segmentation Challenge Results in UNCV2024 [68.20197719071436]
We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training.
The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.
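As a concrete reading of the "semantic reliability" notion, the sketch below tracks mIoU while Gaussian noise of increasing severity is applied to the inputs. The `segment_fn` callable, the severity-to-noise mapping, and the dummy data are assumptions made for illustration; they are not the challenge's actual corruption protocol.
```python
import numpy as np

def miou(pred, label, num_classes=19):
    """Mean IoU over the classes present in either prediction or label."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def gaussian_corruption(image, severity, rng):
    """Additive Gaussian noise; severity 1..5 maps to an increasing std (illustrative values)."""
    std = [0.02, 0.04, 0.08, 0.12, 0.18][severity - 1]
    return np.clip(image + rng.normal(0.0, std, size=image.shape), 0.0, 1.0)

def semantic_reliability_curve(segment_fn, images, labels, rng):
    """mIoU as a function of corruption severity (severity 0 = clean input)."""
    curve = []
    for severity in range(6):
        scores = []
        for img, lab in zip(images, labels):
            x = img if severity == 0 else gaussian_corruption(img, severity, rng)
            scores.append(miou(segment_fn(x), lab))
        curve.append(float(np.mean(scores)))
    return curve

# Dummy usage: a "model" that always predicts class 0 on random data.
rng = np.random.default_rng(0)
images = [rng.random((3, 64, 64)) for _ in range(4)]
labels = [rng.integers(0, 19, size=(64, 64)) for _ in range(4)]
print(semantic_reliability_curve(lambda x: np.zeros((64, 64), dtype=int), images, labels, rng))
```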
arXiv Detail & Related papers (2024-09-23T15:17:30Z)
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We identify a general, widespread but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
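Flat-minima training is commonly instantiated with Sharpness-Aware Minimization (SAM). Whether the paper uses exactly this optimizer is not stated in the summary above, so the PyTorch snippet below should be read as a generic, minimal SAM update rather than the paper's method.
```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) update, a standard flat-minima method."""
    optimizer.zero_grad()

    # 1) Gradient at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) Move to the (approximate) worst-case neighbour w + rho * g / ||g||.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    with torch.no_grad():
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)

    # 3) Gradient at the perturbed point, then restore w and take the step.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    return loss.item()
```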
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Monitoring Machine Learning Models: Online Detection of Relevant Deviations [0.0]
Machine learning models can degrade over time due to changes in data distribution or other factors.
We propose a sequential monitoring scheme to detect relevant changes.
Our research contributes a practical solution for distinguishing between minor fluctuations and meaningful degradations.
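The paper's monitoring scheme is its own statistical construction; as a generic stand-in, a one-sided CUSUM statistic on a streaming error metric captures the same intent of ignoring minor fluctuations while flagging sustained degradation. The baseline, slack, and threshold values below are illustrative.
```python
import numpy as np

class CusumMonitor:
    """One-sided CUSUM on a per-batch error metric.

    Flags a degradation only when errors exceed the baseline by more than a
    tolerated slack `delta`, so minor fluctuations do not trigger alarms.
    """
    def __init__(self, baseline_error, delta=0.02, threshold=0.5):
        self.baseline = baseline_error
        self.delta = delta
        self.threshold = threshold
        self.stat = 0.0

    def update(self, batch_error):
        # Accumulate only the part of the excess error above the slack.
        self.stat = max(0.0, self.stat + (batch_error - self.baseline - self.delta))
        return self.stat > self.threshold

# Toy stream: stable errors followed by a gradual degradation.
rng = np.random.default_rng(1)
monitor = CusumMonitor(baseline_error=0.10)
for t in range(200):
    err = 0.10 + 0.0005 * max(0, t - 100) + rng.normal(0.0, 0.01)
    if monitor.update(err):
        print(f"Relevant degradation flagged at batch {t}")
        break
```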
arXiv Detail & Related papers (2023-09-26T18:46:37Z)
- How Reliable is Your Regression Model's Uncertainty Under Real-World Distribution Shifts? [46.05502630457458]
We propose a benchmark of 8 image-based regression datasets with different types of challenging distribution shifts.
We find that while methods are well calibrated when there is no distribution shift, they all become highly overconfident on many of the benchmark datasets.
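One simple way to expose the overconfidence that such a benchmark measures is to check the empirical coverage of central prediction intervals. The Gaussian-interval formulation and the toy "shift" below are assumptions chosen for illustration, not the benchmark's own metrics.
```python
import numpy as np
from scipy.stats import norm

def interval_coverage(y_true, pred_mean, pred_std, confidence=0.9):
    """Empirical coverage of central Gaussian prediction intervals.
    A calibrated regressor should cover roughly `confidence` of the targets."""
    z = norm.ppf(0.5 + confidence / 2)
    lower, upper = pred_mean - z * pred_std, pred_mean + z * pred_std
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Toy illustration: the predicted std stays fixed while the true noise grows
# under "shift", so coverage drops below the nominal 90% (overconfidence).
rng = np.random.default_rng(0)
pred_mean = rng.normal(size=5000)
pred_std = np.full(5000, 1.0)
y_in_dist = pred_mean + rng.normal(0.0, 1.0, size=5000)
y_shifted = pred_mean + rng.normal(0.0, 2.5, size=5000)
print("coverage in-distribution:", interval_coverage(y_in_dist, pred_mean, pred_std))
print("coverage under shift:    ", interval_coverage(y_shifted, pred_mean, pred_std))
```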
arXiv Detail & Related papers (2023-02-07T18:54:39Z)
- CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal Relationships [8.679073301435265]
We construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data.
We use these labels to perturb the data by deleting non-causal agents from the scene.
Under non-causal perturbations, we observe a 25-38% relative change in minADE compared to the original data.
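minADE, the minimum average displacement error over K trajectory hypotheses, is straightforward to compute. The sketch below uses synthetic trajectories and is not tied to the CausalAgents data or models.
```python
import numpy as np

def min_ade(pred_trajs, gt_traj):
    """minADE: smallest average displacement error over K predicted trajectories.

    pred_trajs: (K, T, 2) candidate future trajectories
    gt_traj:    (T, 2) ground-truth future trajectory
    """
    dists = np.linalg.norm(pred_trajs - gt_traj[None], axis=-1)  # (K, T)
    return dists.mean(axis=1).min()

# Toy example with K=6 candidates over T=30 future steps; the "perturbed"
# predictions are simply noisier to illustrate a relative change in minADE.
rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(size=(30, 2)), axis=0)
preds_clean = gt[None] + rng.normal(0.0, 0.5, size=(6, 30, 2))
preds_perturbed = gt[None] + rng.normal(0.0, 0.7, size=(6, 30, 2))

ade_clean = min_ade(preds_clean, gt)
ade_perturbed = min_ade(preds_perturbed, gt)
print("relative change in minADE:", (ade_perturbed - ade_clean) / ade_clean)
```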
arXiv Detail & Related papers (2022-07-07T21:28:23Z)
- Large-scale Robustness Analysis of Video Action Recognition Models [10.017292176162302]
We study robustness of six state-of-the-art action recognition models against 90 different perturbations.
The study reveals several interesting findings: 1) transformer-based models are consistently more robust than CNN-based models; 2) pretraining improves robustness more for transformer-based models than for CNN-based models; and 3) all of the studied models are robust to temporal perturbations on every dataset except SSv2.
arXiv Detail & Related papers (2022-07-04T13:29:34Z)
- MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent works have proposed methods for test-time adaptation; however, each introduces additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
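MEMO's core idea is to minimize the entropy of the prediction marginalized over several augmented copies of a single test input. The sketch below follows that idea at a high level; `augment_fn`, the number of augmentations, and the single optimization step are placeholders rather than the paper's exact configuration.
```python
import torch
import torch.nn.functional as F

def memo_adapt(model, x, augment_fn, optimizer, n_aug=8, steps=1):
    """Test-time adaptation in the spirit of MEMO: minimize the entropy of the
    prediction averaged over augmented views of one test input.

    `augment_fn(x)` is assumed to return a randomly augmented copy of `x`
    (shape (C, H, W)); `optimizer` is assumed to cover `model.parameters()`.
    """
    for _ in range(steps):
        views = torch.stack([augment_fn(x) for _ in range(n_aug)])  # (n_aug, C, H, W)
        probs = F.softmax(model(views), dim=-1)                     # (n_aug, num_classes)
        marginal = probs.mean(dim=0)
        entropy = -(marginal * torch.log(marginal + 1e-12)).sum()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    with torch.no_grad():
        return model(x.unsqueeze(0)).argmax(dim=-1)
```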
arXiv Detail & Related papers (2021-10-18T17:55:11Z)
- Estimating the Robustness of Classification Models by the Structure of the Learned Feature-Space [10.418647759223964]
We argue that fixed test sets capture only a small portion of possible data variations and are therefore limited and prone to producing overfitted solutions.
To overcome these drawbacks, we suggest to estimate the robustness of a model directly from the structure of its learned feature-space.
arXiv Detail & Related papers (2021-06-23T10:52:29Z)
- Approaching Neural Network Uncertainty Realism [53.308409014122816]
Quantifying or at least upper-bounding uncertainties is vital for safety-critical systems such as autonomous vehicles.
We evaluate uncertainty realism -- a strict quality criterion -- with a Mahalanobis distance-based statistical test.
We adapt it to the automotive domain and show that it significantly improves uncertainty realism compared to a plain encoder-decoder model.
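The statistical test used in that paper is formulated on its own terms; as a generic stand-in, the sketch below checks whether the squared Mahalanobis distances of regression residuals follow the chi-squared distribution implied by the predicted covariances, using a Kolmogorov-Smirnov test.
```python
import numpy as np
from scipy import stats

def uncertainty_realism_pvalue(errors, covariances):
    """Check whether predicted Gaussian covariances are 'realistic'.

    If the uncertainties are correct, the squared Mahalanobis distances of the
    residuals follow a chi-squared distribution with d degrees of freedom;
    a one-sample KS test compares the two.

    errors:      (N, d) residuals y_true - y_pred
    covariances: (N, d, d) predicted covariance matrices
    """
    d = errors.shape[1]
    m2 = np.einsum("ni,nij,nj->n", errors, np.linalg.inv(covariances), errors)
    return stats.kstest(m2, stats.chi2(df=d).cdf).pvalue

# Toy check: residuals actually drawn from the predicted covariances
# should yield a large p-value (uncertainty realism not rejected).
rng = np.random.default_rng(0)
cov = np.stack([np.diag(rng.uniform(0.5, 2.0, size=2)) for _ in range(2000)])
errs = np.stack([rng.multivariate_normal(np.zeros(2), c) for c in cov])
print("p-value:", uncertainty_realism_pvalue(errs, cov))
```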
arXiv Detail & Related papers (2021-01-08T11:56:12Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to overestimation of robustness.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
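AutoAttack is available as a standalone package (https://github.com/fra31/auto-attack). The wrapper function below is illustrative and assumes a trained PyTorch classifier that returns logits; only the `AutoAttack` constructor and `run_standard_evaluation` call come from the package's documented interface.
```python
import torch
from autoattack import AutoAttack

def autoattack_accuracy(model, x, y, eps=8 / 255, batch_size=128):
    """Robust accuracy under the standard AutoAttack ensemble (Linf threat model).
    The function name and defaults are illustrative, not part of the package."""
    model.eval()
    adversary = AutoAttack(model, norm="Linf", eps=eps, version="standard")
    x_adv = adversary.run_standard_evaluation(x, y, bs=batch_size)
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    return (preds == y).float().mean().item()
```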
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)