Revisiting Evaluation Metrics for Semantic Segmentation: Optimization
and Evaluation of Fine-grained Intersection over Union
- URL: http://arxiv.org/abs/2310.19252v1
- Date: Mon, 30 Oct 2023 03:45:15 GMT
- Title: Revisiting Evaluation Metrics for Semantic Segmentation: Optimization
and Evaluation of Fine-grained Intersection over Union
- Authors: Zifu Wang and Maxim Berman and Amal Rannen-Triki and Philip H.S. Torr
and Devis Tuia and Tinne Tuytelaars and Luc Van Gool and Jiaqian Yu and
Matthew B. Blaschko
- Abstract summary: We propose the use of fine-grained mIoUs along with corresponding worst-case metrics.
These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing.
Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects.
- Score: 113.20223082664681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation datasets often exhibit two types of imbalance:
\textit{class imbalance}, where some classes appear more frequently than
others, and \textit{size imbalance}, where some objects occupy more pixels than others.
This causes traditional evaluation metrics to be biased towards
\textit{majority classes} (e.g. overall pixel-wise accuracy) and \textit{large
objects} (e.g. mean pixel-wise accuracy and per-dataset mean intersection over
union). To address these shortcomings, we propose the use of fine-grained mIoUs
along with corresponding worst-case metrics, thereby offering a more holistic
evaluation of segmentation techniques. These fine-grained metrics offer less
bias towards large objects, richer statistical information, and valuable
insights into model and dataset auditing. Furthermore, we undertake an
extensive benchmark study, where we train and evaluate 15 modern neural
networks with the proposed metrics on 12 diverse natural and aerial
segmentation datasets. Our benchmark study highlights the necessity of not
basing evaluations on a single metric and confirms that fine-grained mIoUs
reduce the bias towards large objects. Moreover, we identify the crucial role
played by architecture designs and loss functions, which lead to best practices
in optimizing fine-grained metrics. The code is available at
https://github.com/zifuwanggg/JDTLosses.
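The contrast between the classic per-dataset mIoU and a fine-grained, image-level mIoU (with a worst-case companion) can be sketched as follows. This is a minimal illustration of the general idea, not the paper's released implementation; all function names are illustrative:

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    """Per-class intersection and union pixel counts for one image."""
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter[c] = np.logical_and(p, g).sum()
        union[c] = np.logical_or(p, g).sum()
    return inter, union

def dataset_miou(preds, gts, num_classes):
    """Classic per-dataset mIoU: pool pixel counts over all images first,
    so large objects dominate each class's statistic."""
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for pred, gt in zip(preds, gts):
        i, u = iou_per_class(pred, gt, num_classes)
        inter += i
        union += u
    valid = union > 0
    return (inter[valid] / union[valid]).mean()

def image_level_miou(preds, gts, num_classes):
    """Fine-grained variant: score each image separately, then average.
    Also returns the worst image score as a worst-case metric."""
    scores = []
    for pred, gt in zip(preds, gts):
        i, u = iou_per_class(pred, gt, num_classes)
        valid = u > 0
        scores.append((i[valid] / u[valid]).mean())
    return float(np.mean(scores)), float(np.min(scores))
```

Because pixel counts are never pooled across images, an image containing only small objects contributes to the fine-grained score with the same weight as one dominated by a large object.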
Related papers
- Fine-grained Metrics for Point Cloud Semantic Segmentation [6.713120348917712]
Two forms of imbalances are commonly observed in point cloud semantic segmentation datasets.
Existing evaluation metrics favor majority categories and large objects.
This paper suggests fine-grained mIoU and mAcc for a more thorough assessment of point cloud segmentation algorithms.
arXiv Detail & Related papers (2024-07-31T02:25:30Z)
- Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection [133.66006666465447]
Current metrics are size-sensitive: larger objects receive most of the attention, while smaller ones tend to be ignored.
We argue that the evaluation should be size-invariant because bias based on size is unjustified without additional semantic information.
We develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes.
arXiv Detail & Related papers (2024-05-16T03:01:06Z)
- $F_\beta$-plot -- a visual tool for evaluating imbalanced data classifiers [0.0]
The paper proposes a simple approach to analyzing the popular parametric metric $F_\beta$.
It is possible to indicate for a given pool of analyzed classifiers when a given model should be preferred depending on user requirements.
arXiv Detail & Related papers (2024-04-11T18:07:57Z)
- Piecewise-Linear Manifolds for Deep Metric Learning [8.670873561640903]
Unsupervised deep metric learning focuses on learning a semantic representation space using only unlabeled data.
We propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear piece approximating the data manifold in a small neighborhood of a point.
We empirically show that this similarity estimate correlates better with the ground truth than the similarity estimates of current state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-22T06:22:20Z)
- Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation [55.92852268168816]
N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks.
Recent studies have revealed a weak correlation between these matching-based metrics and human evaluations.
We propose to utilize multiple references to enhance the consistency between these metrics and human evaluations.
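BLEU's standard mechanism for handling several references is clipped n-gram precision: each hypothesis n-gram is credited up to the maximum count it attains in any single reference. A minimal sketch of that mechanism (illustrative names, not the paper's code):

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis, references, n=1):
    """BLEU-style modified n-gram precision against multiple references:
    each hypothesis n-gram is clipped to the maximum count observed
    in any single reference."""
    hyp = ngrams(hypothesis, n)
    if not hyp:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp.items())
    return clipped / sum(hyp.values())
```

Adding references can only raise this precision, which is why richer reference sets tend to correlate better with human judgments of acceptable paraphrases.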
arXiv Detail & Related papers (2023-08-06T14:49:26Z)
- Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods are typically evaluated using single-agent metrics (marginal metrics).
Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group.
We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
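The marginal/joint distinction boils down to where the best-of-K minimum is taken: marginal ADE picks the best sample independently for each agent, while joint ADE (JADE) forces every agent to share the same sample, scoring the scene as a whole. A hedged sketch of both, assuming a `(K, A, T, 2)` prediction tensor (K samples, A agents, T timesteps):

```python
import numpy as np

def ade(pred, gt):
    """Marginal ADE: best sample chosen independently per agent.
    pred: (K, A, T, 2), gt: (A, T, 2)."""
    err = np.linalg.norm(pred - gt[None], axis=-1).mean(axis=-1)  # (K, A)
    return err.min(axis=0).mean()  # min over samples per agent, mean over agents

def jade(pred, gt):
    """Joint ADE (JADE): the same sample k is used for all agents,
    so inconsistent scene predictions are penalized."""
    err = np.linalg.norm(pred - gt[None], axis=-1).mean(axis=-1)  # (K, A)
    return err.mean(axis=1).min()  # mean over agents per sample, min over samples
```

JADE is never smaller than ADE: mixing the best pieces of different samples can hide predictions that are individually plausible but jointly inconsistent, e.g. colliding trajectories.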
arXiv Detail & Related papers (2023-05-10T16:27:55Z)
- Long-tail Detection with Effective Class-Margins [4.18804572788063]
We show how the commonly used mean average precision evaluation metric on an unknown test set is bound by a margin-based binary classification error.
We optimize margin-based binary classification error with a novel surrogate objective called the Effective Class-Margin Loss (ECM).
arXiv Detail & Related papers (2023-01-23T21:25:24Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
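The base metric being discounted here is the standard R@n,IoU@m: the fraction of queries whose ground-truth moment is matched, at temporal IoU at least m, by one of the top-n predictions. A minimal sketch of that base recall (the paper's dR variant additionally discounts each hit by how far the predicted boundaries drift from the annotation; see the paper for the exact discount):

```python
def temporal_iou(a, b):
    """IoU of two [start, end] moments, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def recall_at_n(predictions, gts, n=1, iou_thresh=0.5):
    """R@n,IoU@m over a list of queries: predictions[i] is a ranked list of
    candidate moments for query i, gts[i] its annotated moment."""
    hits = sum(
        1
        for preds, gt in zip(predictions, gts)
        if any(temporal_iou(p, gt) >= iou_thresh for p in preds[:n])
    )
    return hits / len(gts)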
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.