Metrics reloaded: Recommendations for image analysis validation
- URL: http://arxiv.org/abs/2206.01653v8
- Date: Fri, 23 Feb 2024 13:05:20 GMT
- Title: Metrics reloaded: Recommendations for image analysis validation
- Authors: Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D. Tizabi, Florian
Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens
Kleesiek, Michal Kozubek, Mauricio Reyes, Michael A. Riegler, Manuel
Wiesenfarth, A. Emre Kavur, Carole H. Sudre, Michael Baumgartner, Matthias
Eisenmann, Doreen Heckmann-Nötzel, Tim Rädsch, Laura Acion, Michela
Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Matthew Blaschko, M.
Jorge Cardoso, Veronika Cheplygina, Beth A. Cimini, Gary S. Collins, Keyvan
Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, Robert Haase,
Daniel A. Hashimoto, Michael M. Hoffman, Merel Huisman, Pierre Jannin,
Charles E. Kahn, Dagmar Kainmueller, Bernhard Kainz, Alexandros Karargyris,
Alan Karthikesalingam, Hannes Kenngott, Florian Kofler, Annette
Kopp-Schneider, Anna Kreshuk, Tahsin Kurc, Bennett A. Landman, Geert Litjens,
Amin Madani, Klaus Maier-Hein, Anne L. Martel, Peter Mattson, Erik Meijering,
Bjoern Menze, Karel G.M. Moons, Henning Müller, Brennan Nichyporuk, Felix
Nickel, Jens Petersen, Nasir Rajpoot, Nicola Rieke, Julio Saez-Rodriguez,
Clara I. Sánchez, Shravya Shetty, Maarten van Smeden, Ronald M. Summers,
Abdel A. Taha, Aleksei Tiulpin, Sotirios A. Tsaftaris, Ben Van Calster,
Gaël Varoquaux, Paul F. Jäger
- Abstract summary: Metrics Reloaded is a comprehensive framework guiding researchers in the problem-aware selection of metrics.
The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint.
Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics.
- Score: 59.60445111432934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasing evidence shows that flaws in machine learning (ML) algorithm
validation are an underestimated global problem. Particularly in automatic
biomedical image analysis, chosen performance metrics often do not reflect the
domain interest, thus failing to adequately measure scientific progress and
hindering translation of ML techniques into practice. To overcome this, our
large international expert consortium created Metrics Reloaded, a comprehensive
framework guiding researchers in the problem-aware selection of metrics.
Following the convergence of ML methodology across application domains, Metrics
Reloaded fosters the convergence of validation methodology. The framework was
developed in a multi-stage Delphi process and is based on the novel concept of
a problem fingerprint - a structured representation of the given problem that
captures all aspects that are relevant for metric selection, from the domain
interest to the properties of the target structure(s), data set and algorithm
output. Based on the problem fingerprint, users are guided through the process
of choosing and applying appropriate validation metrics while being made aware
of potential pitfalls. Metrics Reloaded targets image analysis problems that
can be interpreted as a classification task at image, object or pixel level,
namely image-level classification, object detection, semantic segmentation, and
instance segmentation tasks. To improve the user experience, we implemented the
framework in the Metrics Reloaded online tool, which also provides a point of
access to explore weaknesses, strengths and specific recommendations for the
most common validation metrics. The broad applicability of our framework across
domains is demonstrated by an instantiation for various biological and medical
image analysis use cases.
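As a small illustration of why metric choice should reflect the properties of the target structure, the sketch below computes two common overlap metrics, the Dice similarity coefficient (DSC) and Intersection over Union (IoU), on toy binary masks. It is not taken from the paper or from the Metrics Reloaded online tool; the helper names (`square`, `dice`, `iou`, `shifted_scores`) and the score of 1.0 assigned to two empty masks are assumptions made for this example only. It shows that the same one-pixel shift costs far more score on a small structure than on a large one.

```python
from itertools import product


def square(top, left, size):
    """Return the set of (row, col) pixels of a size x size square mask."""
    return {(top + r, left + c) for r, c in product(range(size), repeat=2)}


def dice(pred, ref):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not ref:
        return 1.0  # assumption for this sketch: two empty masks agree
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def iou(pred, ref):
    """Intersection over Union: |A ∩ B| / |A ∪ B|."""
    if not pred and not ref:
        return 1.0  # assumption for this sketch: two empty masks agree
    return len(pred & ref) / len(pred | ref)


def shifted_scores(size):
    """Score a square prediction shifted one pixel right of the reference."""
    ref = square(0, 0, size)
    pred = square(0, 1, size)
    return dice(pred, ref), iou(pred, ref)


for size in (2, 20):
    d, j = shifted_scores(size)
    print(f"{size}x{size} structure: DSC={d:.3f}, IoU={j:.3f}")
# 2x2 structure: DSC=0.500, IoU=0.333
# 20x20 structure: DSC=0.950, IoU=0.905
```

Representing masks as sets of pixel coordinates keeps the example dependency-free; array-based implementations would follow the same definitions.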
Related papers
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation [1.124958340749622]
We introduce a multitask learning framework to leverage correlations among the counting, detection, and segmentation tasks.
We develop a cross-position cut-and-paste for label augmentation and an entropy-based pseudo-label selection.
The proposed model significantly outperforms UDA methods and achieves performance comparable to that of its supervised counterpart.
arXiv Detail & Related papers (2024-03-31T12:22:23Z)
- Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis.
Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
arXiv Detail & Related papers (2023-02-03T14:57:40Z)
- SGDR: Semantic-guided Disentangled Representation for Unsupervised Cross-modality Medical Image Segmentation [5.090366802287405]
We propose a novel framework, called semantic-guided disentangled representation (SGDR), to extract semantically meaningful features for the segmentation task.
We validated our method on two public datasets, and experimental results show that our approach outperforms state-of-the-art methods on two evaluation metrics by a significant margin.
arXiv Detail & Related papers (2022-03-26T08:31:00Z)
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- How can we learn (more) from challenges? A statistical approach to driving future algorithm development [1.0690055408831725]
We present a statistical framework for learning from challenges and instantiate it for the specific task of instrument instance segmentation in laparoscopic videos.
Based on 51,542 meta data annotations performed on 2,728 images, we applied our approach to the results of the Robust Medical Instrument Segmentation Challenge (ROBUST-MIS) 2019.
Our method development, tailored to the specific remaining issues, yielded a deep learning model with state-of-the-art overall performance and specific strengths in the processing of images in which previous methods tended to fail.
arXiv Detail & Related papers (2021-06-17T08:12:37Z)
- Common Limitations of Image Processing Metrics: A Picture Story [58.83274952067888]
This document focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks.
The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.
arXiv Detail & Related papers (2021-04-12T17:03:42Z)
- Panoptic Feature Fusion Net: A Novel Instance Segmentation Paradigm for Biomedical and Biological Images [91.41909587856104]
In this work, we present a Panoptic Feature Fusion Net (PFFNet) that unifies semantic and instance features.
Our proposed PFFNet contains a residual attention feature fusion mechanism to incorporate the instance prediction with the semantic features.
It outperforms several state-of-the-art methods on various biomedical and biological datasets.
arXiv Detail & Related papers (2020-02-15T09:19:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.