Related papers: Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits

URL: http://arxiv.org/abs/2306.16879v2
Date: Tue, 31 Oct 2023 15:16:09 GMT
Title: Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits
Authors: Georgii Kostiuchik, Lalith Sharan, Benedikt Mayer, Ivo Wolf, Bernhard Preim, Sandy Engelhardt
Abstract summary: This work presents a publicly available data visualization tool that enables interactive exploration of dataset splits. It focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. Results: We performed an analysis of common Cholec80 dataset splits and were able to uncover phase transitions and combinations of instruments that were not represented in one of the sets.
Score: 2.045596350476764
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Purpose: Machine learning models can only be reliably evaluated if training, validation, and test data splits are representative and not affected by the absence of classes of interest. Surgical workflow and instrument recognition tasks are complicated in this manner, because of heavy data imbalances resulting from different lengths of phases and their erratic occurrences. Furthermore, the issue becomes difficult as sub-properties that help define phases, like instrument (co-)occurrence, are usually not considered when defining the split. We argue that such sub-properties must be equally considered.

Methods: This work presents a publicly available data visualization tool that enables interactive exploration of dataset splits for surgical phase and instrument recognition. It focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. Particularly, it facilitates the assessment and identification of sub-optimal dataset splits.

Results: We performed an analysis of common Cholec80 dataset splits using the proposed application and were able to uncover phase transitions and combinations of instruments that were not represented in one of the sets. Additionally, we outlined possible improvements to the splits. A user study with ten participants demonstrated the ability of participants to solve a selection of data exploration tasks using the proposed application.

Conclusion: In highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate dataset split. Our interactive data visualization tool presents a promising approach for the assessment of dataset splits for surgical phase and instrument recognition. Evaluation results show that it can enhance the development of machine learning models. The application is available at https://cardio-ai.github.io/endovis-ml/ .
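The kind of check the abstract describes, finding phase transitions or instrument combinations that are absent from one set of a split, can be illustrated with a minimal sketch. This is not the authors' tool or its code: the function names, the toy video IDs, and the labels `P1`..`P3` and instrument names below are hypothetical, and the data layout (frame-wise phase labels and per-frame instrument sets for each video) is an assumption.

```python
def phase_transitions(phases):
    """Set of (from, to) phase changes in one video's frame-wise phase labels."""
    return {(a, b) for a, b in zip(phases, phases[1:]) if a != b}


def instrument_combinations(frames):
    """Set of distinct instrument combinations seen in one video's frames."""
    return {frozenset(instruments) for instruments in frames}


def missing_per_set(items_by_video, split):
    """For each set in the split, list items that occur elsewhere but not in it.

    items_by_video: video id -> set of items (transitions or combinations)
    split:          set name -> list of video ids
    """
    seen = {name: set() for name in split}
    for name, video_ids in split.items():
        for vid in video_ids:
            seen[name] |= items_by_video[vid]
    everything = set().union(*seen.values())
    return {name: everything - found for name, found in seen.items()}


# Hypothetical toy annotations (not Cholec80 data).
phases = {
    "v1": ["P1", "P1", "P2", "P3"],
    "v2": ["P1", "P2", "P2", "P3"],
    "v3": ["P1", "P3", "P2", "P3"],
}
instruments = {
    "v1": [{"grasper"}, {"grasper", "hook"}],
    "v2": [{"grasper"}, {"hook"}],
    "v3": [{"grasper"}, {"grasper", "clipper"}],
}
split = {"train": ["v1", "v2"], "test": ["v3"]}

missing_transitions = missing_per_set(
    {vid: phase_transitions(seq) for vid, seq in phases.items()}, split
)
missing_combinations = missing_per_set(
    {vid: instrument_combinations(fr) for vid, fr in instruments.items()}, split
)

for name in split:
    print(name, "lacks transitions:", sorted(missing_transitions[name]))
    print(name, "lacks combinations:",
          sorted(sorted(c) for c in missing_combinations[name]))
```

A non-empty result for a set signals that a model cannot be trained on, or evaluated against, that transition or combination in this split, which is the situation the paper reports finding in common Cholec80 splits.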

Related papers

Matched Machine Learning: A Generalized Framework for Treatment Effect Inference With Learned Metrics [87.05961347040237]
We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching. Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes. We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems.
arXiv Detail & Related papers (2023-04-03T19:32:30Z)
A classification performance evaluation measure considering data separability [6.751026374812737]
We propose a new separability measure--the rate of separability (RS)--based on the data coding rate. We demonstrate the positive correlation between the proposed measure and recognition accuracy in a multi-task scenario constructed from a real dataset.
arXiv Detail & Related papers (2022-11-10T09:18:26Z)
TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation. TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions. The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
Data-Centric Machine Learning in the Legal Domain [0.2624902795082451]
This paper explores how changes in a data set influence the measured performance of a model. Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance. The observed effects are surprisingly pronounced, especially when the per-class performance is considered.
arXiv Detail & Related papers (2022-01-17T23:05:14Z)
Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift [68.6874404805223]
We propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation. We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data.
arXiv Detail & Related papers (2021-07-13T09:22:24Z)
Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data [85.43008636875345]
We show that diverse representation in training data is key to increasing subgroup performances and achieving population level objectives. Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management. We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
Data Separability for Neural Network Classifiers and the Development of a Separability Index [17.49709034278995]
We created the Distance-based Separability Index (DSI) to measure the separability of datasets. We show that the DSI can indicate whether data belonging to different classes have similar distributions. We also discuss possible applications of the DSI in the fields of data science, machine learning, and deep learning.
arXiv Detail & Related papers (2020-05-27T01:49:19Z)
Semi-supervised lung nodule retrieval [2.055949720959582]
A content-based image retrieval (CBIR) system provides as its output a set of images, ranked by similarity to the query image. Ground truth on similarity between dataset elements (e.g. between nodules) is not readily available, thus greatly challenging machine learning methods. The current study suggests a semi-supervised approach that involves two steps: 1) Automatic annotation of a given partially labeled dataset; 2) Learning a semantic similarity metric space based on the predicted annotations. The proposed system is demonstrated in lung nodule retrieval using the LIDC dataset, and shows that it is feasible to learn an embedding from predicted ratings.
arXiv Detail & Related papers (2020-05-04T19:26:14Z)
To Split or Not to Split: The Impact of Disparate Treatment in Classification [8.325775867295814]
Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute. We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. We prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs.
arXiv Detail & Related papers (2020-02-12T04:05:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.