Surgical Phase and Instrument Recognition: How to identify appropriate
Dataset Splits
- URL: http://arxiv.org/abs/2306.16879v2
- Date: Tue, 31 Oct 2023 15:16:09 GMT
- Title: Surgical Phase and Instrument Recognition: How to identify appropriate
Dataset Splits
- Authors: Georgii Kostiuchik, Lalith Sharan, Benedikt Mayer, Ivo Wolf, Bernhard
Preim, Sandy Engelhardt
- Abstract summary: This work presents a publicly available data visualization tool that enables interactive exploration of dataset splits.
It focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets.
Results: We performed an analysis of common Cholec80 dataset splits and were able to uncover phase transitions and combinations of instruments that were not represented in one of the sets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Purpose: Machine learning models can only be reliably evaluated if training,
validation, and test data splits are representative and not affected by the
absence of classes of interest. Surgical workflow and instrument recognition
tasks are complicated in this manner, because of heavy data imbalances
resulting from different lengths of phases and their erratic occurrences.
Furthermore, the issue becomes difficult as sub-properties that help define
phases, like instrument (co-)occurrence, are usually not considered when
defining the split. We argue that such sub-properties must be equally
considered.
Methods: This work presents a publicly available data visualization tool that
enables interactive exploration of dataset splits for surgical phase and
instrument recognition. It focuses on the visualization of the occurrence of
phases, phase transitions, instruments, and instrument combinations across
sets. Particularly, it facilitates the assessment and identification of
sub-optimal dataset splits.
Results: We performed an analysis of common Cholec80 dataset splits using the
proposed application and were able to uncover phase transitions and
combinations of instruments that were not represented in one of the sets.
Additionally, we outlined possible improvements to the splits. A user study
with ten participants demonstrated the ability of participants to solve a
selection of data exploration tasks using the proposed application.
Conclusion: In highly unbalanced class distributions, special care should be
taken with respect to the selection of an appropriate dataset split. Our
interactive data visualization tool presents a promising approach for the
assessment of dataset splits for surgical phase and instrument recognition.
Evaluation results show that it can enhance the development of machine learning
models. The application is available at https://cardio-ai.github.io/endovis-ml/.
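The kind of check the abstract describes (finding phase transitions and instrument combinations that are absent from one of the sets) can be sketched in a few lines. This is a minimal illustration, not the implementation behind the tool: the function names, the dictionary layout, and the toy videos are all assumptions made for the example, and the data is not taken from Cholec80.

```python
from itertools import groupby


def phase_transitions(frame_phases):
    """Collapse a per-frame phase sequence into its set of (from, to) transitions."""
    collapsed = [phase for phase, _ in groupby(frame_phases)]
    return set(zip(collapsed, collapsed[1:]))


def instrument_combinations(frame_instruments):
    """Return the set of instrument combinations (as frozensets) seen in one video."""
    return {frozenset(instruments) for instruments in frame_instruments}


def coverage_gaps(videos, split):
    """For each set, report transitions and combinations that occur
    somewhere in the dataset but never in that set."""
    per_set = {}
    for set_name, video_ids in split.items():
        transitions, combos = set(), set()
        for vid in video_ids:
            transitions |= phase_transitions(videos[vid]["phases"])
            combos |= instrument_combinations(videos[vid]["instruments"])
        per_set[set_name] = (transitions, combos)

    all_transitions = set().union(*(t for t, _ in per_set.values()))
    all_combos = set().union(*(c for _, c in per_set.values()))
    return {
        name: {
            "missing_transitions": all_transitions - t,
            "missing_combinations": all_combos - c,
        }
        for name, (t, c) in per_set.items()
    }


# Hypothetical toy data: per-frame phase labels and per-frame instrument lists.
videos = {
    "v1": {"phases": ["P1", "P1", "P2", "P3"],
           "instruments": [["grasper"], ["grasper", "hook"], ["hook"], []]},
    "v2": {"phases": ["P1", "P2", "P2", "P1", "P3"],
           "instruments": [["grasper"], ["hook"], ["hook"], ["grasper"], ["clipper"]]},
    "v3": {"phases": ["P1", "P2", "P3"],
           "instruments": [["grasper"], ["hook"], ["hook"]]},
}
split = {"train": ["v1", "v2"], "test": ["v3"]}

gaps = coverage_gaps(videos, split)
for set_name, report in gaps.items():
    print(set_name, report)
```

In this toy split the test set never contains the transitions P2 to P1 and P1 to P3, nor the grasper-plus-hook combination, so a model evaluated on it would never be scored on them. A split could be adjusted by moving videos between sets until both reports are empty, or as small as the data allows.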
Related papers
- Matched Machine Learning: A Generalized Framework for Treatment Effect
Inference With Learned Metrics [87.05961347040237]
We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching.
Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes.
We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems.
arXiv Detail & Related papers (2023-04-03T19:32:30Z)
- A classification performance evaluation measure considering data
separability [6.751026374812737]
We propose a new separability measure--the rate of separability (RS)--based on the data coding rate.
We demonstrate the positive correlation between the proposed measure and recognition accuracy in a multi-task scenario constructed from a real dataset.
arXiv Detail & Related papers (2022-11-10T09:18:26Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for
Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- Data-Centric Machine Learning in the Legal Domain [0.2624902795082451]
This paper explores how changes in a data set influence the measured performance of a model.
Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance.
The observed effects are surprisingly pronounced, especially when the per-class performance is considered.
arXiv Detail & Related papers (2022-01-17T23:05:14Z)
- Learning from Partially Overlapping Labels: Image Segmentation under
Annotation Shift [68.6874404805223]
We propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation.
We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data.
arXiv Detail & Related papers (2021-07-13T09:22:24Z)
- Representation Matters: Assessing the Importance of Subgroup Allocations
in Training Data [85.43008636875345]
We show that diverse representation in training data is key to increasing subgroup performances and achieving population level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its
Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Data Separability for Neural Network Classifiers and the Development of
a Separability Index [17.49709034278995]
We created the Distance-based Separability Index (DSI) to measure the separability of datasets.
We show that the DSI can indicate whether data belonging to different classes have similar distributions.
We also discussed possible applications of the DSI in the fields of data science, machine learning, and deep learning.
arXiv Detail & Related papers (2020-05-27T01:49:19Z)
- Semi-supervised lung nodule retrieval [2.055949720959582]
A content based image retrieval (CBIR) system provides as its output a set of images, ranked by similarity to the query image.
Ground truth on similarity between dataset elements (e.g. between nodules) is not readily available, thus greatly challenging machine learning methods.
The current study suggests a semi-supervised approach that involves two steps: 1) Automatic annotation of a given partially labeled dataset; 2) Learning a semantic similarity metric space based on the predicted annotations.
The proposed system is demonstrated in lung nodule retrieval using the LIDC dataset, and shows that it is feasible to learn embedding from predicted ratings.
arXiv Detail & Related papers (2020-05-04T19:26:14Z)
- To Split or Not to Split: The Impact of Disparate Treatment in
Classification [8.325775867295814]
Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute.
We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers.
We prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs.
arXiv Detail & Related papers (2020-02-12T04:05:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.