Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos
- URL: http://arxiv.org/abs/2401.02791v2
- Date: Mon, 8 Jan 2024 19:50:45 GMT
- Title: Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos
- Authors: Ryo Fujii and Ryo Hachiuma and Hideo Saito
- Abstract summary: Surgical tool detection is essential for analyzing and evaluating minimally invasive surgery videos.
Large image datasets with instance-level labels are often limited because of the burden of annotation.
In this work, we propose to strike a balance between the extremely costly annotation burden and detection performance.
- Score: 11.61305113932032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Surgical tool detection is essential for analyzing and evaluating minimally invasive surgery videos. Current approaches are mostly supervised methods that require large datasets with full instance-level labels (i.e., bounding boxes). However, such datasets are often limited because of the annotation burden. Thus, it is important to enable surgical tool detection from image-level labels, which are considerably more time-efficient to collect than instance-level annotations. In this work, we propose to strike a balance between the extremely costly annotation burden and detection performance. We further propose a co-occurrence loss, which exploits the fact that certain tool pairs often co-occur in an image in order to leverage image-level labels. Encapsulating this co-occurrence knowledge in the loss helps to overcome the classification difficulty that arises because some tools have similar shapes and textures. Extensive experiments conducted on the Endovis2018 dataset under various data settings show the effectiveness of our method.
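As a rough illustration of the idea, the sketch below shows one plausible PyTorch form of a co-occurrence penalty: given per-image tool-presence probabilities and a co-occurrence prior estimated from image-level labels, it discourages predicting a tool as present while a frequently co-occurring partner is predicted absent. The function name, the prior matrix `cooc`, and the exact form are illustrative assumptions, not the authors' implementation.

```python
import torch

def cooccurrence_loss(probs: torch.Tensor, cooc: torch.Tensor) -> torch.Tensor:
    """Hypothetical co-occurrence penalty (not the paper's exact loss).

    probs: (B, C) predicted per-image tool-presence probabilities.
    cooc:  (C, C) prior, e.g. P(tool j present | tool i present),
           estimated from image-level labels; the diagonal is ignored.
    """
    C = probs.shape[1]
    off_diag = 1.0 - torch.eye(C, device=probs.device)
    # High penalty when tool i is confidently present, the prior says tool j
    # usually accompanies it, but tool j is predicted absent.
    pair_penalty = probs.unsqueeze(2) * cooc.unsqueeze(0) * (1.0 - probs.unsqueeze(1))
    return (pair_penalty * off_diag).sum(dim=(1, 2)).mean()
```

In practice such a term would be added to the standard detection losses with a small weight.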
Related papers
- Deep Bayesian Active Learning-to-Rank with Relative Annotation for Estimation of Ulcerative Colitis Severity [10.153691271169745]
We propose a deep Bayesian active learning-to-rank that automatically selects appropriate pairs for relative annotation.
Our method preferentially annotates unlabeled pairs whose expected learning efficiency is high, as estimated from the model's uncertainty on the samples.
arXiv: 2024-09-08
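As a sketch of the selection step in the entry above, the snippet below estimates per-sample uncertainty with MC dropout and ranks candidate pairs by summed uncertainty. The concrete criterion in the paper may differ, and all names here are hypothetical.

```python
import itertools
import torch

@torch.no_grad()
def mc_dropout_uncertainty(model, x, n_samples=20):
    """Predictive standard deviation per sample under MC dropout.
    Assumes `model` has dropout layers and outputs a scalar severity score."""
    model.train()  # keep dropout active at inference time
    preds = torch.stack([model(x).squeeze(-1) for _ in range(n_samples)])
    return preds.std(dim=0)  # (N,)

def select_pairs(uncertainty, k):
    """Rank all candidate pairs by summed uncertainty and keep the top k."""
    pairs = itertools.combinations(range(len(uncertainty)), 2)
    return sorted(pairs, key=lambda ij: -(uncertainty[ij[0]] + uncertainty[ij[1]]))[:k]
```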
- Domain Adaptive Multiple Instance Learning for Instance-level Prediction of Pathological Images [45.132775668689604]
We propose a new task setting to improve the classification performance of the target dataset without increasing annotation costs.
To combine the supervisory information from both types of supervision effectively, we propose a method that creates high-confidence pseudo-labels.
arXiv: 2023-04-07
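The entry above hinges on confidence-thresholded pseudo-labeling; a minimal generic recipe (not the paper's exact procedure) looks like this in PyTorch:

```python
import torch

@torch.no_grad()
def make_pseudo_labels(model, loader, threshold=0.95, device="cuda"):
    """Keep only predictions whose max softmax confidence clears a threshold.
    `loader` is assumed to yield batches of unlabeled images."""
    model.eval()
    images, labels = [], []
    for x in loader:
        x = x.to(device)
        probs = torch.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        keep = conf >= threshold
        images.append(x[keep].cpu())
        labels.append(pred[keep].cpu())
    return torch.cat(images), torch.cat(labels)
```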
- Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images [0.2538209532048867]
We propose a semi-weakly supervised contrastive learning framework for representation learning using semi-weakly annotated images.
We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets.
arXiv: 2021-08-04
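For the entry above, a minimal supervised-contrastive sketch that treats images sharing a weak image-level label as positives is given below. SWCL's actual objective is richer, so treat this as an assumption-laden illustration.

```python
import torch
import torch.nn.functional as F

def weak_label_contrastive_loss(z, weak_labels, tau=0.1):
    """Supervised-contrastive-style loss where images sharing a weak
    (image-level) label act as positives for one another.

    z: (N, D) projected embeddings; weak_labels: (N,) integer labels.
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                   # (N, N)
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (weak_labels.unsqueeze(0) == weak_labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))         # drop self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    n_pos = pos.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob.masked_fill(~pos, 0.0)).sum(dim=1) / n_pos
    return per_anchor[pos.any(dim=1)].mean()  # skip anchors with no positives
```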
- Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift [68.6874404805223]
We propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation.
We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data.
arXiv: 2021-07-13
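One common way to realize the adaptive loss mentioned in the entry above is to marginalize the probabilities of classes a given dataset never annotates into the background before computing cross entropy; the sketch below shows that variant and is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def partial_label_ce(logits, target, labeled_classes):
    """Cross entropy when only `labeled_classes` are annotated for this batch.

    Probabilities of unannotated foreground classes are merged into the
    background (class 0), so the model is not punished for predicting organs
    this dataset never labels.

    logits: (B, C, H, W); target: (B, H, W) with ids drawn from
    labeled_classes plus background (0); labeled_classes: list of ints.
    """
    probs = F.softmax(logits, dim=1)
    C = logits.shape[1]
    unlabeled = [c for c in range(1, C) if c not in labeled_classes]
    merged_bg = probs[:, [0] + unlabeled].sum(dim=1, keepdim=True)
    kept = probs[:, labeled_classes]            # annotated foreground classes
    merged = torch.cat([merged_bg, kept], dim=1)
    # Remap target ids into the merged class order: 0 -> 0, labeled -> 1..K.
    remap = torch.zeros(C, dtype=torch.long, device=target.device)
    for new_id, c in enumerate(labeled_classes, start=1):
        remap[c] = new_id
    return F.nll_loss(torch.log(merged.clamp_min(1e-8)), remap[target])
```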
- Mixed Supervision Learning for Whole Slide Image Classification [88.31842052998319]
We propose a mixed supervision learning framework for super high-resolution images.
During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning.
A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives.
arXiv: 2021-07-02
- Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to enforce consistency in a pixel-to-pixel manner.
arXiv: 2021-06-27
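The directional idea in the entry above can be sketched as a consistency term in which, per pixel, the less confident of two views is pulled toward the more confident one (with the gradient stopped on the confident side). The real DC Loss is contrastive; this MSE-based sketch keeps only the directional mechanism.

```python
import torch
import torch.nn.functional as F

def directional_consistency(p1, p2):
    """Pixel-to-pixel consistency where the less confident view is pulled
    toward the more confident one (gradient stops on the confident side).

    p1, p2: (B, C, H, W) softmax predictions for two views of the same crop.
    """
    conf1, _ = p1.max(dim=1)                        # (B, H, W) confidence
    conf2, _ = p2.max(dim=1)
    toward2 = (conf2 > conf1).unsqueeze(1).float()  # pixels where view 2 leads
    loss12 = F.mse_loss(p1, p2.detach(), reduction="none") * toward2
    loss21 = F.mse_loss(p2, p1.detach(), reduction="none") * (1.0 - toward2)
    return (loss12 + loss21).mean()
```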
- Large-Scale Unsupervised Person Re-Identification with Contrastive Learning [17.04597303816259]
Most existing unsupervised and domain adaptation ReID methods utilize only the public datasets in their experiments.
Inspired by the recent progress of large-scale self-supervised image classification using contrastive learning, we propose to learn ReID representation from large-scale unlabeled surveillance video alone.
arXiv: 2021-05-17
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv: 2021-04-26
- Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv: 2020-11-25
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach makes it possible to train image segmentation models without acquiring expensive annotations.
We test our method on the Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv: 2020-07-09
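The cycle-consistency constraint underlying the entry above can be sketched as follows, with `G` and `F_inv` as hypothetical generators mapping images to annotation maps and back; the adversarial terms of the full CycleGAN-style objective are omitted.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G, F_inv, real_img, real_ann):
    """Unpaired translation: G maps images -> annotations, F_inv maps back.
    Translating and then mapping back should reproduce the input."""
    return (F.l1_loss(F_inv(G(real_img)), real_img) +
            F.l1_loss(G(F_inv(real_ann)), real_ann))
```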
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.