Self-supervised Representation Learning for Ultrasound Video
- URL: http://arxiv.org/abs/2003.00105v1
- Date: Fri, 28 Feb 2020 23:00:26 GMT
- Title: Self-supervised Representation Learning for Ultrasound Video
- Authors: Jianbo Jiao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble
- Abstract summary: We propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video.
We force the model to address anatomy-aware tasks with free supervision from the data itself.
Experiments on fetal ultrasound video show that the proposed approach can effectively learn meaningful and strong representations.
- Score: 18.515314344284445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have achieved promising performance for
medical image analysis, while in most cases ground-truth annotations from human
experts are necessary to train the deep model. In practice, such annotations
are expensive to collect and can be scarce for medical imaging applications.
Therefore, there is significant interest in learning representations from
unlabelled raw data. In this paper, we propose a self-supervised learning
approach to learn meaningful and transferable representations from medical
imaging video without any type of human annotation. We assume that in order to
learn such a representation, the model should identify anatomical structures
from the unlabelled data. Therefore we force the model to address anatomy-aware
tasks with free supervision from the data itself. Specifically, the model is
designed to correct the order of a reshuffled video clip and at the same time
predict the geometric transformation applied to the video clip. Experiments on
fetal ultrasound video show that the proposed approach can effectively learn
meaningful and strong representations, which transfer well to downstream tasks
like standard plane detection and saliency prediction.
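To make the two pretext tasks concrete, below is a minimal, hypothetical PyTorch sketch of the training signal the abstract describes: a clip is reshuffled and geometrically transformed, and a shared encoder must recover both the permutation and the transformation. The clip size, the restriction to 4-frame permutations and 90-degree rotations, and all module names are illustrative assumptions, not the authors' configuration.

```python
# A minimal sketch of the two pretext tasks, assuming 4-frame clips,
# frame-permutation labels, and 90-degree rotation labels. All names
# and sizes are illustrative, not the paper's exact setup.
import itertools
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

PERMS = list(itertools.permutations(range(4)))  # 24 orderings of 4 frames
ROTATIONS = [0, 1, 2, 3]                        # multiples of 90 degrees


def make_pretext_sample(clip):
    """clip: (T=4, C, H, W) tensor. Returns transformed clip + free labels."""
    perm_label = random.randrange(len(PERMS))
    rot_label = random.choice(ROTATIONS)
    shuffled = clip[list(PERMS[perm_label])]              # reorder frames
    rotated = torch.rot90(shuffled, rot_label, dims=(-2, -1))
    return rotated, perm_label, rot_label


class TwoTaskNet(nn.Module):
    """Shared backbone with one head per pretext task."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(                    # stand-in encoder
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU(),
        )
        self.order_head = nn.Linear(feat_dim, len(PERMS))
        self.rot_head = nn.Linear(feat_dim, len(ROTATIONS))

    def forward(self, x):                                 # x: (B, C, T, H, W)
        h = self.backbone(x)
        return self.order_head(h), self.rot_head(h)


# "Free supervision": both labels come from the data itself.
clip = torch.randn(4, 1, 64, 64)                          # one unlabelled clip
x, perm_y, rot_y = make_pretext_sample(clip)
x = x.permute(1, 0, 2, 3).unsqueeze(0)                    # -> (1, C, T, H, W)
model = TwoTaskNet()
order_logits, rot_logits = model(x)
loss = F.cross_entropy(order_logits, torch.tensor([perm_y])) \
     + F.cross_entropy(rot_logits, torch.tensor([rot_y]))
loss.backward()
```

Solving both tasks jointly pushes the encoder to notice which anatomical structures are present and how they are oriented, which is exactly the representation the downstream tasks reuse.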
Related papers
- Rapid Training Data Creation by Synthesizing Medical Images for Classification and Localization [10.506584969668792]
We present a method that transforms real data into synthetic training images for deep neural networks on these classification and localization problems.
For the weakly supervised model, we show that localization accuracy increases significantly when training on the generated data.
For the second model, we show that accuracy when trained with generated images closely parallels accuracy when trained with exhaustively annotated real images.
arXiv Detail & Related papers (2023-08-09T03:49:12Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- Segmentation of kidney stones in endoscopic video feeds [2.572404739180802]
We describe how we built a dataset from the raw videos and how we developed a pipeline to automate as much of the process as possible.
To show clinical potential for real-time use, we also confirmed that our best trained model can accurately annotate new videos at 30 frames per second.
arXiv Detail & Related papers (2022-04-29T16:00:52Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
- Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound [15.517484333872277]
In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access.
We propose to address the problem of self-supervised representation learning with multi-modal ultrasound video-speech raw data (a toy contrastive objective is sketched after this list).
arXiv Detail & Related papers (2020-08-14T23:58:23Z)
- Self-Supervised Representation Learning for Detection of ACL Tear Injury in Knee MR Videos [18.54362818156725]
We propose a self-supervised learning approach to learn transferable features from MR video clips by forcing the model to learn anatomical features.
To the best of our knowledge, none of the supervised learning models that perform injury classification from MR video provide any explanation for their decisions.
arXiv Detail & Related papers (2020-07-15T15:35:47Z)
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation approach whose goal is to learn the mapping between an input endoscopic image and a corresponding annotation (the cycle-consistency term is sketched after this list).
Our approach makes it possible to train image segmentation models without acquiring expensive annotations.
We test the proposed method on the Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)
- Self-Supervised Human Depth Estimation from Monocular Videos [99.39414134919117]
Previous methods for estimating detailed human depth often require supervised training with ground-truth depth data.
This paper presents a self-supervised method that can be trained on YouTube videos without known depth.
Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
arXiv Detail & Related papers (2020-05-07T09:45:11Z)
- Confident Coreset for Active Learning in Medical Image Analysis [57.436224561482966]
We propose a novel active learning method, confident coreset, which considers both uncertainty and distribution to effectively select informative samples (a toy selection rule is sketched after this list).
Through comparative experiments on two medical image analysis tasks, we show that our method outperforms other active learning methods.
arXiv Detail & Related papers (2020-04-05T13:46:16Z)
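For the contrastive video-speech entry above, a common way to realize such an objective is a symmetric InfoNCE loss over paired embeddings. The sketch below is a generic CLIP-style formulation with assumed embedding sizes and temperature; it is one plausible reading, not that paper's exact loss.

```python
# A hypothetical symmetric InfoNCE loss for aligned (video, speech)
# pairs; dimensions and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F


def video_speech_nce(video_emb, speech_emb, temperature=0.07):
    """video_emb, speech_emb: (B, D) embeddings from two encoders.
    Matching pairs sit on the diagonal of the similarity matrix."""
    v = F.normalize(video_emb, dim=-1)
    s = F.normalize(speech_emb, dim=-1)
    logits = v @ s.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(v.size(0))
    # Symmetric loss: video->speech and speech->video retrieval.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


loss = video_speech_nce(torch.randn(8, 128), torch.randn(8, 128))
```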
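For the cycle-consistent adversarial entry above, the core unpaired-translation constraint is a cycle-consistency term between two generators. The toy generators and shapes below are stand-ins, and the adversarial discriminators the full method requires are omitted for brevity.

```python
# A hypothetical cycle-consistency term for unpaired image <-> annotation
# translation; the 1x1-conv "generators" are toy stand-ins only.
import torch
import torch.nn as nn

G = nn.Conv2d(3, 1, 1)   # image -> segmentation map (toy generator)
Fb = nn.Conv2d(1, 3, 1)  # segmentation map -> image (toy generator)
l1 = nn.L1Loss()

image = torch.randn(2, 3, 64, 64)   # unpaired endoscopic images
mask = torch.rand(2, 1, 64, 64)     # unpaired annotation maps

# Translating to the other domain and back should reproduce the input.
cycle_loss = l1(Fb(G(image)), image) + l1(G(Fb(mask)), mask)
cycle_loss.backward()
```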
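For the confident coreset entry above, one plausible way to combine uncertainty and distributional coverage is an uncertainty-weighted k-center greedy selection. The scoring rule below is an assumption for illustration, not that paper's algorithm.

```python
# A hypothetical uncertainty-weighted k-center greedy selection rule.
import numpy as np


def confident_coreset(features, uncertainty, budget):
    """Greedily pick `budget` indices that are far from the selected
    set in feature space, scaled by per-sample model uncertainty."""
    selected = [int(np.argmax(uncertainty))]          # seed: most uncertain
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(budget - 1):
        score = dists * uncertainty                   # coverage x uncertainty
        score[selected] = -np.inf                     # never re-pick
        nxt = int(np.argmax(score))
        selected.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(features - features[nxt], axis=1))
    return selected


picks = confident_coreset(np.random.randn(100, 16), np.random.rand(100), 10)
```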
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.