Semantic Video Segmentation for Intracytoplasmic Sperm Injection Procedures
- URL: http://arxiv.org/abs/2101.01207v1
- Date: Mon, 4 Jan 2021 19:33:12 GMT
- Title: Semantic Video Segmentation for Intracytoplasmic Sperm Injection Procedures
- Authors: Peter He, Raksha Jain, Jérôme Chambost, Céline Jacques, Cristina Hickman
- Abstract summary: We present the first deep learning model for the analysis of intracytoplasmic sperm injection (ICSI) procedures.
We train a deep neural network to segment key objects in the videos achieving a mean IoU of 0.962, and to localize the needle tip achieving a mean pixel error of 3.793 pixels at 14 FPS on a single GPU.
- Score: 7.813460653362095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first deep learning model for the analysis of intracytoplasmic
sperm injection (ICSI) procedures. Using a dataset of ICSI procedure videos, we
train a deep neural network to segment key objects in the videos achieving a
mean IoU of 0.962, and to localize the needle tip achieving a mean pixel error
of 3.793 pixels at 14 FPS on a single GPU. We further analyze the variation
between the dataset's human annotators and find the model's performance to be
comparable to human experts.
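The two metrics quoted in the abstract (mean IoU for segmentation and mean pixel error for needle-tip localization) can be sketched as follows. This is an illustrative implementation of the standard definitions, not the authors' code:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes, for integer label masks."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks; skip rather than count as 0
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

def pixel_error(pred_xy, true_xy):
    """Euclidean distance, in pixels, between predicted and annotated tip coordinates."""
    return float(np.hypot(pred_xy[0] - true_xy[0], pred_xy[1] - true_xy[1]))

# Toy 2x2 masks with two classes:
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, 2))       # (1/2 + 2/3) / 2 = 7/12 ≈ 0.583
print(pixel_error((3.0, 4.0), (0.0, 0.0)))  # 5.0
```

The paper's reported numbers would be these quantities averaged over all frames in the test set.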
Related papers
- CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates it can identify organs and abnormalities in a zero-shot manner using natural languages.
arXiv Detail & Related papers (2024-04-23T17:59:01Z)
- Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips [1.339950379203994]
We propose a method for joint tracking of all structures simultaneously on a single 2D monocular video clip.
Because instruments are small, they generally cover only a small part of the image, which reduces tracking accuracy.
We evaluate tracking on video clips of laparoscopic cholecystectomies, where we find mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments.
arXiv Detail & Related papers (2024-03-28T09:44:20Z)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joints in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- WATUNet: A Deep Neural Network for Segmentation of Volumetric Sweep Imaging Ultrasound [1.2903292694072621]
Volume sweep imaging (VSI) is an innovative approach that enables untrained operators to capture quality ultrasound images.
We present a novel segmentation model known as Wavelet_Attention_UNet (WATUNet).
In this model, we incorporate wavelet gates (WGs) and attention gates (AGs) between the encoder and decoder, instead of a simple connection, to overcome these limitations.
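WATUNet's gate designs are not detailed in this summary; as a rough illustration of what an attention gate between encoder and decoder does, a minimal additive attention gate in the style of Attention U-Net can be sketched as below. All shapes and weight names (`Wx`, `Wg`, `psi`) are assumptions for the sketch, not the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    """Additive attention gate: the decoder's gating signal g re-weights the
    encoder's skip feature x before it is passed across the skip connection.
    Shapes (assumed): x and g are (C, H, W); Wx and Wg are (F, C); psi is (1, F)."""
    # Project both inputs to F channels, combine, and apply ReLU.
    q = np.maximum(np.tensordot(Wx, x, axes=1) + np.tensordot(Wg, g, axes=1), 0.0)
    # Collapse to a single-channel attention map in (0, 1).
    alpha = sigmoid(np.tensordot(psi, q, axes=1))  # (1, H, W)
    # Broadcast the map over channels: low-alpha regions are suppressed.
    return x * alpha
```

With all weights at zero, the map is uniformly sigmoid(0) = 0.5, so the skip feature is passed through at half strength; trained weights instead learn to suppress irrelevant spatial regions.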
arXiv Detail & Related papers (2023-11-17T20:32:37Z)
- Comparative analysis of deep learning approaches for AgNOR-stained cytology samples interpretation [52.77024349608834]
This paper provides a way to analyze argyrophilic nucleolar organizer regions (AgNOR) stained slide using deep learning approaches.
Our results show that semantic segmentation using U-Net with either a ResNet-18 or a ResNet-34 backbone yields similar results.
The best model shows an IoU for nucleus, cluster, and satellites of 0.83, 0.92, and 0.99 respectively.
arXiv Detail & Related papers (2022-10-19T15:15:32Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Segmentation of kidney stones in endoscopic video feeds [2.572404739180802]
We describe how we built a dataset from the raw videos and how we developed a pipeline to automate as much of the process as possible.
To show clinical potential for real-time use, we also confirmed that our best trained model can accurately annotate new videos at 30 frames per second.
arXiv Detail & Related papers (2022-04-29T16:00:52Z)
- Evaluation of Deep Learning Topcoders Method for Neuron Individualization in Histological Macaque Brain Section [0.0]
We propose an ensemble Deep Learning algorithm to perform cell individualization on neurological data.
Results suggest that the proposed method successfully segments neuronal cells at both the object level and the pixel level, with an average detection accuracy of 0.93.
arXiv Detail & Related papers (2021-11-10T16:38:35Z)
- Predicting Semen Motility using three-dimensional Convolutional Neural Networks [0.0]
We propose an improved deep learning based approach using three-dimensional convolutional neural networks to predict sperm motility from microscopic videos of the semen sample.
Our models indicate that deep learning-based automatic semen analysis may become a valuable and effective tool in fertility and IVF labs.
arXiv Detail & Related papers (2021-01-08T07:38:52Z)
- TSGCNet: Discriminative Geometric Feature Learning with Two-Stream Graph Convolutional Network for 3D Dental Model Segmentation [141.2690520327948]
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
arXiv Detail & Related papers (2020-12-26T08:02:56Z)
- Appearance Learning for Image-based Motion Estimation in Tomography [60.980769164955454]
In tomographic imaging, anatomical structures are reconstructed by applying a pseudo-inverse forward model to acquired signals.
Patient motion corrupts the geometry alignment in the reconstruction process resulting in motion artifacts.
We propose an appearance learning approach recognizing the structures of rigid motion independently from the scanned object.
arXiv Detail & Related papers (2020-06-18T09:49:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.