VoxelEmbed: 3D Instance Segmentation and Tracking with Voxel Embedding
based Deep Learning
- URL: http://arxiv.org/abs/2106.11480v1
- Date: Tue, 22 Jun 2021 02:03:26 GMT
- Title: VoxelEmbed: 3D Instance Segmentation and Tracking with Voxel Embedding
based Deep Learning
- Authors: Mengyang Zhao, Quan Liu, Aadarsh Jha, Ruining Deng, Tianyuan Yao,
Anita Mahadevan-Jansen, Matthew J. Tyska, Bryan A. Millis, Yuankai Huo
- Abstract summary: We propose a novel spatial-temporal voxel-embedding (VoxelEmbed) based learning method to perform simultaneous cell instance segmentation and tracking on 3D volumetric video sequences.
We evaluate our VoxelEmbed method on four 3D datasets (with different cell types) from the ISBI Cell Tracking Challenge.
- Score: 5.434831972326107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in bioimaging have provided scientists with superior
spatial-temporal resolution to observe the dynamics of living cells as 3D
volumetric videos. Unfortunately, 3D biomedical video analysis is lagging,
impeded by resource-intensive human curation using off-the-shelf 3D analytic
tools. As a result, biologists often need to discard a considerable amount of rich
3D spatial information by compromising on 2D analysis via maximum intensity
projection. Recently, pixel embedding-based cell instance segmentation and
tracking provided a neat and generalizable computing paradigm for understanding
cellular dynamics. In this work, we propose a novel spatial-temporal
voxel-embedding (VoxelEmbed) based learning method to perform simultaneous cell
instance segmentation and tracking on 3D volumetric video sequences. Our
contribution is four-fold: (1) the proposed voxel embedding generalizes
pixel embedding with 3D context information; (2) we present a simple multi-stream
learning approach that allows effective spatial-temporal embedding; (3)
we provide an end-to-end framework for one-stage 3D cell instance
segmentation and tracking without heavy parameter tuning; (4) the proposed 3D
quantification is memory-efficient, requiring only a single GPU with 12 GB of memory. We
evaluate our VoxelEmbed method on four 3D datasets (with different cell types)
from the ISBI Cell Tracking Challenge. The proposed VoxelEmbed method achieved
consistently superior overall performance (OP) on two densely annotated datasets.
The performance is also competitive on two sparsely annotated cohorts, in which
only 20.6% and 2% of the data have segmentation annotations. The results
demonstrate that the VoxelEmbed method is a generalizable and memory-efficient
solution.
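
To make the embedding-based paradigm described in the abstract concrete, the sketch below (Python, not the authors' released code) illustrates the two generic steps it implies: clustering the per-voxel embeddings of foreground voxels into instance labels at each time point, and linking instances across consecutive time points by matching their mean embeddings. The array shapes, the mean-shift bandwidth, and the matching threshold are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch (assumptions, not the authors' implementation) of the
# voxel-embedding paradigm: every foreground voxel carries a D-dim embedding,
# embeddings are clustered into instances per time point, and instances are
# linked across time by matching their mean embeddings.
import numpy as np
from sklearn.cluster import MeanShift
from scipy.optimize import linear_sum_assignment


def voxels_to_instances(embeddings, fg_mask, bandwidth=0.5):
    """Cluster foreground voxel embeddings into instance labels.

    embeddings: (Z, Y, X, D) float array produced by an embedding network.
    fg_mask:    (Z, Y, X) boolean foreground mask.
    Returns an integer label volume (0 = background) and per-instance mean embeddings.
    """
    labels = np.zeros(fg_mask.shape, dtype=np.int32)
    fg = embeddings[fg_mask]                      # (N, D) foreground embeddings
    if fg.shape[0] == 0:
        return labels, np.empty((0, embeddings.shape[-1]))
    cluster_ids = MeanShift(bandwidth=bandwidth).fit_predict(fg)
    labels[fg_mask] = cluster_ids + 1             # shift so background stays 0
    means = np.stack([fg[cluster_ids == c].mean(axis=0)
                      for c in range(cluster_ids.max() + 1)])
    return labels, means


def link_frames(prev_means, curr_means, max_dist=1.0):
    """Match instances across consecutive time points by embedding distance."""
    cost = np.linalg.norm(prev_means[:, None, :] - curr_means[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    # keep only matches whose embedding distance is plausible for the same cell
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
```

In this view, the segmentation quality depends mainly on how well the network pulls embeddings of the same cell together across z-slices and time, which is what the paper's multi-stream spatial-temporal embedding is designed to encourage.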
Related papers
- Label-Efficient 3D Brain Segmentation via Complementary 2D Diffusion Models with Orthogonal Views [10.944692719150071]
We propose a novel 3D brain segmentation approach using complementary 2D diffusion models.
Our goal is to achieve reliable segmentation quality without requiring complete labels for each individual subject.
arXiv Detail & Related papers (2024-07-17T06:14:53Z)
- Volumetric Environment Representation for Vision-Language Navigation [66.04379819772764]
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment based on visual observations and natural language instructions.
We introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
VER predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly.
arXiv Detail & Related papers (2024-03-21T06:14:46Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps [1.015920567871904]
We describe a method for large-scale 3D cell-tracking through a segmentation selection approach.
We show that this method achieves state-of-the-art results in 3D images from the cell tracking challenge.
Our framework is flexible and supports segmentations from off-the-shelf cell segmentation models.
arXiv Detail & Related papers (2023-08-08T18:41:38Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- YOLO2U-Net: Detection-Guided 3D Instance Segmentation for Microscopy [0.0]
We introduce a comprehensive method for accurate 3D instance segmentation of cells in the brain tissue.
The proposed method combines the 2D YOLO detection method with a multi-view fusion algorithm to construct a 3D localization of the cells.
The promising performance of the proposed method is shown in comparison with some current deep learning-based 3D instance segmentation methods.
arXiv Detail & Related papers (2022-07-13T14:17:52Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding [19.134536179555102]
We propose an alternative approach to overcome the limitations of CNN based approaches by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves on par state-of-the-art accuracy with improved training time and model stability thus indicating strong potential for further research.
arXiv Detail & Related papers (2020-11-29T12:56:19Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.