A Point in the Right Direction: Vector Prediction for Spatially-aware
Self-supervised Volumetric Representation Learning
- URL: http://arxiv.org/abs/2211.08533v1
- Date: Tue, 15 Nov 2022 22:10:50 GMT
- Title: A Point in the Right Direction: Vector Prediction for Spatially-aware
Self-supervised Volumetric Representation Learning
- Authors: Yejia Zhang, Pengfei Gu, Nishchal Sapkota, Hao Zheng, Peixian Liang,
Danny Z. Chen
- Abstract summary: VectorPOSE promotes better spatial understanding with two novel pretext tasks: Vector Prediction and Boundary-Focused Reconstruction.
We evaluate VectorPOSE on three 3D medical image segmentation tasks, showing that it often outperforms state-of-the-art methods.
- Score: 12.369884719068228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High annotation costs and limited labels for dense 3D medical imaging tasks
have recently motivated an assortment of 3D self-supervised pretraining methods
that improve transfer learning performance. However, these methods commonly
lack spatial awareness despite its centrality in enabling effective 3D image
analysis. More specifically, position, scale, and orientation are not only
informative but also automatically available when generating image crops for
training. Yet, to date, no work has proposed a pretext task that distills all
key spatial features. To fulfill this need, we develop a new self-supervised
method, VectorPOSE, which promotes better spatial understanding with two novel
pretext tasks: Vector Prediction (VP) and Boundary-Focused Reconstruction
(BFR). VP focuses on global spatial concepts (i.e., properties of 3D patches)
while BFR addresses weaknesses of recent reconstruction methods to learn more
effective local representations. We evaluate VectorPOSE on three 3D medical
image segmentation tasks, showing that it often outperforms state-of-the-art
methods, especially in limited annotation settings.
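Since a crop's spatial properties are known the moment it is generated, a VP-style target costs nothing to produce. A minimal sketch of pairing a random 3D crop with its free position-and-scale label (the function name and target layout are hypothetical; the paper defines VectorPOSE's actual vector targets):

```python
import numpy as np

def sample_crop_with_spatial_target(volume, crop_size, rng):
    """Sample a random cubic crop and pair it with a free spatial label:
    the crop's normalized center position and normalized scale, both
    known at crop time without any annotation. (Hypothetical target
    layout; the paper defines VectorPOSE's actual VP targets.)"""
    D, H, W = volume.shape
    d = rng.integers(0, D - crop_size + 1)
    h = rng.integers(0, H - crop_size + 1)
    w = rng.integers(0, W - crop_size + 1)
    crop = volume[d:d + crop_size, h:h + crop_size, w:w + crop_size]
    center = np.array([d, h, w]) + crop_size / 2.0
    target = np.concatenate([
        center / np.array([D, H, W]),                   # position in [0, 1]
        [crop_size / D, crop_size / H, crop_size / W],  # scale in [0, 1]
    ])
    return crop, target
```

A network regressing this vector from the crop alone is forced to encode where, and at what scale, a patch sits inside the volume.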
Related papers
- LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering [0.5852077003870417]
LangOcc is a novel approach for open vocabulary occupancy estimation.
It is trained only via camera images, and can detect arbitrary semantics via vision-language alignment.
We achieve state-of-the-art results in self-supervised semantic occupancy estimation on the Occ3D-nuScenes dataset.
arXiv Detail & Related papers (2024-07-24T14:22:55Z)
- Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness [44.15562068190958]
In the Operating Room, semantic segmentation is at the core of creating robots aware of clinical surroundings.
State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable.
We propose a new 3D self-supervised task for OR scene understanding utilizing OR scene images captured with ToF cameras.
arXiv Detail & Related papers (2024-07-07T17:17:52Z)
- OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow [0.6577148087211809]
We present a novel approach to occupancy estimation inspired by neural radiance field (NeRF) using only 2D labels.
We employ differentiable volumetric rendering to predict depth and semantic maps and train a 3D network based on 2D supervision only.
arXiv Detail & Related papers (2024-02-20T08:04:12Z)
- 3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection [33.34240545722551]
Vascular structure segmentation plays a crucial role in medical analysis and clinical applications.
Existing weakly supervised methods have exhibited suboptimal performance when handling sparse vascular structures.
Here, we employ maximum intensity projection (MIP) to decrease the dimensionality of 3D volume to 2D image for efficient annotation.
We introduce a weakly-supervised network that fuses 2D-3D deep features via MIP to further improve segmentation performance.
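The MIP itself is a one-line reduction; keeping the argmax alongside it shows how a 2D annotation on the projection can be traced back to a 3D voxel. A simplified sketch of this step (not the paper's pipeline code):

```python
import numpy as np

def mip_with_depth(volume, axis=0):
    """Maximum intensity projection: collapse a 3D volume to a 2D image
    by keeping the brightest voxel along one axis, plus the index of the
    winning voxel so a 2D annotation can be mapped back to 3D.
    (A simplified sketch of the MIP step, not the paper's pipeline.)"""
    return volume.max(axis=axis), volume.argmax(axis=axis)
```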
arXiv Detail & Related papers (2024-02-19T13:24:46Z)
- Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning [55.517000360348725]
This work presents a framework for dealing with 3D scene understanding when the labeled scenes are quite limited.
To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy.
Experiments with both indoor and outdoor scenes demonstrated the effectiveness of our approach in both data-efficient learning and open-world few-shot learning.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
- 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as form of self-supervision during training when no labels are available.
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
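The masking strategy can be sketched as grouping points into local patches and dropping most of them: visible points feed the encoder, occluded points become reconstruction targets. A simplified illustration (nearest-seed patching is an assumption here, not the 3D-OAE grouping):

```python
import numpy as np

def occlude_patches(points, num_patches=8, occlusion_ratio=0.75, rng=None):
    """Group a point cloud into local patches by nearest-seed assignment,
    then occlude a large fraction of patches: visible points are the
    encoder input, occluded points the reconstruction target.
    (A simplified sketch, not the 3D-OAE grouping or masking code.)"""
    if rng is None:
        rng = np.random.default_rng()
    # pick random seed points and assign every point to its nearest seed
    seeds = points[rng.choice(len(points), num_patches, replace=False)]
    dists = np.linalg.norm(points[:, None, :] - seeds[None, :, :], axis=-1)
    patch_id = dists.argmin(axis=1)
    # occlude a random subset of patches (e.g. 75% of them)
    n_occluded = int(round(num_patches * occlusion_ratio))
    occluded_patches = rng.choice(num_patches, n_occluded, replace=False)
    mask = np.isin(patch_id, occluded_patches)
    return points[~mask], points[mask]
```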
arXiv Detail & Related papers (2022-03-26T14:06:29Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform great for well represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
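Self-ensembling methods typically maintain a teacher network as an exponential moving average of the student, giving stable targets for consistency losses on unlabeled data. A generic mean-teacher update illustrating the idea (not the SESS implementation):

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """One exponential-moving-average step: teacher weights trail the
    student's, providing stable targets for consistency losses on
    unlabeled data. (A generic mean-teacher sketch, not SESS's code.)"""
    return {k: decay * teacher[k] + (1 - decay) * student[k] for k in teacher}
```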
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.