Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via
Training-free Networks
- URL: http://arxiv.org/abs/2308.12961v1
- Date: Thu, 24 Aug 2023 17:58:03 GMT
- Title: Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via
Training-free Networks
- Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao
Dong, Peng Gao
- Abstract summary: 3D few-shot segmentation methods first pre-train the models on `seen' classes, and then evaluate their performance on `unseen' classes.
We propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and a further training-based variant, TFS3D-T.
Experiments demonstrate TFS3D-T improves previous state-of-the-art methods by +6.93% and +17.96% mIoU on S3DIS and ScanNet, while reducing training time by 90%.
- Score: 34.758951766323136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce the reliance on large-scale datasets, recent works in 3D
segmentation resort to few-shot learning. Current 3D few-shot semantic
segmentation methods first pre-train the models on `seen' classes, and then
evaluate their generalization performance on `unseen' classes. However, the
prior pre-training stage not only introduces excessive time overhead, but also
incurs a significant domain gap on `unseen' classes. To tackle these issues, we
propose an efficient Training-free Few-shot 3D Segmentation network, TFS3D, and
a further training-based variant, TFS3D-T. Without any learnable parameters,
TFS3D extracts dense representations by trigonometric positional encodings, and
achieves comparable performance to previous training-based methods. Due to the
elimination of pre-training, TFS3D can alleviate the domain gap issue and save
a substantial amount of time. Building upon TFS3D, TFS3D-T only requires
training a lightweight query-support transferring attention (QUEST), which
enhances the interaction between the few-shot query and support data.
Experiments demonstrate TFS3D-T improves previous state-of-the-art methods by
+6.93% and +17.96% mIoU on S3DIS and ScanNet, respectively, while reducing
training time by 90%, indicating superior effectiveness and efficiency.
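The abstract pins down the two key components well enough to sketch them. Below is a minimal, hypothetical PyTorch sketch of (a) a training-free trigonometric positional encoding over point coordinates and (b) a lightweight query-to-support cross-attention in the spirit of QUEST; the function names, frequency schedule, and attention layout are assumptions for illustration, not the authors' implementation.

```python
import torch

def trig_positional_encoding(xyz: torch.Tensor, num_freqs: int = 8,
                             base: float = 2.0) -> torch.Tensor:
    """Training-free sinusoidal encoding of 3D coordinates.

    xyz: (N, 3) points, ideally normalized to [-1, 1].
    Returns (N, 3 * 2 * num_freqs) dense features with no learnable weights.
    """
    freqs = base ** torch.arange(num_freqs, dtype=xyz.dtype)  # (F,)
    angles = xyz.unsqueeze(-1) * freqs                        # (N, 3, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)

class QuerySupportAttention(torch.nn.Module):
    """Lightweight cross-attention letting query-scene features attend to
    support-set features (a guess at a QUEST-style block, not the paper's code)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = torch.nn.Linear(dim, dim)
        self.to_k = torch.nn.Linear(dim, dim)
        self.to_v = torch.nn.Linear(dim, dim)

    def forward(self, query_feats: torch.Tensor,
                support_feats: torch.Tensor) -> torch.Tensor:
        q = self.to_q(query_feats)                   # (Nq, D)
        k = self.to_k(support_feats)                 # (Ns, D)
        v = self.to_v(support_feats)                 # (Ns, D)
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
        return query_feats + attn @ v                # residual update

points = torch.rand(1024, 3) * 2 - 1                 # toy cloud in [-1, 1]^3
feats = trig_positional_encoding(points)             # (1024, 48), no training
quest = QuerySupportAttention(dim=feats.shape[-1])
out = quest(feats, feats[:256])                      # query attends to support
```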
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision, but densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders [32.85484320025852]
We propose a novel self-supervised pre-training framework utilizing both real 3D data and pseudo-3D data lifted from images by a large depth estimation model.
Our method achieves state-of-the-art performance in 3D classification and few-shot learning while maintaining high pre-training and downstream fine-tuning efficiency.
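As a concrete illustration of lifting an image to pseudo-3D data, the sketch below back-projects a monocular depth map through a pinhole camera model; the depth would come from a large depth estimator, and the intrinsics here are made-up values, since the paper's actual lifting procedure is not specified in this summary.

```python
import torch

def lift_depth_to_points(depth: torch.Tensor, fx: float, fy: float,
                         cx: float, cy: float) -> torch.Tensor:
    """Back-project an (H, W) depth map into an (H*W, 3) pseudo point cloud
    using pinhole geometry: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h, dtype=depth.dtype),
                          torch.arange(w, dtype=depth.dtype), indexing="ij")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)

# toy usage with made-up intrinsics; real depth would come from an estimator
pseudo_pts = lift_depth_to_points(torch.ones(480, 640),
                                  fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```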
arXiv Detail & Related papers (2024-08-19T13:59:53Z)
- No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation [40.0506169981233]
We propose a Non-parametric Network for few-shot 3D segmentation, Seg-NN, and its Parametric variant, Seg-PN.
Seg-NN extracts dense representations by hand-crafted filters and achieves comparable performance to existing parametric models.
Experiments suggest that Seg-PN outperforms the previous state-of-the-art method by +4.19% and +7.71% mIoU on the S3DIS and ScanNet datasets, respectively.
arXiv Detail & Related papers (2024-04-05T12:09:36Z)
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework that disentangles feature representation learning from classifier learning in an alternating optimization manner to shift the biased decision boundary effectively (see the sketch below).
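A minimal sketch of that general decoupling idea, alternating between updating the feature extractor with the classifier frozen and vice versa; the toy modules and the simple even/odd schedule are assumptions, not the paper's procedure.

```python
import torch

def set_trainable(module: torch.nn.Module, flag: bool) -> None:
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad_(flag)

backbone = torch.nn.Linear(16, 32)      # stand-in feature extractor
classifier = torch.nn.Linear(32, 4)     # stand-in point classifier
opts = {"feat": torch.optim.SGD(backbone.parameters(), lr=0.1),
        "cls":  torch.optim.SGD(classifier.parameters(), lr=0.1)}

x, y = torch.randn(64, 16), torch.randint(0, 4, (64,))
for step in range(10):
    phase = "feat" if step % 2 == 0 else "cls"   # alternate the two parts
    set_trainable(backbone, phase == "feat")     # freeze the inactive part
    set_trainable(classifier, phase == "cls")
    loss = torch.nn.functional.cross_entropy(classifier(backbone(x)), y)
    opts[phase].zero_grad()
    loss.backward()
    opts[phase].step()                           # update the active part only
```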
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
arXiv Detail & Related papers (2023-04-02T14:46:58Z)
- CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP [55.864132158596206]
Contrastive Language-Image Pre-training (CLIP) achieves promising results in 2D zero-shot and few-shot learning.
We make the first attempt to investigate how CLIP knowledge benefits 3D scene understanding.
We propose CLIP2Scene, a framework that transfers CLIP knowledge from 2D image-text pre-trained models to a 3D point cloud network.
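One plausible form of such 2D-to-3D transfer is to pull each point's feature toward the CLIP feature of its calibrated pixel. The loss below sketches that idea under the assumption that point-pixel correspondences are given; it is not CLIP2Scene's actual objective.

```python
import torch
import torch.nn.functional as F

def point_pixel_distill_loss(point_feats: torch.Tensor,
                             pixel_feats: torch.Tensor) -> torch.Tensor:
    """Mean cosine distance between each 3D point feature and the frozen CLIP
    feature of its paired pixel (correspondences assumed known)."""
    p = F.normalize(point_feats, dim=-1)
    q = F.normalize(pixel_feats, dim=-1)
    return (1.0 - (p * q).sum(dim=-1)).mean()

# toy usage: 1000 matched point/pixel features of dimension 512
loss = point_pixel_distill_loss(torch.randn(1000, 512), torch.randn(1000, 512))
```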
arXiv Detail & Related papers (2023-01-12T10:42:39Z)
- Language-Grounded Indoor 3D Semantic Segmentation in the Wild [33.40572976383402]
We study a larger vocabulary for 3D semantic segmentation with a new extended benchmark on ScanNet data with 200 class categories.
We propose a language-driven pre-training method to encourage learned 3D features to lie close to their pre-trained text embeddings.
Our approach consistently outperforms state-of-the-art 3D pre-training for 3D semantic segmentation on our proposed benchmark.
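A common way to realize "features close to text embeddings" is to classify each point against frozen text embeddings of the class names; the sketch below shows that generic recipe as a guess, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def text_anchored_logits(point_feats: torch.Tensor, text_embeds: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """Cosine-similarity logits of N point features against C frozen text
    embeddings; cross-entropy on these pulls features toward their class text."""
    p = F.normalize(point_feats, dim=-1)   # (N, D)
    t = F.normalize(text_embeds, dim=-1)   # (C, D)
    return p @ t.T / temperature           # (N, C)

# toy usage over 200 classes, as in the extended ScanNet benchmark
logits = text_anchored_logits(torch.randn(4096, 512), torch.randn(200, 512))
loss = F.cross_entropy(logits, torch.randint(0, 200, (4096,)))
```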
arXiv Detail & Related papers (2022-04-16T09:17:40Z)
- Self-Supervised Pretraining of 3D Features on any Point-Cloud [40.26575888582241]
We present a simple self-supervised pretraining method that can work with any 3D data without requiring 3D registration.
We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining.
arXiv Detail & Related papers (2021-01-07T18:55:21Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.