FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with
Pre-trained Vision-Language Models
- URL: http://arxiv.org/abs/2312.17051v1
- Date: Thu, 28 Dec 2023 14:52:07 GMT
- Title: FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with
Pre-trained Vision-Language Models
- Authors: Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng
Zuo
- Abstract summary: Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
- Score: 62.663113296987085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic
forgetting issue when a model is incrementally trained on limited data. While
the Contrastive Vision-Language Pre-Training (CLIP) model has been effective in
addressing 2D few/zero-shot learning tasks, its direct application to 3D FSCIL
faces limitations. These limitations arise from feature space misalignment and
significant noise in real-world scanned 3D data. To address these challenges,
we introduce two novel components: the Redundant Feature Eliminator (RFE) and
the Spatial Noise Compensator (SNC). RFE aligns the feature spaces of input
point clouds and their embeddings by performing a unique dimensionality
reduction on the feature space of pre-trained models (PTMs), effectively
eliminating redundant information without compromising semantic integrity. On
the other hand, SNC is a graph-based 3D model designed to capture robust
geometric information within point clouds, thereby augmenting the knowledge
lost due to projection, particularly when processing real-world scanned data.
Considering the imbalance in existing 3D datasets, we also propose new
evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
Traditional accuracy metrics are shown to be biased; our metrics therefore focus on the model's
on the model's proficiency in learning new classes while maintaining the
balance between old and new classes. Experimental results on both established
3D FSCIL benchmarks and our dataset demonstrate that our approach significantly
outperforms existing state-of-the-art methods.
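The abstract does not spell out RFE's exact dimensionality reduction or the precise form of the proposed metrics, so the following is only a minimal sketch of the two ideas: a PCA-style projection of pre-trained (e.g., CLIP) features as a stand-in for redundancy elimination, and a harmonic-mean balance between old- and new-class accuracy as a stand-in for the balanced metric. All function names are hypothetical.

```python
# Minimal sketch (not the authors' implementation): a PCA-style redundancy
# eliminator over pre-trained features, plus a harmonic-mean "balanced"
# accuracy that weighs old- and new-class performance equally.
import numpy as np


def fit_redundancy_eliminator(features: np.ndarray, keep_dims: int):
    """Fit a PCA-like projection on base-session features (N x D).

    Returns the feature mean and the top-`keep_dims` principal directions,
    which are later used to project away low-variance (redundant) dimensions.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered feature matrix; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:keep_dims]


def reduce_features(features: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Project features onto the retained subspace (N x keep_dims)."""
    return (features - mean) @ basis.T


def balanced_accuracy(acc_old: float, acc_new: float) -> float:
    """Harmonic mean of old- and new-class accuracy.

    Illustrative only: the paper defines its own metrics; a harmonic mean is
    one common way to penalize models that trade new-class learning against
    old-class retention (or vice versa).
    """
    if acc_old + acc_new == 0:
        return 0.0
    return 2 * acc_old * acc_new / (acc_old + acc_new)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(256, 512))       # stand-in for CLIP-style embeddings
    mean, basis = fit_redundancy_eliminator(feats, keep_dims=128)
    reduced = reduce_features(feats, mean, basis)
    print(reduced.shape)                       # (256, 128)
    print(round(balanced_accuracy(0.82, 0.55), 3))
```

The retained dimensionality and the exact balancing rule would follow the paper's own definitions; the sketch only fixes the interfaces involved.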
Related papers
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It identifies redundant parameters in the pretrained model whose removal causes minimal distortion in both locality and confidence.
The framework minimizes the detection distortion of the network output so as to maximally maintain detection precision.
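The summary above does not give the pruning criterion in detail; as a rough illustration of distortion-aware post-training pruning (not the DM3D algorithm), one can score each weight by an estimate of how much zeroing it would distort the layer output and drop the lowest-scoring fraction. The saliency rule and all names below are assumptions.

```python
# Illustrative sketch only: post-training pruning that removes the weights
# whose removal is estimated to distort the layer output least, approximated
# here by |w| scaled by per-input-channel activation norms.
import numpy as np


def prune_by_distortion(weight: np.ndarray, act_norm: np.ndarray, sparsity: float):
    """Zero out the fraction `sparsity` of weights with the smallest estimated
    output distortion.  `weight` is (out, in); `act_norm` holds per-input-channel
    activation norms gathered from a small calibration set."""
    saliency = np.abs(weight) * act_norm[None, :]
    threshold = np.quantile(saliency, sparsity)
    mask = saliency > threshold
    return weight * mask, mask


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.normal(size=(64, 128))
    a = np.abs(rng.normal(size=128))
    pruned, mask = prune_by_distortion(w, a, sparsity=0.5)
    print(mask.mean())   # roughly half of the weights survive
```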
arXiv Detail & Related papers (2024-07-02T09:33:32Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Learning-based Point Cloud Registration for 6D Object Pose Estimation in
the Real World [55.7340077183072]
We tackle the task of estimating the 6D pose of an object from point cloud data.
Recent learning-based approaches to this task have shown great success on synthetic datasets but struggle on real-world scans.
We analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds.
arXiv Detail & Related papers (2022-03-29T07:55:04Z) - Advancing 3D Medical Image Analysis with Variable Dimension Transform
based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
Using a redesigned 3D network architecture and reformulated natural images, the framework addresses the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
arXiv Detail & Related papers (2022-01-05T03:11:21Z) - Spatio-temporal Self-Supervised Representation Learning for 3D Point
Clouds [96.9027094562957]
We introduce a spatio-temporal representation learning framework capable of learning from unlabeled 3D point clouds.
Inspired by how infants learn from visual data in the wild, we explore rich cues derived from the 3D data.
STRL takes two temporally-related frames from a 3D point cloud sequence as input, transforms them with spatial data augmentation, and learns an invariant representation in a self-supervised manner.
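A minimal sketch of that training signal (not the STRL code) pairs two temporally adjacent, spatially augmented frames and pulls their embeddings together with a cosine-similarity loss; the toy encoder and augmentation below are assumptions made for illustration.

```python
# Minimal sketch of the idea described above: embed two temporally adjacent,
# spatially augmented point-cloud frames and make their representations agree.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointEncoder(nn.Module):
    """Toy per-point MLP + max-pooling encoder (stand-in for a real backbone)."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:    # pts: (B, N, 3)
        return self.mlp(pts).max(dim=1).values                # (B, out_dim)


def augment(pts: torch.Tensor) -> torch.Tensor:
    """Simple spatial augmentation: random rotation about z plus point jitter."""
    theta = torch.rand(()).item() * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return pts @ rot.T + 0.01 * torch.randn_like(pts)


def invariance_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between the embeddings of the two frames."""
    return -F.cosine_similarity(z1, z2, dim=-1).mean()


if __name__ == "__main__":
    encoder = PointEncoder()
    frame_t = torch.randn(4, 1024, 3)                        # point clouds at time t
    frame_t1 = frame_t + 0.02 * torch.randn_like(frame_t)    # a nearby frame
    loss = invariance_loss(encoder(augment(frame_t)), encoder(augment(frame_t1)))
    loss.backward()
    print(float(loss))
```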
arXiv Detail & Related papers (2021-09-01T04:17:11Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data, which is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - I3DOL: Incremental 3D Object Learning without Catastrophic Forgetting [38.7610646073842]
I3DOL is the first exploration of learning new classes of 3D objects continually.
An adaptive-geometric centroid module is designed to construct discriminative local geometric structures.
A geometric-aware attention mechanism is developed to quantify the contributions of local geometric structures.
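As a rough illustration (not the I3DOL implementation), a geometric-aware attention step that scores local-structure features and aggregates them into a weighted global descriptor could be sketched as follows; the module, its dimensions, and its names are assumptions.

```python
# Illustrative sketch: score each local geometric-structure feature with a
# learned scalar, softmax-normalize the scores, and pool a weighted descriptor.
import torch
import torch.nn as nn


class GeometricAttention(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)   # one contribution score per structure

    def forward(self, local_feats: torch.Tensor):
        """local_feats: (B, M, D) features of M local geometric structures."""
        weights = torch.softmax(self.score(local_feats), dim=1)   # (B, M, 1)
        global_feat = (weights * local_feats).sum(dim=1)           # (B, D)
        return global_feat, weights.squeeze(-1)


if __name__ == "__main__":
    attn = GeometricAttention()
    local_feats = torch.randn(2, 32, 128)     # e.g., features around 32 centroids
    global_feat, w = attn(local_feats)
    print(global_feat.shape, w.shape)          # (2, 128) and (2, 32)
```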
arXiv Detail & Related papers (2020-12-16T15:17:51Z) - Procrustean Regression Networks: Learning 3D Structure of Non-Rigid
Objects from 2D Annotations [42.476537776831314]
We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects.
The proposed framework shows superior reconstruction performance to the state-of-the-art method on the Human 3.6M, 300-VW, and SURREAL datasets.
arXiv Detail & Related papers (2020-07-21T17:29:20Z) - Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D
Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries and is not responsible for any consequences of their use.