FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with
Pre-trained Vision-Language Models
- URL: http://arxiv.org/abs/2312.17051v1
- Date: Thu, 28 Dec 2023 14:52:07 GMT
- Authors: Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng
Zuo
- Abstract summary: Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC)
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic
forgetting issue when a model is incrementally trained on limited data. While
the Contrastive Vision-Language Pre-Training (CLIP) model has been effective in
addressing 2D few/zero-shot learning tasks, its direct application to 3D FSCIL
faces limitations. These limitations arise from feature space misalignment and
significant noise in real-world scanned 3D data. To address these challenges,
we introduce two novel components: the Redundant Feature Eliminator (RFE) and
the Spatial Noise Compensator (SNC). RFE aligns the feature spaces of input
point clouds and their embeddings by performing a unique dimensionality
reduction on the feature space of pre-trained models (PTMs), effectively
eliminating redundant information without compromising semantic integrity. On
the other hand, SNC is a graph-based 3D model designed to capture robust
geometric information within point clouds, thereby augmenting the knowledge
lost due to projection, particularly when processing real-world scanned data.
Considering the imbalance in existing 3D datasets, we also propose new
evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
Traditional accuracy metrics are shown to be biased; thus, our metrics focus
on the model's proficiency in learning new classes while maintaining the
balance between old and new classes. Experimental results on both established
3D FSCIL benchmarks and our dataset demonstrate that our approach significantly
outperforms existing state-of-the-art methods.
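The paper's implementation is not reproduced here, but the RFE idea of removing redundant dimensions from a pre-trained model's feature space can be sketched with a PCA-style projection. This is a minimal illustration under that assumption; the function names, the SVD-based reduction, and the 95% variance threshold are illustrative choices, not the authors' actual method.

```python
import numpy as np

def fit_reducer(features: np.ndarray, var_ratio: float = 0.95):
    """Fit a PCA-style projection that keeps `var_ratio` of the variance."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered feature matrix yields the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, var_ratio)) + 1
    return mean, vt[:k]  # basis spanning the retained subspace

def reduce(features: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Project features onto the retained subspace."""
    return (features - mean) @ basis.T

# Toy usage: 100 synthetic 512-d "pre-trained model" embeddings with
# correlated dimensions, standing in for CLIP-style features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 512)) @ rng.normal(size=(512, 512)) * 0.01
mean, basis = fit_reducer(feats, var_ratio=0.95)
reduced = reduce(feats, mean, basis)
print(feats.shape, "->", reduced.shape)
```

The point of the sketch is only that redundant directions can be dropped while most of the feature variance, and hence most of the semantic content, is retained.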
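The abstract does not spell out the proposed evaluation metrics, but one common way to reward balance between old and new classes, shown here purely as an illustration, is a harmonic mean of the two accuracies, which collapses when either side is sacrificed:

```python
def harmonic_accuracy(acc_old: float, acc_new: float) -> float:
    """Harmonic mean of old-class and novel-class accuracy.

    Unlike a plain average, this score is low whenever either
    accuracy is low, so it exposes imbalanced models.
    """
    if acc_old + acc_new == 0:
        return 0.0
    return 2 * acc_old * acc_new / (acc_old + acc_new)

# A model with 90% old-class but only 10% new-class accuracy scores 0.18,
# while a plain average (0.5) would hide the imbalance.
print(harmonic_accuracy(0.9, 0.1))
print(harmonic_accuracy(0.5, 0.5))
```

This is only one plausible instance of a balance-sensitive metric; the paper's actual metrics may differ.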
Related papers
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence.
This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Learning Occupancy for Monocular 3D Object Detection [25.56336546513198]
We propose OccupancyM3D, a method of learning occupancy for monocular 3D detection.
It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations.
Experiments on KITTI and open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
arXiv Detail & Related papers (2023-05-25T04:03:46Z)
- Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [55.7340077183072]
We tackle the task of estimating the 6D pose of an object from point cloud data.
Recent learning-based approaches to this task have shown great success on synthetic datasets but fail on real-world data.
We analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds.
arXiv Detail & Related papers (2022-03-29T07:55:04Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- I3DOL: Incremental 3D Object Learning without Catastrophic Forgetting [38.7610646073842]
I3DOL is the first exploration of continually learning new classes of 3D objects.
An adaptive-geometric centroid module is designed to construct discriminative local geometric structures.
A geometric-aware attention mechanism is developed to quantify the contributions of local geometric structures.
arXiv Detail & Related papers (2020-12-16T15:17:51Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.