ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation
- URL: http://arxiv.org/abs/2405.04121v1
- Date: Tue, 7 May 2024 08:44:13 GMT
- Title: ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation
- Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin
- Abstract summary: Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation.
Despite its potential, the weak teacher challenge arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels.
We propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm to address this problem.
- Score: 15.404188754049317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the weak teacher challenge arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Multi-Stage Knowledge Distillation, transferring comprehensive knowledge from the Vision Foundation Model (VFM), extensively trained on diverse open-world images. This enables effective knowledge transfer to a lightweight student model across modalities. ELiTe employs Parameter-Efficient Fine-Tuning to strengthen the VFM teacher and expedite large-scale model training with minimal costs. Additionally, we introduce the Segment Anything Model based Pseudo-Label Generation approach to enhance low-quality image labels, facilitating robust semantic representations. Efficient knowledge transfer in ELiTe yields state-of-the-art results on the SemanticKITTI benchmark, outperforming real-time inference models. Our approach achieves this with significantly fewer parameters, confirming its effectiveness and efficiency.
Related papers
- Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.
We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.
We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z) - Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation [53.95204595640208]
Data-Free Knowledge Distillation (DFKD) is an advanced technique that enables knowledge transfer from a teacher model to a student model without relying on original training data.
Previous approaches have generated synthetic images at high resolutions without leveraging information from real images.
MUSE generates images at lower resolutions while using Class Activation Maps (CAMs) to ensure that the generated images retain critical, class-specific features.
arXiv Detail & Related papers (2024-11-26T02:23:31Z) - Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement [59.17372460692809]
This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates the unpaired data into model training.
We introduce a semantic-aware contrastive loss to faithfully transfer the illumination distribution, contributing to enhancing images with natural colors.
We also propose novel perceptive loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer textual details.
arXiv Detail & Related papers (2024-09-25T04:05:32Z) - ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model [49.587821411012705]
We propose ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model.
It distills the knowledge from a large teacher CLIP model into a smaller student model, ensuring comparable performance with significantly reduced parameters.
EduAttention explores the cross-relationships between text features extracted by the teacher model and image features extracted by the student model.
arXiv Detail & Related papers (2024-08-08T01:12:21Z) - Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z) - GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation [22.344399402787644]
This paper tackles a novel yet challenging problem: how to transfer knowledge from the emerging Segment Anything Model (SAM)
We propose a framework, called GoodSAM, that introduces a teacher assistant (TA) to provide semantic information, integrated with SAM to generate ensemble logits.
Experiments on two benchmarks show that our GoodSAM achieves a remarkable +3.75% mIoU improvement over the state-of-the-art (SOTA) domain adaptation methods.
arXiv Detail & Related papers (2024-03-25T02:30:32Z) - X-Transfer: A Transfer Learning-Based Framework for GAN-Generated Fake Image Detection [33.31312811230408]
Misuse of GANs for generating deceptive images, such as face replacement, raises significant security concerns.
This paper introduces a novel GAN-generated image detection algorithm called X-Transfer.
It enhances transfer learning by utilizing two neural networks that employ interleaved parallel gradient transmission.
arXiv Detail & Related papers (2023-10-07T01:23:49Z) - Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP [84.90129481336659]
We study transferrable representation learning underlying CLIP and demonstrate how features from different modalities get aligned.
Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2023-10-02T06:41:30Z) - GPT4Image: Can Large Pre-trained Models Help Vision Models on Perception Tasks? [51.22096780511165]
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models like CNN and ViT learn enhanced representations.
We feed detailed descriptions into a pre-trained encoder to extract text embeddings with rich semantic information that encodes the content of images.
arXiv Detail & Related papers (2023-06-01T14:02:45Z) - TAKT: Target-Aware Knowledge Transfer for Whole Slide Image Classification [46.803231708918624]
We propose a Target-Aware Knowledge Transfer framework, employing a teacher-student paradigm.
Our framework enables the teacher model to learn common knowledge from the source and target domains.
Our method achieves state-of-the-art performance among other knowledge transfer methods on various datasets.
arXiv Detail & Related papers (2023-03-10T08:29:35Z) - Rich Feature Distillation with Feature Affinity Module for Efficient Image Dehazing [1.1470070927586016]
This work introduces a simple, lightweight, and efficient framework for single-image haze removal.
We exploit rich "dark-knowledge" information from a lightweight pre-trained super-resolution model via the notion of heterogeneous knowledge distillation.
Our experiments are carried out on the RESIDE-Standard dataset to demonstrate the robustness of our framework to the synthetic and real-world domains.
arXiv Detail & Related papers (2022-07-13T18:32:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.