TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation
- URL: http://arxiv.org/abs/2404.01587v1
- Date: Tue, 2 Apr 2024 02:29:41 GMT
- Title: TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation
- Authors: Yehui Shen, Mingmin Liu, Huimin Lu, Xieyuanli Chen,
- Abstract summary: Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods overcome this by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
- Score: 6.856317526681759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual place recognition (VPR) plays a pivotal role in autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large networks, leading to significant consumption of computational resources. In this paper, we propose a high-performance teacher and lightweight student distillation framework called TSCM. It exploits our devised cross-metric knowledge distillation to narrow the performance gap between the teacher and student models, maintaining superior performance while enabling minimal computational load during deployment. We conduct comprehensive evaluations on large-scale datasets, namely Pittsburgh30k and Pittsburgh250k. Experimental results demonstrate the superiority of our method over baseline models in terms of recognition accuracy and model parameter efficiency. Moreover, our ablation studies show that the proposed knowledge distillation technique surpasses other counterparts. The code of our method has been released at https://github.com/nubot-nudt/TSCM.
Related papers
- MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation [17.27883003990266]
Vision-and-Language Navigation (VLN) is a core task in Embodied AI.
This paper introduces a two-stage knowledge distillation framework, producing a student model, MiniVLN.
Our findings indicate that the two-stage distillation approach is more effective in narrowing the performance gap between the teacher model and the student model.
arXiv Detail & Related papers (2024-09-27T14:54:54Z) - Promoting CNNs with Cross-Architecture Knowledge Distillation for Efficient Monocular Depth Estimation [4.242540533823568]
Transformer models are usually computationally-expensive, and their effectiveness in light-weight models are limited compared to convolutions.
We propose a cross-architecture knowledge distillation method for MDE, dubbed DisDepth, to enhance efficient CNN models with the supervision of state-of-the-art transformer models.
Our method achieves significant improvements on various efficient backbones, showcasing its potential for efficient monocular depth estimation.
arXiv Detail & Related papers (2024-04-25T07:55:47Z) - ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation [3.301728339780329]
We propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models.
In our work, we propose an efficient method for generating soft labels, thereby eliminating the need for a large teacher model.
Our experiments on various datasets, including CIFAR-100, Tiny Imagenet, and Fashion MNIST, demonstrate the superior resource efficiency of our approach.
arXiv Detail & Related papers (2024-04-15T15:54:30Z) - Distilling Knowledge from CNN-Transformer Models for Enhanced Human
Action Recognition [1.8722948221596285]
The research aims to enhance the performance and efficiency of smaller student models by transferring knowledge from larger teacher models.
The proposed method employs a Transformer vision network as the student model, while a convolutional network serves as the teacher model.
The Vision Transformer (ViT) architecture is introduced as a robust framework for capturing global dependencies in images.
arXiv Detail & Related papers (2023-11-02T14:57:58Z) - Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR)
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Efficient Vision Transformers via Fine-Grained Manifold Distillation [96.50513363752836]
Vision transformer architectures have shown extraordinary performance on many computer vision tasks.
Although the network performance is boosted, transformers are often required more computational resources.
We propose to excavate useful information from the teacher transformer through the relationship between images and the divided patches.
arXiv Detail & Related papers (2021-07-03T08:28:34Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z) - Circumventing Outliers of AutoAugment with Knowledge Distillation [102.25991455094832]
AutoAugment has been a powerful algorithm that improves the accuracy of many vision tasks.
This paper delves deep into the working mechanism, and reveals that AutoAugment may remove part of discriminative information from the training image.
To relieve the inaccuracy of supervision, we make use of knowledge distillation that refers to the output of a teacher model to guide network training.
arXiv Detail & Related papers (2020-03-25T11:51:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.