Reverse Knowledge Distillation: Training a Large Model using a Small One
for Retinal Image Matching on Limited Data
- URL: http://arxiv.org/abs/2307.10698v2
- Date: Fri, 21 Jul 2023 05:05:52 GMT
- Title: Reverse Knowledge Distillation: Training a Large Model using a Small One
for Retinal Image Matching on Limited Data
- Authors: Sahar Almahfouz Nasser, Nihar Gupte, and Amit Sethi
- Abstract summary: We propose a novel approach based on reverse knowledge distillation to train large models with limited data.
We train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model.
Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting, unlike training directly to match the final output.
- Score: 1.9521342770943706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retinal image matching plays a crucial role in monitoring disease progression
and treatment response. However, datasets with matched keypoints between
temporally separated pairs of images are not available in abundance to train
transformer-based models. We propose a novel approach based on reverse knowledge
distillation to train large models with limited data while preventing
overfitting. Firstly, we propose architectural modifications to a CNN-based
semi-supervised method called SuperRetina that help us improve its results on a
publicly available dataset. Then, we train a computationally heavier model
based on a vision transformer encoder using the lighter CNN-based model, which
is counter-intuitive in the field of knowledge-distillation research, where
training lighter models based on heavier ones is the norm. Surprisingly, such
reverse knowledge distillation improves generalization even further. Our
experiments suggest that high-dimensional fitting in representation space may
prevent overfitting, unlike training directly to match the final output. We also
provide a public dataset with annotations for retinal image keypoint detection
and matching to help the research community develop algorithms for retinal
image applications.
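As a rough illustration of fitting in representation space, the toy sketch below trains a student to reproduce the feature vectors of a frozen teacher instead of a final task output. Everything in it is an assumption made for illustration: the abstract does not state the loss, so a mean-squared-error feature-matching loss is used, and the hypothetical functions `teacher_features`, `student_features`, and `distill_step` are tiny linear stand-ins, not the CNN-based or vision-transformer models from the paper.

```python
import random


def teacher_features(x):
    """Frozen teacher: a fixed, hypothetical map from a 2-D input to a 3-D descriptor."""
    return [x[0] + x[1], x[0] - x[1], 0.5 * x[0]]


def student_features(W, x):
    """Trainable student: a 3x2 linear layer standing in for the heavier model."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def feature_mse(a, b):
    """Mean squared error between two feature vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill_step(W, x, lr):
    """One gradient step on the feature-matching loss; only the student changes."""
    target = teacher_features(x)      # teacher is never updated
    pred = student_features(W, x)
    n = len(pred)
    for i in range(n):
        grad_out = 2.0 * (pred[i] - target[i]) / n   # d(MSE)/d(pred[i])
        for j in range(len(x)):
            W[i][j] -= lr * grad_out * x[j]          # chain rule through the linear layer
    return feature_mse(pred, target)                 # loss measured before the update


def train(steps=2000, lr=0.05, seed=0):
    """Run stochastic gradient descent on random inputs; return weights and loss history."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(3)]
    losses = []
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        losses.append(distill_step(W, x, lr))
    return W, losses


if __name__ == "__main__":
    W, losses = train()
    print("first loss:", losses[0])
    print("last loss:", losses[-1])
```

In this sketch the supervision signal is the full teacher descriptor, so each training example constrains every feature dimension of the student rather than a single final prediction.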
Related papers
- Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry [1.2289361708127877]
We propose a causal visual-inertial fusion transformer (VIFT) for pose estimation in deep visual-inertial odometry.
The proposed method is end-to-end trainable and requires only a monocular camera and IMU during inference.
arXiv Detail & Related papers (2024-09-13T12:21:25Z) - Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z) - ExposureDiffusion: Learning to Expose for Low-light Image Enhancement [87.08496758469835]
This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model.
Our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models.
The proposed framework can work with real-paired datasets, SOTA noise models, and different backbone networks.
arXiv Detail & Related papers (2023-07-15T04:48:35Z) - Terrain Classification using Transfer Learning on Hyperspectral Images:
A Comparative study [0.13999481573773068]
The convolutional neural network (CNN) and the multi-layer perceptron (MLP) have been proven to be effective methods of image classification.
However, they suffer from long training times and require large amounts of labeled data.
We propose using transfer learning to decrease the training time and reduce the dependence on large labeled datasets.
arXiv Detail & Related papers (2022-06-19T14:36:33Z) - On-the-Fly Test-time Adaptation for Medical Image Segmentation [63.476899335138164]
Adapting the source model to target data distribution at test-time is an efficient solution for the data-shift problem.
We propose a new framework called Adaptive UNet where each convolutional block is equipped with an adaptive batch normalization layer.
During test-time, the model takes in just the new test image and generates a domain code to adapt the features of source model according to the test data.
arXiv Detail & Related papers (2022-03-10T18:51:29Z) - Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z) - Conditional Variational Autoencoder for Learned Image Reconstruction [5.487951901731039]
We develop a novel framework that approximates the posterior distribution of the unknown image at each query observation.
It handles implicit noise models and priors and incorporates the data formation process (i.e., the forward operator), and the learned reconstructive properties are transferable between different datasets.
arXiv Detail & Related papers (2021-10-22T10:02:48Z) - Y-GAN: Learning Dual Data Representations for Efficient Anomaly
Detection [0.0]
We propose a novel reconstruction-based model for anomaly detection, called Y-GAN.
The model consists of a Y-shaped auto-encoder and represents images in two separate latent spaces.
arXiv Detail & Related papers (2021-09-28T20:17:04Z) - How to train your ViT? Data, Augmentation, and Regularization in Vision
Transformers [74.06040005144382]
Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications.
We conduct a systematic empirical study in order to better understand the interplay between the amount of training data, AugReg, model size and compute budget.
We train ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available JFT-300M dataset.
arXiv Detail & Related papers (2021-06-18T17:58:20Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from the 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - Data-Efficient Ranking Distillation for Image Retrieval [15.88955427198763]
Recent approaches tackle this issue using knowledge distillation to transfer knowledge from a deeper and heavier architecture to a much smaller network.
In this paper we address knowledge distillation for metric learning problems.
Unlike previous approaches, our proposed method jointly addresses the following constraints: i) limited queries to the teacher model, ii) a black-box teacher model with access only to the final output representation, and iii) a small fraction of the original training data without any ground-truth labels.
arXiv Detail & Related papers (2020-07-10T10:59:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.