Feature Representation Transferring to Lightweight Models via Perception Coherence
- URL: http://arxiv.org/abs/2505.06595v1
- Date: Sat, 10 May 2025 10:55:06 GMT
- Title: Feature Representation Transferring to Lightweight Models via Perception Coherence
- Authors: Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone,
- Abstract summary: We propose a method for transferring feature representation to lightweight student models from larger teacher models. Our method outperforms or achieves on-par performance compared to strong baseline methods for representation transferring.
- Score: 3.3975558777609915
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we propose a method for transferring feature representation to lightweight student models from larger teacher models. We mathematically define a new notion called perception coherence. Based on this notion, we propose a loss function that takes into account the dissimilarities between data points in feature space through their ranking. At a high level, by minimizing this loss function, the student model learns to mimic how the teacher model perceives inputs. More precisely, our method is motivated by the fact that the representational capacity of the student model is weaker than that of the teacher model. Hence, we aim to develop a new method allowing for a better relaxation. This means that the student model does not need to preserve the absolute geometry of the teacher, while still preserving global coherence through dissimilarity ranking. Our theoretical insights provide a probabilistic perspective on the process of feature representation transfer. Our experimental results show that our method outperforms or performs on par with strong baseline methods for representation transfer.
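The abstract does not give the loss formula, so the following is only an illustrative sketch of the ranking idea it describes, not the paper's method. It assumes Euclidean distance as the dissimilarity and uses a hypothetical helper, `rank_agreement`, that measures how often the student orders pairs of points relative to an anchor the same way the teacher does. The feature values are made up for illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_agreement(teacher_feats, student_feats):
    """Fraction of (anchor, i, j) triplets whose dissimilarity order matches.

    Hypothetical helper, not taken from the paper. For each anchor, the
    teacher says which of two other points is closer; a triplet counts as
    coherent when the student agrees. Teacher ties are skipped.
    """
    n = len(teacher_feats)
    agree, total = 0, 0
    for a in range(n):
        others = [k for k in range(n) if k != a]
        for i, j in combinations(others, 2):
            dt = (euclidean(teacher_feats[a], teacher_feats[i])
                  - euclidean(teacher_feats[a], teacher_feats[j]))
            ds = (euclidean(student_feats[a], student_feats[i])
                  - euclidean(student_feats[a], student_feats[j]))
            if dt == 0:
                continue  # teacher expresses no preference for this pair
            total += 1
            if dt * ds > 0:
                agree += 1
    return agree / total if total else 1.0


# Made-up features: the teacher lives in 2-D, the students in 1-D.
teacher = [(0, 0), (1, 0), (3, 0), (7, 0)]
student_ok = [(0,), (2,), (5,), (12,)]    # distances distorted, order kept
student_bad = [(0,), (70,), (30,), (10,)]  # order of neighbours scrambled

print(rank_agreement(teacher, student_ok))
print(rank_agreement(teacher, student_bad))
```

In this toy setting `student_ok` does not reproduce the teacher's distances, yet it keeps every dissimilarity ranking, which is the kind of relaxation the abstract refers to; `student_bad` breaks some rankings and scores lower.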
Related papers
- Information Shapes Koopman Representation [41.27407463371503]
We argue that difficulties come from suboptimal representation learning, where latent variables fail to balance expressivity and simplicity. Rethinking Koopman learning through this lens, we demonstrate that latent mutual information promotes simplicity, yet an overemphasis on simplicity may cause latent space to collapse. We propose a new algorithm based on the Lagrangian formulation that encourages both simplicity and expressiveness, leading to a stable and interpretable Koopman representation.
arXiv Detail & Related papers (2025-10-14T22:48:06Z) - Single-weight Model Editing for Post-hoc Spurious Correlation Neutralization [54.8794775172033]
Neural network training tends to exploit the simplest features as shortcuts to greedily minimize training loss. Some of these features might be spuriously correlated with the target labels, leading to incorrect predictions by the model. We propose a unique precise class removal technique that makes a single-weight modification, which entails negligible performance compromise.
arXiv Detail & Related papers (2025-01-24T02:22:42Z) - Relational Representation Distillation [6.24302896438145]
Knowledge Distillation (KD) is an effective method for transferring knowledge from a large, well-trained teacher model to a smaller, more efficient student model. Despite its success, one of the main challenges in KD is ensuring the efficient transfer of complex knowledge while maintaining the student's computational efficiency. We propose Relational Representation Distillation (RRD), which improves knowledge transfer by maintaining sharpened structural relationships between metric feature representations.
arXiv Detail & Related papers (2024-07-16T14:56:13Z) - Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z) - Any-Way Meta Learning [27.16222034423108]
We introduce the "any-way" learning paradigm, an innovative model training approach that liberates the model from fixed cardinality constraints.
Surprisingly, this model not only matches but often outperforms traditional fixed-way models in terms of performance, convergence speed, and stability.
arXiv Detail & Related papers (2024-01-10T12:00:53Z) - Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that the intermediate representations can have the biggest MI estimate because of the tradeoff between better separability and decreasing MI.
arXiv Detail & Related papers (2023-12-15T18:38:18Z) - Specify Robust Causal Representation from Mixed Observations [35.387451486213344]
Learning representations purely from observations concerns the problem of learning a low-dimensional, compact representation which is beneficial to prediction models.
We develop a learning method to learn such representation from observational data by regularizing the learning procedure with mutual information measures.
We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts.
arXiv Detail & Related papers (2023-10-21T02:18:35Z) - How a student becomes a teacher: learning and forgetting through Spectral methods [1.1470070927586018]
In theoretical ML, the teacher paradigm is often employed as an effective metaphor for real-life tuition.
In this work, we take a leap forward by proposing a radically different optimization scheme.
Working in this framework, we could isolate a stable student substructure, that mirrors the true complexity of the teacher.
arXiv Detail & Related papers (2023-10-19T09:40:30Z) - Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z) - EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z) - Proto2Proto: Can you recognize the car, the way I do? [23.09799187888976]
We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation.
Our approach aims to add interpretability to the "dark" knowledge transferred from the teacher to the shallower student model.
Our experiments show that the proposed method indeed achieves interpretability transfer from teacher to student while simultaneously exhibiting competitive performance.
arXiv Detail & Related papers (2022-04-25T17:59:30Z) - It's All in the Head: Representation Knowledge Distillation through Classifier Sharing [0.29360071145551075]
We introduce two approaches for enhancing representation distillation using classifier sharing between the teacher and student.
We show the effectiveness of the proposed methods on various datasets and tasks, including image classification, fine-grained classification, and face verification.
arXiv Detail & Related papers (2022-01-18T13:10:36Z) - Bag of Instances Aggregation Boosts Self-supervised Learning [122.61914701794296]
We propose a simple but effective distillation strategy for unsupervised learning.
Our method, termed BINGO, aims to transfer the relationship learned by the teacher to the student.
BINGO achieves new state-of-the-art performance on small scale models.
arXiv Detail & Related papers (2021-07-04T17:33:59Z) - Understanding Robustness in Teacher-Student Setting: A New Perspective [42.746182547068265]
Adversarial examples are a property of machine learning models in which bounded adversarial perturbations can mislead the models into making arbitrarily incorrect predictions.
Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness.
Our studies could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
arXiv Detail & Related papers (2021-02-25T20:54:24Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z) - Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.