Transferred Discrepancy: Quantifying the Difference Between
Representations
- URL: http://arxiv.org/abs/2007.12446v1
- Date: Fri, 24 Jul 2020 10:59:11 GMT
- Title: Transferred Discrepancy: Quantifying the Difference Between
Representations
- Authors: Yunzhen Feng, Runtian Zhai, Di He, Liwei Wang, Bin Dong
- Abstract summary: Transferred discrepancy (TD) is a metric that defines the difference between two representations based on their downstream-task performance.
We show how TD correlates with downstream tasks and why it is necessary to define metrics in such a task-dependent fashion.
TD may also be used to evaluate the effectiveness of different training strategies.
- Score: 35.957762733342804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding what information neural networks capture is an essential
problem in deep learning, and studying whether different models capture similar
features is an initial step to achieve this goal. Previous works sought to
define metrics over the feature matrices to measure the difference between two
models. However, different metrics sometimes lead to contradictory conclusions,
and there has been no consensus on which metric is suitable to use in practice.
In this work, we propose a novel metric that goes beyond previous approaches.
Recall that one of the most practical scenarios of using the learned
representations is to apply them to downstream tasks. We argue that we should
design the metric based on a similar principle. For that, we introduce the
transferred discrepancy (TD), a new metric that defines the difference between
two representations based on their downstream-task performance. Through an
asymptotic analysis, we show how TD correlates with downstream tasks and the
necessity to define metrics in such a task-dependent fashion. In particular, we
also show that under specific conditions, the TD metric is closely related to
previous metrics. Our experiments show that TD can provide fine-grained
information for varied downstream tasks, and for the models trained from
different initializations, the learned features are not the same in terms of
downstream-task predictions. We find that TD may also be used to evaluate the
effectiveness of different training strategies. For example, we demonstrate
that models trained with data augmentations that improve generalization
capture more similar features in terms of TD, whereas models trained with
augmentations that hurt generalization do not. This suggests that a training
strategy that yields more robust representations also trains models that
generalize better.
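A minimal illustrative sketch of the task-dependent idea behind TD is given below. It is not the authors' exact formulation: the probe choice (scikit-learn logistic regression), the prediction-disagreement score, and the function name transferred_discrepancy_sketch are assumptions made only for illustration, whereas the paper defines TD formally in terms of downstream-task performance.

# Illustrative sketch only (assumed names and choices, not the paper's exact TD):
# fit one downstream probe per representation and report how often their
# held-out predictions disagree, i.e. a task-dependent difference score.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def transferred_discrepancy_sketch(feats_a, feats_b, labels, seed=0):
    """Rough, task-dependent difference between two feature matrices."""
    idx_train, idx_test = train_test_split(
        np.arange(len(labels)), test_size=0.3, random_state=seed)
    preds = []
    for feats in (feats_a, feats_b):
        probe = LogisticRegression(max_iter=1000)
        probe.fit(feats[idx_train], labels[idx_train])
        preds.append(probe.predict(feats[idx_test]))
    # Fraction of held-out points on which the two probes disagree.
    return float(np.mean(preds[0] != preds[1]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    f_a = rng.normal(size=(200, 16))              # features from model A
    f_b = f_a + 0.1 * rng.normal(size=(200, 16))  # slightly perturbed copy
    print(transferred_discrepancy_sketch(f_a, f_b, y))

In this toy usage, two nearly identical feature sets should yield a small score, while unrelated features should yield a larger one; the key point, as in the paper, is that the comparison is made through a downstream task rather than directly on the feature matrices.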
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, because pretraining is formulated as image classification or object discrimination.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide an analysis using a theoretical data model and show that, while more diverse pre-training data results in more diverse features for different tasks, it places less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Amortised Invariance Learning for Contrastive Self-Supervision [11.042648980854485]
We introduce the notion of amortised invariance learning for contrastive self-supervision.
We show that our amortised features provide a reliable way to learn diverse downstream tasks with different invariance requirements.
This provides an exciting perspective that opens up new horizons in the field of general purpose representation learning.
arXiv Detail & Related papers (2023-02-24T16:15:11Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- Towards GAN Benchmarks Which Require Generalization [48.075521136623564]
We argue that estimating the function must require a large sample from the model.
We turn to neural network divergences (NNDs) which are defined in terms of a neural network trained to distinguish between distributions.
The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples.
arXiv Detail & Related papers (2020-01-10T20:18:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.