Deep Reference Priors: What is the best way to pretrain a model?
- URL: http://arxiv.org/abs/2202.00187v1
- Date: Tue, 1 Feb 2022 02:32:39 GMT
- Title: Deep Reference Priors: What is the best way to pretrain a model?
- Authors: Yansong Gao, Rahul Ramesh, Pratik Chaudhari
- Abstract summary: This paper formalizes the question using the theory of reference priors.
Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the model.
This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data.
- Score: 27.705359364301458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What is the best way to exploit extra data -- be it unlabeled data from the
same task, or labeled data from a related task -- to learn a given task? This
paper formalizes the question using the theory of reference priors. Reference
priors are objective, uninformative Bayesian priors that maximize the mutual
information between the task and the weights of the model. Such priors enable
the task to maximally affect the Bayesian posterior; e.g., reference priors
depend upon the number of samples available for learning the task, and for very
small sample sizes the prior puts more probability mass on low-complexity
models in the hypothesis space. This paper presents the first demonstration of
reference priors for medium-scale deep networks and image-based data. We
develop generalizations of reference priors and demonstrate applications to two
problems. First, by using unlabeled data to compute the reference prior, we
develop new Bayesian semi-supervised learning methods that remain effective
even with very few samples per class. Second, by using labeled data from the
source task to compute the reference prior, we develop a new pretraining method
for transfer learning that allows data from the target task to maximally affect
the Bayesian posterior. Empirical validation of these methods is conducted on
image classification datasets.
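To make the definition above concrete (notation is ours, not quoted from the paper): write w for the weights, D^n for a dataset of n samples, pi for the prior, and m for the marginal likelihood that pi induces. The reference prior is then

```latex
\pi^\star = \arg\max_{\pi} \; I(w; D^n),
\qquad
I(w; D^n) = \mathbb{E}_{D^n \sim m}\!\left[ \mathrm{KL}\big( p(w \mid D^n) \,\|\, \pi(w) \big) \right],
\qquad
m(D^n) = \int p(D^n \mid w)\, \pi(w)\, dw .
```

The KL form explains the claims in the abstract: the optimal prior is the one the posterior moves away from the most, on average over datasets of size n, so the prior necessarily depends on n. For intuition only, this objective can be computed exactly for a toy discrete model, because maximizing I(w; D^n) over pi is the classical channel-capacity problem. The sketch below uses the standard Blahut-Arimoto iteration on a three-coin example; it illustrates the definition and is not the paper's algorithm for deep networks.

```python
import numpy as np
from math import comb

def reference_prior_discrete(likelihood, n_iters=2000, tol=1e-12):
    """Capacity-achieving (reference) prior over a finite hypothesis set.

    likelihood[j, d] = p(dataset d | hypothesis w_j); each row sums to 1.
    Returns the prior pi that maximizes I(w; D), computed with the
    classical Blahut-Arimoto fixed-point iteration.
    """
    L = np.asarray(likelihood, dtype=float)
    pi = np.full(L.shape[0], 1.0 / L.shape[0])    # start from the uniform prior
    for _ in range(n_iters):
        m = pi @ L                                # marginal m(d) = sum_j pi_j L[j, d]
        kl = np.sum(L * np.log((L + 1e-30) / (m + 1e-30)), axis=1)
        new_pi = pi * np.exp(kl)                  # Blahut-Arimoto update
        new_pi /= new_pi.sum()
        if np.max(np.abs(new_pi - pi)) < tol:
            return new_pi
        pi = new_pi
    return pi

# Toy model: three coins with heads-probability 0.1, 0.5 and 0.9,
# observed through n = 2 flips (a dataset is the number of heads).
thetas = [0.1, 0.5, 0.9]
lik = np.array([[comb(2, h) * t**h * (1 - t)**(2 - h) for h in range(3)]
                for t in thetas])
print(reference_prior_discrete(lik))
# With only two flips the prior is atomic on the two most distinguishable
# coins; with many flips it spreads back towards uniform over all three.
# This mirrors the sample-size dependence described in the abstract.
```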
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we use a zero-shot method to extract the subset of the pre-training data that is most relevant to the downstream task.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation [28.80089773616623]
The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review.
Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation.
We propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation.
arXiv Detail & Related papers (2024-06-30T09:25:42Z)
- Transfer Learning with Informative Priors: Simple Baselines Better than Previously Reported [4.453137996095194]
We compare transfer learning with and without source task informed priors across 5 datasets.
For the scenario of 5-300 examples per class, we find negative or negligible gains on 2 datasets, modest gains on 2 other datasets, and substantial gains on one dataset.
arXiv Detail & Related papers (2024-05-24T14:12:23Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, one line of methods proposes to replay data from previously learned tasks when learning new ones.
However, this is often impractical because of memory constraints or data privacy issues.
As a replacement, data-free replay methods have been proposed that invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Unified Pretraining for Recommendation via Task Hypergraphs [55.98773629788986]
We propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs.
To obtain a unified learning pattern that handles the diverse requirements and nuances of various pretext tasks, we design task hypergraphs that generalize pretext tasks to hyperedge prediction.
A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation.
arXiv Detail & Related papers (2023-10-20T05:33:21Z)
- Prior-Free Continual Learning with Unlabeled Data in the Wild [24.14279172551939]
We propose a Prior-Free Continual Learning (PFCL) method to incrementally update a trained model on new tasks.
PFCL learns new tasks without knowing the task identity or any previous data.
Our experiments show that our PFCL method significantly mitigates forgetting in all three learning scenarios.
arXiv Detail & Related papers (2023-10-16T13:59:56Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors [59.93972277761501]
We show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches.
This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks (a minimal version of this informative-prior recipe is sketched after this list).
arXiv Detail & Related papers (2022-05-20T16:19:30Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs).
PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
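A note connecting two of the entries above to the main paper: both "Pre-Train Your Loss" and "Transfer Learning with Informative Priors" study the same broad recipe as the transfer-learning use of reference priors, namely using source-task data to shape the prior under which the target task is learned. The sketch below is a minimal, generic version of that informative-prior recipe, assuming a diagonal Gaussian fitted to source-task weight samples and used as a quadratic penalty on the target task; the papers above use richer posteriors, and the reference-prior construction of the main paper is different in kind.

```python
import torch
import torch.nn as nn

def fit_diagonal_gaussian(weight_samples):
    """Fit a diagonal Gaussian to flattened weight vectors collected on the
    source task (for example, SGD iterates gathered near convergence)."""
    W = torch.stack(weight_samples)          # (num_samples, num_params)
    mu = W.mean(dim=0)
    var = W.var(dim=0) + 1e-6                # variance floor for stability
    return mu, var

def informative_prior_penalty(model, mu, var, scale=1.0):
    """Negative log-density of a Gaussian prior N(mu, diag(var)) on the
    weights, up to an additive constant. Adding it to the target-task loss
    lets the source task, rather than a zero-mean prior, shape the posterior."""
    w = nn.utils.parameters_to_vector(model.parameters())
    return 0.5 * scale * torch.sum((w - mu) ** 2 / var)

# Hypothetical target-task training step (model, data and optimizer names
# are placeholders, not taken from any of the papers above):
#   loss = task_loss(model(x), y) + informative_prior_penalty(model, mu, var)
#   loss.backward(); optimizer.step()
```

Training with this penalty is the simplest (MAP) instantiation; the "Transfer Learning with Informative Priors" entry above measures how much such priors actually help in the 5-300 examples-per-class regime.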
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.