Unifying Approaches in Data Subset Selection via Fisher Information and
Information-Theoretic Quantities
- URL: http://arxiv.org/abs/2208.00549v1
- Date: Mon, 1 Aug 2022 00:36:57 GMT
- Title: Unifying Approaches in Data Subset Selection via Fisher Information and
Information-Theoretic Quantities
- Authors: Andreas Kirsch, Yarin Gal
- Abstract summary: We revisit the Fisher information and use it to show how several otherwise disparate methods are connected as approximations of information-theoretic quantities.
In data subset selection, i.e. active learning and active sampling, several recent works use Fisher information, Hessians, similarity matrices based on the gradients, or simply the gradient lengths to compute the acquisition scores that guide sample selection.
- Score: 38.59619544501593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mutual information between predictions and model parameters -- also
referred to as expected information gain or BALD in machine learning --
measures informativeness. It is a popular acquisition function in Bayesian
active learning and Bayesian optimal experiment design. In data subset
selection, i.e. active learning and active sampling, several recent works use
Fisher information, Hessians, similarity matrices based on the gradients, or
simply the gradient lengths to compute the acquisition scores that guide sample
selection. Are these different approaches connected, and if so how? In this
paper, we revisit the Fisher information and use it to show how several
otherwise disparate methods are connected as approximations of
information-theoretic quantities.
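For intuition, here is a minimal NumPy sketch (ours, not the paper's code) of the BALD score the abstract refers to: the mutual information between predictions and model parameters, estimated from Monte-Carlo posterior samples. The function name and array shapes are illustrative.

```python
import numpy as np

def bald_scores(probs, eps=1e-12):
    """BALD / expected information gain per candidate point.

    probs: (n_samples, n_points, n_classes) predictive probabilities from
    n_samples posterior parameter draws (e.g. MC dropout or an ensemble).
    """
    mean_probs = probs.mean(axis=0)
    # Entropy of the posterior-averaged prediction: H[y | x, D].
    entropy_of_mean = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    # Expected entropy under the parameter posterior: E_theta H[y | x, theta].
    mean_of_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    # The mutual information I[y; theta | x, D] is the difference.
    return entropy_of_mean - mean_of_entropy
```

Points with high scores are those the posterior samples disagree on; the paper's contribution is relating Fisher-information and gradient-based acquisition scores to approximations of this quantity.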
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Towards Bayesian Data Selection [0.0]
Many machine learning algorithms iteratively add data to the training sample; examples include semi-supervised learning, active learning, multi-armed bandits, and Bayesian optimization.
We embed this kind of data addition into decision theory by framing data selection as a decision problem.
For the illustrative case of self-training in semi-supervised learning, we derive the respective Bayes criterion.
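The derivation itself is in the paper; as a generic sketch of data selection framed as a decision problem (not the paper's specific Bayes criterion for self-training), one can pick the candidate that maximizes posterior-expected utility:

```python
import numpy as np

def bayes_select(candidates, posterior_samples, utility):
    """Pick the candidate maximizing posterior-expected utility,
    argmax_x E_theta[u(x, theta)] -- the generic Bayes criterion.

    candidates: list of candidate data points
    posterior_samples: draws theta_1..theta_S from p(theta | D)
    utility: function u(x, theta) scoring the value of adding x
    """
    expected_utilities = [
        np.mean([utility(x, theta) for theta in posterior_samples])
        for x in candidates
    ]
    return candidates[int(np.argmax(expected_utilities))]
```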
arXiv Detail & Related papers (2024-06-18T12:40:15Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates time-consuming model training with batch data selection.
The proposed FreeSel bypasses this heavy batch selection process, achieving a significant efficiency improvement: it is 530x faster than existing active learning methods.
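FreeSel's actual metric operates on intermediate features of general-purpose models in a single pass; as a simplified stand-in for that idea (our sketch, not FreeSel itself), here is a greedy k-center selection over fixed features:

```python
import numpy as np

def greedy_diverse_selection(features, budget):
    """Greedy k-center selection over fixed features from a frozen,
    general-purpose model -- one pass, no model retraining.

    features: (n_points, d) array of intermediate features
    budget: number of samples to select
    """
    selected = [0]  # seed with an arbitrary first point
    # Distance of every point to its nearest selected point.
    dists = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < budget:
        idx = int(np.argmax(dists))  # farthest from current picks
        selected.append(idx)
        new_d = np.linalg.norm(features - features[idx], axis=1)
        dists = np.minimum(dists, new_d)
    return selected
```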
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- FUNCK: Information Funnels and Bottlenecks for Invariant Representation Learning [7.804994311050265]
We investigate a set of related information funnels and bottleneck problems that claim to learn invariant representations from the data.
We propose a new element to this family of information-theoretic objectives: The Conditional Privacy Funnel with Side Information.
Given the generally intractable objectives, we derive tractable approximations using amortized variational inference parameterized by neural networks.
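A generic PyTorch sketch of the amortization pattern the summary refers to: an encoder network outputs the parameters of an approximate posterior, trained end-to-end via the reparameterization trick. Nothing here is FUNCK-specific; the class and variable names are ours.

```python
import torch
import torch.nn as nn

class AmortizedEncoder(nn.Module):
    """Maps an input x directly to the parameters of an approximate
    posterior q(z | x) -- the amortized-inference pattern."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, z_dim)
        self.log_var = nn.Linear(hidden, z_dim)

    def forward(self, x):
        h = self.body(x)
        mu, log_var = self.mean(h), self.log_var(h)
        # Reparameterization trick keeps sampling differentiable.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return z, mu, log_var
```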
arXiv Detail & Related papers (2022-11-02T19:37:55Z)
- Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs), which leverage in-context learning in large-scale machine learning to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z)
- Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
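A minimal sketch of the core operation, assuming per-model diagonal Fisher estimates are already computed (variable names are ours):

```python
import numpy as np

def fisher_weighted_merge(param_list, fisher_list, eps=1e-8):
    """Merge models by a per-parameter weighted average, where each
    model's weight is its (diagonal) Fisher information estimate:
        theta_merged = sum_i F_i * theta_i / sum_i F_i

    param_list, fisher_list: lists of same-shape arrays, one per model.
    """
    num = sum(F * p for F, p in zip(fisher_list, param_list))
    den = sum(fisher_list) + eps  # eps guards parameters with zero Fisher
    return num / den
```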
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
- A Bayesian Framework for Information-Theoretic Probing [51.98576673620385]
We argue that probing should be seen as approximating a mutual information.
Under the standard formulation, this leads to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences.
This paper proposes a new framework to measure what we term Bayesian mutual information.
arXiv Detail & Related papers (2021-09-08T18:08:36Z)
- Diminishing Uncertainty within the Training Pool: Active Learning for Medical Image Segmentation [6.3858225352615285]
We explore active learning for the task of segmentation of medical imaging data sets.
We propose three new strategies for active learning: increasing the frequency of uncertain data to bias the training data set, using mutual information among the input images as a regularizer, and adapting the Dice log-likelihood for Stein variational gradient descent (SVGD).
The results indicate an improvement in data reduction, achieving full accuracy while using only 22.69% and 48.85% of the available data for the two datasets, respectively.
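For reference, here is the plain soft Dice coefficient that underlies the Dice log-likelihood the paper adapts; this is the standard quantity, not the paper's SVGD adaptation.

```python
def soft_dice(pred, target, eps=1e-6):
    """Soft Dice coefficient between a predicted probability map and a
    binary ground-truth mask; 1.0 means perfect overlap.

    pred, target: arrays of the same shape with values in [0, 1].
    """
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```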
arXiv Detail & Related papers (2021-01-07T01:55:48Z)
- A Tutorial on Learning With Bayesian Networks [8.98526174345299]
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest.
A Bayesian network can be used to learn causal relationships.
It can also be used to gain understanding about a problem domain and to predict the consequences of intervention.
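As a toy illustration of what a Bayesian network encodes, here is a classic sprinkler-style example with made-up numbers, inferring a cause from an observed effect by enumeration:

```python
# Toy Bayesian network: Rain -> Sprinkler, Rain & Sprinkler -> WetGrass.
# The joint factorizes as P(R) * P(S | R) * P(W | R, S).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | R=True)
               False: {True: 0.4, False: 0.6}}    # P(S | R=False)
P_wet = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}  # P(W=True | R, S)

def joint(r, s, w):
    """P(R=r, S=s, W=w) via the network's factorization."""
    p_w = P_wet[(r, s)] if w else 1.0 - P_wet[(r, s)]
    return P_rain[r] * P_sprinkler[r][s] * p_w

# Enumeration-based inference: P(Rain=True | WetGrass=True).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(num / den)
```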
arXiv Detail & Related papers (2020-02-01T20:03:21Z)