Efficient Data-Dependent Learnability
- URL: http://arxiv.org/abs/2011.10334v1
- Date: Fri, 20 Nov 2020 10:44:55 GMT
- Title: Efficient Data-Dependent Learnability
- Authors: Yaniv Fogel, Tal Shapira and Meir Feder
- Abstract summary: The predictive normalized maximum likelihood (pNML) approach has recently been proposed as the min-max optimal solution to the batch learning problem.
We show that when applied to neural networks, this approximation can detect out-of-distribution examples effectively.
- Score: 8.766022970635898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The predictive normalized maximum likelihood (pNML) approach has recently
been proposed as the min-max optimal solution to the batch learning problem
where both the training set and the test data feature are individual, known
sequences. This approach yields a learnability measure that can also be
interpreted as a stability measure. This measure has shown some potential in
detecting out-of-distribution examples, yet it has considerable computational
costs. In this project, we propose and analyze an approximation of the pNML,
which is based on influence functions. Combining both theoretical analysis and
experiments, we show that when applied to neural networks, this approximation
can detect out-of-distribution examples effectively. We also compare its
performance to that achieved by conducting a single gradient step for each
possible label.
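The abstract's baseline of conducting a single gradient step for each possible label suggests a simple way to approximate the pNML normalization for a classifier. The sketch below is only an illustration under assumed names (`model`, `x_test`, `num_classes`, the learning rate) and a generic PyTorch setup; it is not the authors' implementation, and the influence-function approximation studied in the paper is not shown here.
```python
# Hedged sketch: per-label single-gradient-step approximation of the pNML
# normalization for a generic PyTorch classifier. All names are assumptions.
import copy
import torch
import torch.nn.functional as F

def pnml_regret_single_step(model, x_test, num_classes, lr=1e-3):
    """For each hypothesized label y, refit a copy of the model with one SGD
    step on (x_test, y), record the probability the refitted model assigns
    to y, and sum these to obtain the pNML normalization factor.
    x_test is a single test input with a leading batch dimension of 1."""
    refit_probs = []
    for y in range(num_classes):
        m = copy.deepcopy(model)                      # fresh copy per candidate label
        opt = torch.optim.SGD(m.parameters(), lr=lr)
        loss = F.cross_entropy(m(x_test), torch.tensor([y]))
        opt.zero_grad()
        loss.backward()
        opt.step()                                    # the single gradient step
        with torch.no_grad():
            refit_probs.append(F.softmax(m(x_test), dim=1)[0, y].item())
    norm = sum(refit_probs)                           # pNML normalization factor
    regret = float(torch.log(torch.tensor(norm)))     # log-normalization = regret
    pnml_probs = [p / norm for p in refit_probs]      # normalized pNML distribution
    return regret, pnml_probs
```
In the pNML framing, a large regret means many candidate labels can be made to fit the test point after refitting, which is why the measure serves as an out-of-distribution signal.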
Related papers
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the infinite-sample regime and in the finite-sample regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - On the Pitfalls of Heteroscedastic Uncertainty Estimation with
Probabilistic Neural Networks [23.502721524477444]
We present a synthetic example illustrating how this approach can lead to very poor but stable estimates.
We identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue.
We present an alternative formulation, termed $\beta$-NLL, in which each data point's contribution to the loss is weighted by the $\beta$-exponentiated variance estimate (restated as a formula after this list).
arXiv Detail & Related papers (2022-03-17T08:46:17Z) - Reducing Overconfidence Predictions for Autonomous Driving Perception [3.1428836133120543]
We show that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition.
ML and MAP functions can be implemented in existing trained networks; that is, the approach benefits from the output of the logit layer of pre-trained networks.
arXiv Detail & Related papers (2022-02-16T01:59:55Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of the gradients (a generic clipping update is sketched after this list).
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Instance Based Approximations to Profile Maximum Likelihood [33.51964370430905]
We provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution.
We obtain the first provable computationally efficient implementation of PseudoPML, a framework for estimating a broad class of symmetric properties.
arXiv Detail & Related papers (2020-11-05T11:17:19Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - SODEN: A Scalable Continuous-Time Survival Model through Ordinary
Differential Equation Networks [14.564168076456822]
We propose a flexible model for survival analysis using neural networks along with scalable optimization algorithms.
We demonstrate the effectiveness of the proposed method in comparison to existing state-of-the-art deep learning survival analysis models.
arXiv Detail & Related papers (2020-08-19T19:11:25Z) - A maximum-entropy approach to off-policy evaluation in average-reward
MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z)
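For the heteroscedastic-uncertainty entry above, the $\beta$-NLL weighting can be restated as a formula. This is a sketch following that entry's description, with $\mu_i$ and $\sigma_i^2$ the predicted mean and variance for point $i$, $y_i$ the target, $N$ the number of points, and $\lfloor\cdot\rfloor$ denoting a stop-gradient:
$$\mathcal{L}_{\beta\text{-NLL}} = \frac{1}{N}\sum_{i=1}^{N} \lfloor \sigma_i^2 \rfloor^{\beta}\left(\frac{1}{2}\log\sigma_i^2 + \frac{(y_i-\mu_i)^2}{2\sigma_i^2}\right).$$
Setting $\beta=0$ recovers the standard Gaussian negative log-likelihood, while $\beta=1$ makes each point's contribution to the mean-gradient behave like plain squared error, counteracting the NLL's tendency to down-weight high-variance points.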
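The heavy-tailed streaming entry above mentions a clipped gradient descent. A generic norm-clipping update is sketched below; the squared loss, step size, and threshold are illustrative assumptions, not details taken from that paper.
```python
# Hedged sketch of norm-clipped SGD on a stream of (x, y) samples.
# Loss, step size, and clipping threshold are illustrative assumptions.
import numpy as np

def clipped_sgd_step(theta, x, y, lr=0.01, tau=1.0):
    """One streaming update: gradient of the squared loss on the incoming
    sample, clipped to norm at most tau, followed by a gradient step."""
    grad = 2.0 * (theta @ x - y) * x          # gradient of (theta.x - y)^2
    norm = np.linalg.norm(grad)
    if norm > tau:
        grad *= tau / norm                    # clipping tames heavy-tailed noise
    return theta - lr * grad

# Toy usage: estimate a 5-dimensional regression vector from a stream
# with heavy-tailed (Student-t) observation noise.
rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(1000):
    x = rng.standard_normal(5)
    y = x @ np.ones(5) + rng.standard_t(df=2.5)
    theta = clipped_sgd_step(theta, x, y)
```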