Accelerating Generalized Linear Models by Trading off Computation for
Uncertainty
- URL: http://arxiv.org/abs/2310.20285v2
- Date: Wed, 7 Feb 2024 08:37:16 GMT
- Title: Accelerating Generalized Linear Models by Trading off Computation for
Uncertainty
- Authors: Lukas Tatzel, Jonathan Wenger, Frank Schneider, Philipp Hennig
- Abstract summary: Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data. Exact inference in GLMs is prohibitively expensive for large datasets, so approximations are required in practice.
The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction.
We introduce a family of iterative methods that explicitly model this error.
- Score: 29.877181350448193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic
framework to model categorical, ordinal and continuous data, and are widely
used in practice. However, exact inference in GLMs is prohibitively expensive
for large datasets, thus requiring approximations in practice. The resulting
approximation error adversely impacts the reliability of the model and is not
accounted for in the uncertainty of the prediction. In this work, we introduce
a family of iterative methods that explicitly model this error. They are
uniquely suited to parallel modern computing hardware, efficiently recycle
computations, and compress information to reduce both the time and memory
requirements for GLMs. As we demonstrate on a realistically large
classification problem, our method significantly accelerates training compared
to competitive baselines by trading off reduced computation for increased
uncertainty.
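As a rough illustration of the computation-for-uncertainty trade-off described in the abstract, the sketch below is not the authors' algorithm; all function names, the synthetic data, and the residual-based variance correction are assumptions made here for illustration. It fits a Laplace approximation to a Bayesian logistic-regression GLM, solves the predictive-variance linear system with truncated conjugate gradients, and folds the leftover solver residual back into the predictive variance instead of silently discarding it, so fewer iterations give a faster but more uncertain prediction.

```python
import numpy as np

def cg_with_residual(A, b, max_iters=10, tol=1e-8):
    """Truncated conjugate gradients for A x = b (A symmetric positive definite).

    Returns the approximate solution and the final residual norm, which is
    used below to account for the error introduced by stopping early.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, np.sqrt(rs)

def laplace_logistic_predict(X, y, x_star, prior_prec=1.0, max_iters=10):
    """Sketch: Laplace approximation for Bayesian logistic regression where the
    posterior-covariance solve is truncated after `max_iters` CG steps.

    Illustrative simplification of the computation/uncertainty trade-off,
    not the paper's method.
    """
    n, d = X.shape
    w = np.zeros(d)
    # A few Newton steps to find the MAP estimate of the weights.
    for _ in range(20):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + prior_prec * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec * np.eye(d)
        w -= np.linalg.solve(H, grad)
    # Latent predictive variance is x_star^T H^{-1} x_star; solve H v = x_star
    # only approximately with a truncated CG iteration.
    v_approx, res_norm = cg_with_residual(H, x_star, max_iters=max_iters)
    var_f = x_star @ v_approx
    # Since H >= prior_prec * I, the neglected part of the quadratic form is at
    # most ||x_star|| * ||residual|| / prior_prec; add it as extra variance.
    var_f += res_norm * np.linalg.norm(x_star) / prior_prec
    mean_f = x_star @ w
    return mean_f, max(var_f, 0.0)

# Tiny usage example on synthetic data (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) + 0.3 * rng.normal(size=200) > 0).astype(float)
for iters in (2, 20):
    m, v = laplace_logistic_predict(X, y, X[0], max_iters=iters)
    print(f"CG iters={iters:2d}  latent mean={m:+.3f}  latent variance={v:.3f}")
```

The usage lines at the end compare the latent predictive variance under a small and a large iteration budget; the smaller budget leaves a larger residual and therefore reports a wider, more conservative uncertainty.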
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- Gradient-free variational learning with conditional mixture networks [39.827869318925494]
Conditional mixture networks (CMNs) are suitable for fast, gradient-free inference and can solve complex classification tasks.
We validate this approach by training two-layer CMNs on standard benchmarks from the UCI repository.
Our method, CAVI-CMN, achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation.
arXiv Detail & Related papers (2024-08-29T10:43:55Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Scalable Higher-Order Tensor Product Spline Models [0.0]
We propose a new approach using a factorization method to derive a highly scalable higher-order tensor product spline model.
Our method allows for the incorporation of all (higher-order) interactions of non-linear feature effects while keeping computational costs proportional to those of a model without interactions.
arXiv Detail & Related papers (2024-02-02T01:18:48Z)
- Additive Higher-Order Factorization Machines [0.0]
We derive a scalable high-order tensor product spline model using a factorization approach.
Our method allows the inclusion of all (higher-order) interactions of non-linear feature effects.
We show both theoretically and empirically that our method scales notably better than existing approaches.
arXiv Detail & Related papers (2022-05-28T19:50:52Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum-variance unbiased estimator (MVUE) in linear models (a worked form of this statement appears after the related-papers list below).
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Multi-output Gaussian Processes for Uncertainty-aware Recommender Systems [3.908842679355254]
We introduce an efficient strategy for model training and inference, resulting in a model that scales to very large and sparse datasets.
Our model also provides meaningful uncertainty estimates for its predictions.
arXiv Detail & Related papers (2021-06-08T10:01:14Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
- Scaling Bayesian inference of mixed multinomial logit models to very large datasets [9.442139459221785]
We propose an Amortized Variational Inference approach that leverages backpropagation, automatic differentiation and GPU-accelerated computation.
We show how normalizing flows can be used to increase the flexibility of the variational posterior approximations.
arXiv Detail & Related papers (2020-04-11T15:30:47Z)
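As referenced in the "Learning to Estimate Without Bias" entry above, the Gauss-Markov statement can be written out explicitly. The display below uses standard textbook notation (design matrix $X$, weight matrix $W$); it is background, not notation taken from that paper.

```latex
% Linear model with correlated / heteroscedastic noise:
y = X\beta + \varepsilon, \qquad
\mathbb{E}[\varepsilon] = 0, \qquad
\operatorname{Cov}(\varepsilon) = \sigma^2 W^{-1}.
% Weighted least squares estimator and its optimality:
\hat{\beta}_{\mathrm{WLS}} = (X^\top W X)^{-1} X^\top W y, \qquad
\operatorname{Cov}\!\bigl(\hat{\beta}_{\mathrm{WLS}}\bigr)
  = \sigma^2 (X^\top W X)^{-1}
  \preceq \operatorname{Cov}\!\bigl(\tilde{\beta}\bigr)
```

for every other linear unbiased estimator $\tilde{\beta}$, i.e. the weighted least squares estimator is the minimum-variance (best) linear unbiased estimator.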
This list is automatically generated from the titles and abstracts of the papers on this site.