Maximum Likelihood Estimation is All You Need for Well-Specified
Covariate Shift
- URL: http://arxiv.org/abs/2311.15961v1
- Date: Mon, 27 Nov 2023 16:06:48 GMT
- Title: Maximum Likelihood Estimation is All You Need for Well-Specified
Covariate Shift
- Authors: Jiawei Ge, Shange Tang, Jianqing Fan, Cong Ma, Chi Jin
- Abstract summary: A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization.
We show that classical Maximum Likelihood Estimation (MLE) purely using source data achieves minimax optimality.
We illustrate the wide applicability of our framework by instantiating it to three concrete examples.
- Score: 34.414261291690856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge of modern machine learning systems is to achieve
Out-of-Distribution (OOD) generalization -- generalizing to target data whose
distribution differs from that of source data. Despite its significant
importance, the fundamental question of "what are the most effective
algorithms for OOD generalization" remains open even under the standard
setting of covariate shift. This paper addresses this fundamental question by
proving that, surprisingly, classical Maximum Likelihood Estimation (MLE)
purely using source data (without any modification) achieves minimax
optimality for covariate shift under the well-specified setting. That is, no
algorithm performs better than MLE in this setting (up to a constant factor),
justifying that MLE is all you need. Our result holds for a very rich class of
parametric models, and does not require any boundedness condition on the
density ratio. We illustrate the wide applicability of our framework by
instantiating it to three concrete examples -- linear regression, logistic
regression, and phase retrieval. This paper further complements the study by
proving that, under the misspecified setting, MLE is no longer the optimal
choice, whereas the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax
optimal in certain scenarios.
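To make the claim concrete, here is a minimal simulation sketch (not from the paper; the Gaussian setup and all names are our own) contrasting plain MLE, which here reduces to ordinary least squares, with the density-ratio-weighted MWLE in a well-specified linear regression under covariate shift:
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 500
theta_star = rng.normal(size=d)               # true parameter (well-specified model)

# Source and target covariate laws differ (covariate shift), but y | x is
# shared under both: y = x^T theta* + N(0, 1).
mu_s, mu_t = np.zeros(d), 0.5 * np.ones(d)
X = rng.normal(size=(n, d)) + mu_s            # source covariates ~ N(mu_s, I)
y = X @ theta_star + rng.normal(size=n)

# MLE on source data: for Gaussian noise this is ordinary least squares.
theta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# MWLE: reweight each source point by the density ratio p_target / p_source,
# which for two unit-variance Gaussians is exp((|x-mu_s|^2 - |x-mu_t|^2)/2).
log_w = (((X - mu_s) ** 2).sum(1) - ((X - mu_t) ** 2).sum(1)) / 2.0
sw = np.sqrt(np.exp(log_w - log_w.max()))     # stabilized importance weights
theta_mwle, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# With identity target covariance, the target excess risk equals the squared
# parameter error, so the two estimators can be compared directly.
for name, th in [("MLE", theta_mle), ("MWLE", theta_mwle)]:
    print(name, "excess risk:", np.sum((th - theta_star) ** 2))
```
In this well-specified setting the unweighted MLE is typically at least as accurate as MWLE, since importance weighting inflates variance without correcting any bias, which is the intuition behind the optimality result.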
Related papers
- On the Consistency of Maximum Likelihood Estimation of Probabilistic
Principal Component Analysis [1.0528389538549636]
PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance.
Despite this wide applicability in various fields, hardly any theoretical guarantees exist to justify the soundness of the maximum likelihood (ML) solution for this model.
We propose a novel approach using quotient topological spaces and in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space.
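As background for why a quotient space is natural here, a short numerical sketch (our own toy, assuming the classical closed-form ML solution of Tipping and Bishop) exhibits the rotation ambiguity of the PPCA ML estimate:
```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q = 2000, 6, 2
W_true = rng.normal(size=(d, q))
sigma2_true = 0.1
Z = rng.normal(size=(n, q))                       # latent factors
X = Z @ W_true.T + np.sqrt(sigma2_true) * rng.normal(size=(n, d))

S = np.cov(X, rowvar=False)                       # sample covariance
evals, evecs = np.linalg.eigh(S)
evals, evecs = evals[::-1], evecs[:, ::-1]        # descending order

sigma2_ml = evals[q:].mean()                      # mean of discarded eigenvalues
W_ml = evecs[:, :q] * np.sqrt(evals[:q] - sigma2_ml)

# W_ml is identified only up to rotation: W_ml @ R fits equally well for any
# orthogonal R, so consistency is naturally phrased in a quotient space or
# for rotation-invariant quantities such as the model covariance below.
C_ml = W_ml @ W_ml.T + sigma2_ml * np.eye(d)
C_true = W_true @ W_true.T + sigma2_true * np.eye(d)
print("covariance error:", np.linalg.norm(C_ml - C_true))
```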
arXiv Detail & Related papers (2023-11-08T22:40:45Z)
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets [101.5329678997916]
We study episodic two-player zero-sum Markov games (MGs) in the offline setting.
The goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori.
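The paper's pessimistic value iteration is beyond a short snippet, but its core primitive is solving a zero-sum matrix game at each state. A minimal sketch of that primitive (our own toy, not the paper's algorithm) via linear programming:
```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],    # payoff to the row (max) player:
              [-1.0, 1.0]])   # matching pennies
m, k = A.shape

# Variables z = (x_1..x_m, v). Maximize v subject to (A^T x)_j >= v for all
# columns j and x in the probability simplex; linprog minimizes, so use -v.
c = np.zeros(m + 1); c[-1] = -1.0
A_ub = np.hstack([-A.T, np.ones((k, 1))])     # v - (A^T x)_j <= 0
b_ub = np.zeros(k)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
print("row NE strategy:", x, "game value:", v)   # ~[0.5, 0.5], value ~0
```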
arXiv Detail & Related papers (2022-02-15T15:39:30Z)
- Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation [154.2195491708548]
We analyze prediction discriminability and diversity through the structure of the classification output matrix of a randomly selected data batch.
We propose Batch Nuclear-norm Maximization and Minimization, which performs nuclear-norm maximization and minimization on the output matrix to enhance target prediction ability.
Experiments show that our method boosts adaptation accuracy and robustness under three typical domain adaptation scenarios.
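As a rough illustration (our own sketch; the paper's fast variant replaces the SVD with cheaper approximations), the batch nuclear norm of a softmax output matrix is large exactly when predictions are both confident and diverse:
```python
import numpy as np

def batch_nuclear_norm(logits: np.ndarray) -> float:
    """Nuclear norm (sum of singular values) of a batch's softmax outputs."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return float(np.linalg.svd(p, compute_uv=False).sum())

# Confident and diverse predictions (near one-hot, all classes covered)
# score higher than near-uniform, low-confidence ones.
print(batch_nuclear_norm(10.0 * np.eye(4)))          # close to 4.0
print(batch_nuclear_norm(np.zeros((4, 4))))          # uniform rows: ~1.0
```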
arXiv Detail & Related papers (2021-07-13T15:08:32Z)
- Complexity-Free Generalization via Distributionally Robust Optimization [4.313143197674466]
We present an alternate route to obtaining generalization bounds on the solution from distributionally robust optimization (DRO).
Our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function.
Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies, to the best of our knowledge, the first generalization bound in the literature that depends solely on the true loss function.
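For concreteness, a small sketch (ours, with an arbitrary RBF bandwidth) of the biased empirical MMD that such a bound would be stated in terms of:
```python
import numpy as np

def mmd2(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean())

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))      # mean-shifted distribution
print(mmd2(X[:100], X[100:]))                # near 0: same distribution
print(mmd2(X, Y))                            # clearly positive under shift
```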
arXiv Detail & Related papers (2021-06-21T15:19:52Z)
- The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for Speed-Accuracy Optimization [0.0]
We propose a model for multiclass classification of time series that makes predictions as early and as accurately as possible.
Our overall architecture for early classification, MSPRT-TANDEM, statistically significantly outperforms baseline models on four datasets.
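The log-sum-exp device in the title is, at bottom, the standard trick for aggregating log-likelihood ratios without overflow; a minimal sketch (ours):
```python
import numpy as np

def logsumexp(a: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = a.max(axis=axis, keepdims=True)      # pull out the max before exp
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

# Example: per-frame log-likelihood ratios whose naive exponentials overflow.
llrs = np.array([1000.0, 1001.0, 999.0])
print(logsumexp(llrs))                       # ~1001.41, no overflow
```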
arXiv Detail & Related papers (2021-05-28T07:21:58Z)
- Meta-Learned Invariant Risk Minimization [12.6484257912092]
Empirical Risk Minimization (ERM) based machine learning algorithms suffer from weak generalization performance on out-of-distribution (OOD) data.
In this paper, we propose a novel meta-learning based approach for IRM.
We show that our algorithm not only has better OOD generalization performance than IRMv1 and all IRM variants, but also addresses the weakness of IRMv1 with improved stability.
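For reference, a sketch (ours, for squared loss) of the IRMv1 penalty that this line of work, including the meta-learned variant, builds on: the squared gradient of an environment's risk with respect to a scalar dummy classifier fixed at w = 1:
```python
import numpy as np

def irmv1_penalty(f: np.ndarray, y: np.ndarray) -> float:
    """IRMv1 penalty for one environment under squared loss.

    f: model outputs on that environment, y: targets.
    R_e(w) = mean((w * f - y)^2); dR_e/dw at w = 1 is mean(2 * f * (f - y)).
    """
    grad = np.mean(2.0 * f * (f - y))
    return float(grad ** 2)

# The full objective sums each environment's risk plus lambda * penalty.
rng = np.random.default_rng(4)
y = rng.normal(size=100)
f_good = y + 0.01 * rng.normal(size=100)     # near-optimal in-environment
print(irmv1_penalty(f_good, y))              # close to zero
```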
arXiv Detail & Related papers (2021-03-24T02:52:48Z)
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
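The exact NML idea being approximated can be written down for a toy Bernoulli model (our sketch; the paper's contribution is making this tractable for high-capacity networks): score every possible outcome by its best-case likelihood, then renormalize.
```python
import numpy as np
from math import comb

def bernoulli_nml(n: int) -> np.ndarray:
    """NML probability of observing k heads in n flips, for k = 0..n."""
    def max_lik(k):
        t = k / n                            # MLE for that outcome
        return comb(n, k) * (t ** k) * ((1 - t) ** (n - k))
    scores = np.array([max_lik(k) for k in range(n + 1)])
    return scores / scores.sum()             # normalize over all outcomes

print(bernoulli_nml(10))   # extreme counts are boosted relative to a fixed prior
```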
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
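The key analytic ingredient is that Gaussian inner products have closed forms; a sketch (ours) of the Cauchy-Schwarz divergence between two single Gaussians, the building block that extends to GMMs:
```python
import numpy as np

def gauss_overlap(m1, S1, m2, S2):
    """integral of N(x; m1, S1) * N(x; m2, S2) dx = N(m1; m2, S1 + S2)."""
    d = len(m1); S = S1 + S2; diff = m1 - m2
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(S) ** -0.5
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(S, diff))

def cs_divergence(m1, S1, m2, S2):
    """D_CS(p, q) = -log( <p, q> / sqrt(<p, p> <q, q>) )."""
    pq = gauss_overlap(m1, S1, m2, S2)
    pp = gauss_overlap(m1, S1, m1, S1)
    qq = gauss_overlap(m2, S2, m2, S2)
    return -np.log(pq / np.sqrt(pp * qq))

I = np.eye(2)
print(cs_divergence(np.zeros(2), I, np.zeros(2), I))   # 0 for identical Gaussians
print(cs_divergence(np.zeros(2), I, np.ones(2), I))    # > 0 when they differ
```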
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
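The underlying CNML scheme can be computed exactly in tiny models; a Bernoulli sketch (ours) of the refit-per-label idea that ACNML amortizes:
```python
import numpy as np

def cnml_bernoulli(heads: int, n: int) -> np.ndarray:
    """CNML distribution over the next flip after seeing `heads` in n flips."""
    probs = []
    for y in (0, 1):
        k, m = heads + y, n + 1              # append candidate outcome y
        theta = k / m                        # refit the MLE with y included
        probs.append(theta ** k * (1 - theta) ** (m - k))
    probs = np.array(probs)
    return probs / probs.sum()               # normalize across labels

print(cnml_bernoulli(9, 10))   # hedged toward 0.5 vs. the plain MLE of 0.9
```
One maximization per candidate label is what makes exact CNML expensive, and amortizing that refit is precisely the gap ACNML targets.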
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as its recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution; this is precisely the issue it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z)