The Expected Jacobian Outerproduct: Theory and Empirics
- URL: http://arxiv.org/abs/2006.03550v1
- Date: Fri, 5 Jun 2020 16:42:09 GMT
- Title: The Expected Jacobian Outerproduct: Theory and Empirics
- Authors: Shubhendu Trivedi, J. Wang
- Abstract summary: We show that the expected Jacobian outerproduct (EJOP) can be used as a metric to yield improvements in real-world non-parametric classification tasks.
We also show that the estimated EJOP serves as a cheap initialization that yields improvements in metric learning tasks.
- Score: 3.172761915061083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The expected gradient outerproduct (EGOP) of an unknown regression function
is an operator that arises in the theory of multi-index regression, and is
known to recover those directions that are most relevant to predicting the
output. However, work on the EGOP, including that on its cheap estimators, is
restricted to the regression setting. In this work, we adapt this operator to
the multi-class setting, which we dub the expected Jacobian outerproduct
(EJOP). Moreover, we propose a simple rough estimator of the EJOP and show
that, somewhat surprisingly, it remains statistically consistent under mild
assumptions. Furthermore, we show that its eigenvalues and eigenspaces also
remain consistent. Finally, we show that the estimated EJOP yields
improvements in real-world non-parametric classification tasks, both when
used directly as a metric and when used as a cheap initialization in metric
learning tasks.
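To make the idea concrete, the sketch below estimates the EJOP, E_X[J_f(X)^T J_f(X)], by central finite differences of a pilot estimate of the class-probability function f, and then uses the resulting matrix as a Mahalanobis-style metric for nearest-neighbour classification. This is a minimal illustrative sketch, not the authors' implementation: the kNN pilot estimator, the step size t, and the neighbourhood size k are all assumptions made here for demonstration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def estimate_ejop(X, y, t=0.1, k=10):
    """Rough finite-difference estimate of the EJOP, E_X[J_f(X)^T J_f(X)].

    Illustrative sketch only: f is approximated by kNN class-probability
    estimates, and derivatives by central differences with step t.
    """
    n, d = X.shape
    pilot = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    c = len(pilot.classes_)
    M = np.zeros((d, d))
    for x in X:  # naive O(n * d) pilot queries; fine for a sketch
        J = np.zeros((c, d))  # c x d Jacobian estimate at x
        for i in range(d):
            e = np.zeros(d)
            e[i] = t
            p_plus = pilot.predict_proba((x + e)[None, :])[0]
            p_minus = pilot.predict_proba((x - e)[None, :])[0]
            J[:, i] = (p_plus - p_minus) / (2.0 * t)  # central difference
        M += J.T @ J
    return M / n

def ejop_transform(X, M):
    """Map x -> M^{1/2} x so Euclidean distance on the transformed data
    equals the Mahalanobis-style distance induced by the estimated EJOP."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return X @ V @ np.diag(np.sqrt(w)) @ V.T
```

In this sketch, one would compute M = estimate_ejop(X_train, y_train) once, transform both training and test points with ejop_transform, and then run any standard kNN classifier on the transformed features.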
Related papers
- Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm [12.201705893125775]
We introduce a novel natural experiment dataset obtained from an early childhood literacy nonprofit.
Applying over 20 established estimators to the dataset produces inconsistent results in evaluating the nonprofit's efficacy.
We create a benchmark to evaluate estimator accuracy using synthetic outcomes.
arXiv Detail & Related papers (2024-09-06T15:44:45Z) - A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization.
arXiv Detail & Related papers (2024-06-10T12:25:13Z) - Multifidelity Covariance Estimation via Regression on the Manifold of Symmetric Positive Definite Matrices [0.42855555838080844]
We show that our manifold regression multifidelity (MRMF) covariance estimator is a maximum likelihood estimator under a certain error model on the manifold.
We demonstrate via numerical examples that the MRMF estimator can provide significant decreases, up to one order of magnitude, in squared estimation error.
arXiv Detail & Related papers (2023-07-23T21:46:55Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Learning Dynamical Systems via Koopman Operator Regression in
Reproducing Kernel Hilbert Spaces [52.35063796758121]
We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system.
We link the risk with the estimation of the spectral decomposition of the Koopman operator.
Our results suggest that reduced rank regression (RRR) might be beneficial over other widely used estimators.
arXiv Detail & Related papers (2022-05-27T14:57:48Z) - Marginalized Operators for Off-policy Reinforcement Learning [53.37381513736073]
Marginalized operators strictly generalize generic multi-step operators such as Retrace, recovering them as special cases.
We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.
arXiv Detail & Related papers (2022-03-30T09:59:59Z) - Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient
for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Unifying Gradient Estimators for Meta-Reinforcement Learning via
Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation.
Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z) - On the benefits of maximum likelihood estimation for Regression and
Forecasting [35.386189585135334]
We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting.
This approach is better suited to capture inductive biases such as prior domain knowledge in datasets.
We demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance over Empirical Risk Minimization.
arXiv Detail & Related papers (2021-06-18T22:10:43Z) - On Low-rank Trace Regression under General Sampling Distribution [9.699586426043885]
We show that cross-validated estimators satisfy near-optimal error bounds under general assumptions.
We also show that the cross-validated estimator outperforms the theory-inspired approach of selecting the parameter.
arXiv Detail & Related papers (2019-04-18T02:56:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.