Classification Tree Pruning Under Covariate Shift
- URL: http://arxiv.org/abs/2305.04335v2
- Date: Thu, 22 Jun 2023 01:29:12 GMT
- Title: Classification Tree Pruning Under Covariate Shift
- Authors: Nicholas Galbraith and Samory Kpotufe
- Abstract summary: We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance.
We present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate.
- Score: 7.982668978293684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of \emph{pruning} a classification tree, that is,
selecting a suitable subtree that balances bias and variance, in common
situations with inhomogeneous training data. Namely, assuming access to mostly
data from a distribution $P_{X, Y}$, but little data from a desired
distribution $Q_{X, Y}$ with different $X$-marginals, we present the first
efficient procedure for optimal pruning in such situations, when
cross-validation and other penalized variants are grossly inadequate.
Optimality is derived with respect to a notion of \emph{average discrepancy}
$P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a
recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture
the limits of classification under such a distribution shift. Our relaxed
notion can be viewed as a measure of \emph{relative dimension} between
distributions, as it relates to existing notions of information such as the
Minkowski and Rényi dimensions.
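For reference, the classical penalized criterion that the paper argues is inadequate under covariate shift can be sketched on a toy tree. The `Node` class and `prune` helper below are hypothetical illustrations of standard cost-complexity pruning (error plus a per-leaf penalty), not the paper's average-discrepancy procedure:

```python
# A minimal cost-complexity pruning sketch on a hand-built binary tree.
# This illustrates the classical penalized criterion err(T) + lam * |leaves(T)|,
# NOT the paper's discrepancy-based procedure.

class Node:
    def __init__(self, errors, left=None, right=None):
        self.errors = errors   # misclassifications if collapsed to a leaf
        self.left = left
        self.right = right

def prune(node, lam):
    """Return the penalized error of the optimally pruned subtree rooted here:
    min over prunings of (training errors + lam * number of leaves)."""
    if node.left is None:                       # already a leaf
        return node.errors + lam
    keep = prune(node.left, lam) + prune(node.right, lam)
    collapse = node.errors + lam                # replace subtree by one leaf
    return min(keep, collapse)

# Tiny example: root makes 10 errors as a leaf; its children make 2 and 3.
tree = Node(errors=10, left=Node(2), right=Node(3))
print(prune(tree, lam=1.0))   # children kept: (2+1) + (3+1) = 7.0
print(prune(tree, lam=6.0))   # collapsing wins: 10+6 = 16.0 vs (2+6)+(3+6) = 17
```

A larger penalty `lam` collapses more of the tree, which is exactly the bias-variance dial the abstract refers to.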
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
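A common concrete instance of classification with rejection is thresholding the model's confidence. The sketch below is a generic baseline of that kind, not the paper's density-ratio objective; the threshold `tau` is an assumed hyperparameter:

```python
import numpy as np

def predict_with_rejection(probs, tau=0.7):
    """Given per-class probabilities of shape (n, k), predict the argmax
    class, or return -1 (abstain) when the top probability is below tau."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= tau, preds, -1)

probs = np.array([[0.9, 0.1],    # confident -> class 0
                  [0.55, 0.45],  # uncertain -> abstain
                  [0.2, 0.8]])   # confident -> class 1
print(predict_with_rejection(probs))  # [ 0 -1  1]
```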
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Universal Batch Learning Under The Misspecification Setting [4.772817128620037]
We consider the problem of universal \emph{batch} learning in a misspecification setting with log-loss.
We derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret.
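The mixture learner can be sketched compactly. Below, a uniform prior over a small hand-picked grid of Bernoulli models stands in for the set of data-generating distributions (the grid of `thetas` and the bit sequence are assumptions for illustration); the mixture's cumulative log-loss exceeds the best model's by at most $\log M$ for $M$ models:

```python
import numpy as np

def mixture_logloss(xs, thetas):
    """Cumulative log-loss of the uniform-prior Bayesian mixture over
    Bernoulli(theta) models, predicting each bit sequentially."""
    w = np.ones(len(thetas)) / len(thetas)   # uniform prior weights
    loss = 0.0
    for x in xs:
        p1 = float(np.dot(w, thetas))        # mixture prob that next bit is 1
        loss -= np.log(p1 if x == 1 else 1 - p1)
        lik = thetas if x == 1 else 1 - thetas
        w = w * lik
        w /= w.sum()                         # Bayesian posterior update
    return loss

def best_model_logloss(xs, thetas):
    return min(-sum(np.log(t if x == 1 else 1 - t) for x in xs) for t in thetas)

xs = [1, 1, 0, 1, 1]
thetas = np.array([0.25, 0.5, 0.75])
regret = mixture_logloss(xs, thetas) - best_model_logloss(xs, thetas)
print(0.0 <= regret <= np.log(len(thetas)))  # True: 0 <= regret <= log(3)
```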
arXiv Detail & Related papers (2024-05-12T11:16:05Z) - Distribution-Free Rates in Neyman-Pearson Classification [4.169915659794566]
We consider the problem of Neyman-Pearson classification which models unbalanced classification settings.
We provide a full characterization of possible distribution-free rates, i.e., minimax rates over the space of all pairs.
The rates involve a dichotomy between hard and easy classes $\mathcal{H}$ as characterized by a simple geometric condition.
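Setting the dichotomy aside, the basic Neyman-Pearson recipe of thresholding a score so the type-I error stays below a budget $\alpha$ can be sketched as follows (the Gaussian scores and the `np_threshold` helper are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def np_threshold(scores_class0, alpha):
    """Pick a threshold t so that the empirical type-I error
    P(score > t | class 0) is at most alpha."""
    s = np.sort(scores_class0)
    k = int(np.ceil((1 - alpha) * len(s))) - 1   # (1-alpha) empirical quantile
    return s[k]

rng = np.random.default_rng(0)
s0 = rng.normal(0, 1, 1000)        # scores of class-0 (null) samples
t = np_threshold(s0, alpha=0.05)
print((s0 > t).mean() <= 0.05)     # True: empirical type-I error within budget
```

At most a $(1-\alpha)$ fraction of class-0 scores can exceed their own empirical $(1-\alpha)$-quantile, which is why the constraint holds by construction.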
arXiv Detail & Related papers (2024-02-14T20:21:43Z) - On Rate-Optimal Partitioning Classification from Observable and from
Privatised Data [0.0]
We revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions.
The privacy constraints mean that the data $(X_1,Y_1),\dots,(X_n,Y_n)$ cannot be directly observed.
We add Laplace distributed noises to the indicators of all possible locations of the feature vector $X_i$ and to its label $Y_i$.
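The privatisation step can be illustrated with a minimal Laplace-mechanism sketch on per-cell class counts. The cell/label encoding, the noise scale, and the `eps` budget below are hypothetical choices for illustration, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

def private_cell_counts(cells, labels, n_cells, eps=1.0):
    """Per-cell class counts with Laplace noise of scale 2/eps added to
    every count: a standard Laplace-mechanism sketch, not the paper's scheme."""
    counts = np.zeros((n_cells, 2))
    for c, y in zip(cells, labels):
        counts[c, y] += 1
    return counts + rng.laplace(scale=2.0 / eps, size=counts.shape)

cells  = np.array([0, 0, 1, 1, 1, 2])   # which partition cell each X_i falls in
labels = np.array([0, 1, 1, 1, 0, 0])
noisy = private_cell_counts(cells, labels, n_cells=3, eps=5.0)
pred = noisy.argmax(axis=1)             # noisy majority vote per cell
print(pred.shape)                       # (3,)
```

The classifier then predicts by noisy majority vote within each cell, trading accuracy for the privacy guarantee as `eps` shrinks.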
arXiv Detail & Related papers (2023-12-22T18:07:18Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
Ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node).
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
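The penalized formulation admits a short sketch: each agent runs proximal gradient (soft-thresholding) on its local LASSO loss plus a quadratic consensus penalty over the graph edges. The toy graph, step size, and penalty weights below are assumptions for illustration, not the paper's tuned constants:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def distributed_lasso(Xs, ys, edges, lam=0.1, rho=1.0, step=0.01, iters=2000):
    """Proximal gradient on the penalized consensus LASSO: each agent k
    minimizes its local least-squares loss plus lam * ||theta_k||_1, with a
    quadratic penalty (rho/2) * ||theta_i - theta_j||^2 on every graph edge."""
    K, d = len(Xs), Xs[0].shape[1]
    thetas = np.zeros((K, d))
    for _ in range(iters):
        grads = np.zeros_like(thetas)
        for k in range(K):                       # local smooth gradients
            grads[k] = Xs[k].T @ (Xs[k] @ thetas[k] - ys[k])
        for i, j in edges:                       # consensus-penalty gradients
            grads[i] += rho * (thetas[i] - thetas[j])
            grads[j] += rho * (thetas[j] - thetas[i])
        thetas = soft_threshold(thetas - step * grads, step * lam)  # prox step
    return thetas

rng = np.random.default_rng(0)
theta_true = np.array([1.0, 0.0, -0.5])          # sparse ground truth
Xs = [rng.normal(size=(30, 3)) for _ in range(3)]
ys = [X @ theta_true + 0.05 * rng.normal(size=30) for X in Xs]
thetas = distributed_lasso(Xs, ys, edges=[(0, 1), (1, 2)])
print(np.abs(thetas[0] - thetas[2]).max() < 0.5)   # agents near consensus
```

With a finite `rho` the agents agree only up to a tolerance, matching the paper's point that the penalized problem converges to within the order of the centralized statistical error rather than to exact consensus.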
arXiv Detail & Related papers (2021-11-12T01:51:50Z) - Linear Optimal Transport Embedding: Provable Wasserstein classification
for certain rigid transformations and perturbations [79.23797234241471]
Discriminating between distributions is an important problem in a number of scientific fields.
The Linear Optimal Transportation (LOT) framework embeds the space of distributions into an $L^2$-space.
We demonstrate the benefits of LOT on a number of distribution classification problems.
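In one dimension the LOT idea has a closed form, since the optimal transport map is quantile matching. The sketch below embeds each sample set by its empirical quantile function on a fixed grid, so Euclidean distance between embeddings recovers the Wasserstein-2 distance of a pure shift (the `lot_embed` helper and grid size are illustrative assumptions):

```python
import numpy as np

def lot_embed(samples, n_quantiles=100):
    """1-D LOT sketch: embed a distribution by its empirical quantile
    function on a fixed grid (the OT map from a uniform reference)."""
    qs = np.linspace(0, 1, n_quantiles, endpoint=False) + 0.5 / n_quantiles
    return np.quantile(samples, qs)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 5000)
b = a + 2.0                      # the same shape, shifted by 2
d = np.sqrt(np.mean((lot_embed(a) - lot_embed(b)) ** 2))
print(round(d, 2))               # 2.0: the W2 distance of a pure shift
```

Once distributions live in this flat $L^2$-space, ordinary linear classifiers can separate them, which is the provable-classification angle of the paper.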
arXiv Detail & Related papers (2020-08-20T19:09:33Z) - Debiasing Distributed Second Order Optimization with Surrogate Sketching
and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
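The bias that motivates the debiasing technique can be seen directly: averaging inverses of small-sample Hessian estimates does not recover the inverse of the true Hessian. A sketch under Gaussian data, where the inverse-Wishart mean gives the inflation factor in closed form (the sizes below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 5, 10, 20000   # dimension, per-machine sketch size, machine count

# Each machine inverts its own small-sample Hessian estimate; the true
# Hessian here is the identity, but for Gaussian data
# E[(X^T X / n)^{-1}] = n / (n - d - 1) * I, i.e. 10/4 = 2.5 * I, not I.
inv_sum = np.zeros((d, d))
for _ in range(m):
    X = rng.normal(size=(n, d))
    inv_sum += np.linalg.inv(X.T @ X / n)
avg_inv = inv_sum / m
print(np.trace(avg_inv) / d)   # biased upward, close to 2.5 rather than 1.0
```

Plain averaging therefore systematically overshoots; the paper's surrogate sketching and scaled regularization are designed to cancel exactly this kind of inversion bias.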
arXiv Detail & Related papers (2020-07-02T18:08:14Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Optimal Transport for Conditional Domain Matching and Label Shift [0.0]
We address the problem of unsupervised domain adaptation under the setting of generalized target shift.
For good generalization, it is necessary to learn a latent representation in which both marginals and class-conditional distributions are aligned across domains.
We propose a learning problem that minimizes importance weighted loss in the source domain and a Wasserstein distance between weighted marginals.
arXiv Detail & Related papers (2020-06-15T06:45:43Z)
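The importance-weighted loss in the last entry rests on reweighting source samples by the density ratio $dQ/dP$. A minimal sketch with known Gaussian densities (the two distributions are assumptions for illustration, and the Wasserstein alignment term is not included):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), used to form the importance weights."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, 200000)       # source samples ~ P = N(0, 1)
w = gauss_pdf(xs, 1.0, 1.0) / gauss_pdf(xs, 0.0, 1.0)   # dQ/dP for Q = N(1, 1)
est = np.sum(w * xs) / np.sum(w)        # self-normalized importance estimate
print(round(est, 1))                    # 1.0: the mean under the target Q
```

The same reweighting applied to a loss instead of to $x$ itself turns a source-domain empirical risk into an estimate of the target-domain risk, which is the role the weights play in the paper's learning problem.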
This list is automatically generated from the titles and abstracts of the papers in this site.