Classification Tree Pruning Under Covariate Shift
- URL: http://arxiv.org/abs/2305.04335v2
- Date: Thu, 22 Jun 2023 01:29:12 GMT
- Title: Classification Tree Pruning Under Covariate Shift
- Authors: Nicholas Galbraith and Samory Kpotufe
- Abstract summary: We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance.
We present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate.
- Score: 7.982668978293684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of \emph{pruning} a classification tree, that is,
selecting a suitable subtree that balances bias and variance, in common
situations with inhomogeneous training data. Namely, assuming access to mostly
data from a distribution $P_{X, Y}$, but little data from a desired
distribution $Q_{X, Y}$ with different $X$-marginals, we present the first
efficient procedure for optimal pruning in such situations, when
cross-validation and other penalized variants are grossly inadequate.
Optimality is derived with respect to a notion of \emph{average discrepancy}
$P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a
recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture
the limits of classification under such a distribution shift. Our relaxed
notion can be viewed as a measure of \emph{relative dimension} between
distributions, as it relates to existing notions of information such as the
Minkowski and Rényi dimensions.
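As a point of reference for the model-selection problem the abstract describes, the sketch below (Python, scikit-learn) scores candidate subtrees from a cost-complexity pruning path on the scarce $Q$-sample. This is a naive baseline for illustration only, not the paper's optimal procedure, which is driven by the average-discrepancy notion; the data-generating choices are hypothetical.

```python
# Minimal sketch of pruning-as-model-selection under covariate shift.
# NOT the paper's procedure: the paper derives an optimal pruning rule from an
# average-discrepancy notion; this baseline simply scores candidate subtrees
# (from cost-complexity pruning) on the small Q-sample.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Source data from P_{X,Y} (plentiful) and target data from Q_{X,Y} (scarce),
# sharing the same conditional but with different X-marginals (toy example).
Xp = rng.normal(0.0, 1.0, size=(5000, 2))
yp = (Xp[:, 0] + 0.3 * rng.normal(size=5000) > 0).astype(int)
Xq = rng.normal(1.5, 0.5, size=(100, 2))          # shifted X-marginal
yq = (Xq[:, 0] + 0.3 * rng.normal(size=100) > 0).astype(int)

# Grow a large tree on the source sample, then enumerate candidate subtrees
# along the cost-complexity pruning path.
full_tree = DecisionTreeClassifier(random_state=0).fit(Xp, yp)
alphas = full_tree.cost_complexity_pruning_path(Xp, yp).ccp_alphas

# Select the subtree with the smallest error on the scarce target sample.
best_alpha, best_err = None, np.inf
for a in alphas:
    sub = DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(Xp, yp)
    err = np.mean(sub.predict(Xq) != yq)
    if err < best_err:
        best_alpha, best_err = a, err

print(f"chosen ccp_alpha={best_alpha:.4g}, Q-error={best_err:.3f}")
```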
Related papers
- Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization [0.4732176352681218]
This paper addresses the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.
We propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius.
Our bounds rely on a newly introduced \emph{compatibility measure}, which fully characterizes the error propagation dynamics along the sequence.
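For orientation, the generic Wasserstein distributionally robust objective (textbook form; the paper's manifold-constrained, adaptive-radius variant refines this) can be written as:

```latex
% Standard Wasserstein-DRO objective: worst-case expected loss over all
% distributions within radius \rho of the empirical measure \widehat{P}_n.
% The adaptive radius and manifold constraint of the paper are not shown.
\min_{\theta}\; \sup_{Q \,:\, W(Q,\, \widehat{P}_n) \le \rho}\;
  \mathbb{E}_{(x,y) \sim Q}\bigl[\ell(\theta; x, y)\bigr]
```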
arXiv Detail & Related papers (2024-10-17T22:07:25Z) - Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
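A hedged sketch of the adaptive-neighbourhood mechanism summarized above: the per-query $k$ is modulated by a local score. The paper's $kK$-NN derives that score from the local shape operator (curvature); here `local_score` is a simple stand-in (relative spread of the neighbourhood), so the code only illustrates the varying-$k$ mechanism.

```python
# Adaptive k-NN sketch: each query gets its own neighbourhood size, driven by
# a local score. The score below is a placeholder, NOT the paper's
# curvature/shape-operator estimate. Integer class labels are assumed.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adaptive_knn_predict(X_train, y_train, X_test, k_min=3, k_max=31):
    nn = NearestNeighbors(n_neighbors=k_max).fit(X_train)
    dist, idx = nn.kneighbors(X_test)
    # Placeholder local score in [0, 1]: relative spread of the neighbourhood.
    local_score = dist[:, -1] / (dist[:, -1].max() + 1e-12)
    # Map the score to a per-query k (denser/flatter regions get a larger k).
    ks = (k_min + (1.0 - local_score) * (k_max - k_min)).astype(int)
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, k in enumerate(ks):
        votes = y_train[idx[i, :k]]
        preds[i] = np.bincount(votes).argmax()   # majority vote
    return preds
```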
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
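For context, a minimal rejection rule in the density-ratio spirit: abstain whenever an estimated ratio $r(x)$ falls below a threshold. This generic rule is not the paper's idealized-distribution framework; `tau` and the ratio estimator are assumed inputs.

```python
# Classification with rejection via a density-ratio style score: abstain
# wherever the estimated ratio r(x) is below a threshold tau (hypothetical).
import numpy as np

def predict_with_rejection(probs, ratio, tau=0.5):
    """probs: (n, n_classes) model probabilities; ratio: (n,) estimated
    density ratio r(x); tau: rejection threshold."""
    labels = probs.argmax(axis=1)
    reject = ratio < tau                    # abstain where the ratio is small
    return np.where(reject, -1, labels)     # -1 marks "rejected"
```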
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Universal Batch Learning Under The Misspecification Setting [4.772817128620037]
We consider the problem of universal \emph{batch} learning in a misspecification setting with log-loss.
We derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret.
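For orientation only, the standard (well-specified) min-max expected regret of universal prediction under log-loss is shown below; the paper's closed-form expression for the misspecified batch setting differs, with the optimal learner being a mixture over the set of data-generating distributions.

```latex
% Generic min-max expected regret under log-loss over a class \{P_\theta\};
% the minimizing Q^* is a mixture over the class. The misspecified-setting
% expression derived in the paper is not reproduced here.
\Gamma_N \;=\; \min_{Q}\, \max_{\theta}\;
  \mathbb{E}_{y^N \sim P_\theta}\!\left[\log \frac{P_\theta(y^N)}{Q(y^N)}\right]
```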
arXiv Detail & Related papers (2024-05-12T11:16:05Z) - On Rate-Optimal Partitioning Classification from Observable and from
Privatised Data [0.0]
We revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions.
The privacy constraints mean that the data $(X_1, Y_1), \dots, (X_n, Y_n)$ cannot be directly observed.
We add Laplace-distributed noise to the discretizations of all possible locations of the feature vector $X_i$ and to its label $Y_i$.
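A minimal sketch of that privatisation step, assuming features in $[0,1]^d$ and labels in $\{-1,+1\}$: release Laplace-noised indicators of which cell of a fixed partition each $X_i$ falls into, plus a noised label. Bin edges, noise scales, and the privacy accounting are illustrative, not the paper's construction.

```python
# Hedged privatisation sketch: Laplace noise on one-hot cell memberships of a
# fixed partition and on the labels. Scales and partition are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def privatise(X, y, edges, eps=1.0):
    """X: (n, d) features in [0, 1]; y: (n,) labels in {-1, +1};
    edges: shared 1-D grid per coordinate; eps: privacy parameter (assumed)."""
    d = X.shape[1]
    cells = np.stack([np.digitize(X[:, j], edges) for j in range(d)], axis=1)
    n_cells = (len(edges) + 1) ** d
    # One-hot cell membership with Laplace noise on every coordinate.
    flat = np.ravel_multi_index(cells.T, (len(edges) + 1,) * d)
    onehot = np.zeros((len(X), n_cells))
    onehot[np.arange(len(X)), flat] = 1.0
    noisy_cells = onehot + rng.laplace(scale=2.0 / eps, size=onehot.shape)
    noisy_labels = y + rng.laplace(scale=2.0 / eps, size=len(y))
    return noisy_cells, noisy_labels
```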
arXiv Detail & Related papers (2023-12-22T18:07:18Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node).
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
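A compact proximal-gradient sketch of the penalized problem described above: each agent holds a local LASSO loss, and a quadratic penalty couples neighbouring copies of the parameter. Step size, penalties, and the graph are illustrative choices, not the paper's tuned constants.

```python
# Proximal-gradient sketch for the penalized consensus LASSO:
#   min_{theta_1..theta_m} sum_i [ 0.5/n_i ||y_i - A_i theta_i||^2 + lam ||theta_i||_1 ]
#                          + (rho/2) sum_{(i,j) in E} ||theta_i - theta_j||^2
# Each agent's gradient step uses only its own data and its neighbours.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def distributed_lasso(A, y, edges, lam=0.1, rho=1.0, step=1e-2, iters=500):
    """A: list of (n_i, d) local designs; y: list of (n_i,) responses;
    edges: list of undirected (i, j) pairs over the m agents."""
    m, d = len(A), A[0].shape[1]
    theta = np.zeros((m, d))
    nbrs = [[b if a == i else a for (a, b) in edges if i in (a, b)]
            for i in range(m)]
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for i in range(m):
            # Smooth part: local least squares + quadratic consensus penalty.
            grad[i] = A[i].T @ (A[i] @ theta[i] - y[i]) / len(y[i])
            grad[i] += rho * sum(theta[i] - theta[j] for j in nbrs[i])
        # Proximal step for the l1 term (soft-thresholding).
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```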
arXiv Detail & Related papers (2021-11-12T01:51:50Z) - Linear Optimal Transport Embedding: Provable Wasserstein classification
for certain rigid transformations and perturbations [79.23797234241471]
Discriminating between distributions is an important problem in a number of scientific fields.
The Linear Optimal Transportation (LOT) embeds the space of distributions into an $L^2$-space.
We demonstrate the benefits of LOT on a number of distribution classification problems.
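A hedged one-dimensional illustration of the LOT idea: each empirical distribution is represented by its transport map from a fixed reference, evaluated on a quantile grid, so that Euclidean distances between embeddings stand in for $W_2$ distances. Grid size and reference are arbitrary choices for the sketch; the general multi-dimensional embedding is not shown.

```python
# 1-D LOT-style embedding: in one dimension the optimal transport map from a
# fixed reference is the quantile function, so the embedding is just a vector
# of quantiles; Euclidean distance between embeddings approximates W_2.
import numpy as np

def lot_embed_1d(samples, n_grid=64):
    """samples: 1-D array of draws from one distribution."""
    qs = (np.arange(n_grid) + 0.5) / n_grid       # reference quantile grid
    return np.quantile(samples, qs)               # transport map on the grid

# Distributions become vectors; any linear classifier can now separate them.
a = lot_embed_1d(np.random.normal(0, 1, 1000))
b = lot_embed_1d(np.random.normal(2, 1, 1000))
print(np.linalg.norm(a - b) / np.sqrt(64))        # ~ W_2 between the two
```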
arXiv Detail & Related papers (2020-08-20T19:09:33Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Optimal Transport for Conditional Domain Matching and Label Shift [0.0]
We address the problem of unsupervised domain adaptation under the setting of generalized target shift.
For good generalization, it is necessary to learn a latent representation in which both marginals and class-conditional distributions are aligned across domains.
We propose a learning problem that minimizes importance weighted loss in the source domain and a Wasserstein distance between weighted marginals.
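A schematic form of the learning objective described above, with importance weights $w$, representation $g$, and trade-off $\lambda$; this paraphrases the summary rather than reproducing the paper's exact formulation:

```latex
% Importance-weighted source risk plus a Wasserstein term between the
% re-weighted source marginal and the target marginal in the latent space
% induced by g (schematic; notation is ours, not the paper's).
\min_{g,\, w}\;
  \mathbb{E}_{(x,y) \sim P}\bigl[w(y)\,\ell\bigl(g(x), y\bigr)\bigr]
  \;+\; \lambda\, W\!\bigl(w \cdot g_{\#}P_X,\; g_{\#}Q_X\bigr)
```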
arXiv Detail & Related papers (2020-06-15T06:45:43Z)