Classification Tree Pruning Under Covariate Shift
- URL: http://arxiv.org/abs/2305.04335v2
- Date: Thu, 22 Jun 2023 01:29:12 GMT
- Title: Classification Tree Pruning Under Covariate Shift
- Authors: Nicholas Galbraith and Samory Kpotufe
- Abstract summary: We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance.
We present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate.
- Score: 7.982668978293684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of \emph{pruning} a classification tree, that is,
selecting a suitable subtree that balances bias and variance, in common
situations with inhomogeneous training data. Namely, assuming access to mostly
data from a distribution $P_{X, Y}$, but little data from a desired
distribution $Q_{X, Y}$ with different $X$-marginals, we present the first
efficient procedure for optimal pruning in such situations, when
cross-validation and other penalized variants are grossly inadequate.
Optimality is derived with respect to a notion of \emph{average discrepancy}
$P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a
recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture
the limits of classification under such a distribution shift. Our relaxed
notion can be viewed as a measure of \emph{relative dimension} between
distributions, as it relates to existing notions of information such as the
Minkowski and Rényi dimensions.
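For reference, the classical penalized criterion that the paper argues is inadequate under covariate shift can be sketched on a toy tree. The `Node` class and `prune` helper below are hypothetical illustrations of standard cost-complexity pruning (error plus a per-leaf penalty), not the paper's average-discrepancy procedure:

```python
# A minimal cost-complexity pruning sketch on a hand-built binary tree.
# This illustrates the classical penalized criterion err(T) + lam * |leaves(T)|,
# NOT the paper's discrepancy-based procedure.

class Node:
    def __init__(self, errors, left=None, right=None):
        self.errors = errors   # misclassifications if collapsed to a leaf
        self.left = left
        self.right = right

def prune(node, lam):
    """Return the penalized error of the optimally pruned subtree rooted here:
    min over prunings of (training errors + lam * number of leaves)."""
    if node.left is None:                       # already a leaf
        return node.errors + lam
    keep = prune(node.left, lam) + prune(node.right, lam)
    collapse = node.errors + lam                # replace subtree by one leaf
    return min(keep, collapse)

# Tiny example: root makes 10 errors as a leaf; its children make 2 and 3.
tree = Node(errors=10, left=Node(2), right=Node(3))
print(prune(tree, lam=1.0))   # children kept: (2+1) + (3+1) = 7.0
print(prune(tree, lam=6.0))   # collapsing wins: 10+6 = 16.0 vs (2+6)+(3+6) = 17
```

A larger penalty `lam` collapses more of the tree, which is exactly the bias-variance dial the abstract refers to.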
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
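A common concrete instance of classification with rejection is thresholding the model's confidence. The sketch below is a generic baseline of that kind, not the paper's density-ratio objective; the threshold `tau` is an assumed hyperparameter:

```python
import numpy as np

def predict_with_rejection(probs, tau=0.7):
    """Given per-class probabilities of shape (n, k), predict the argmax
    class, or return -1 (abstain) when the top probability is below tau."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= tau, preds, -1)

probs = np.array([[0.9, 0.1],    # confident -> class 0
                  [0.55, 0.45],  # uncertain -> abstain
                  [0.2, 0.8]])   # confident -> class 1
print(predict_with_rejection(probs))  # [ 0 -1  1]
```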
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Universal Batch Learning Under The Misspecification Setting [4.772817128620037]
We consider the problem of universal \emph{batch} learning in a misspecification setting with log-loss.
We derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret.
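The mixture learner can be sketched compactly. Below, a uniform prior over a small hand-picked grid of Bernoulli models stands in for the set of data-generating distributions (the grid of `thetas` and the bit sequence are assumptions for illustration); the mixture's cumulative log-loss exceeds the best model's by at most $\log M$ for $M$ models:

```python
import numpy as np

def mixture_logloss(xs, thetas):
    """Cumulative log-loss of the uniform-prior Bayesian mixture over
    Bernoulli(theta) models, predicting each bit sequentially."""
    w = np.ones(len(thetas)) / len(thetas)   # uniform prior weights
    loss = 0.0
    for x in xs:
        p1 = float(np.dot(w, thetas))        # mixture prob that next bit is 1
        loss -= np.log(p1 if x == 1 else 1 - p1)
        lik = thetas if x == 1 else 1 - thetas
        w = w * lik
        w /= w.sum()                         # Bayesian posterior update
    return loss

def best_model_logloss(xs, thetas):
    return min(-sum(np.log(t if x == 1 else 1 - t) for x in xs) for t in thetas)

xs = [1, 1, 0, 1, 1]
thetas = np.array([0.25, 0.5, 0.75])
regret = mixture_logloss(xs, thetas) - best_model_logloss(xs, thetas)
print(0.0 <= regret <= np.log(len(thetas)))  # True: 0 <= regret <= log(3)
```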
arXiv Detail & Related papers (2024-05-12T11:16:05Z) - Distribution-Free Rates in Neyman-Pearson Classification [4.169915659794566]
We consider the problem of Neyman-Pearson classification which models unbalanced classification settings.
We provide a full characterization of possible distribution-free rates, i.e., minimax rates over the space of all pairs.
The rates involve a dichotomy between hard and easy classes $\mathcal{H}$ as characterized by a simple geometric condition.
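Setting the dichotomy aside, the basic Neyman-Pearson recipe of thresholding a score so the type-I error stays below a budget $\alpha$ can be sketched as follows (the Gaussian scores and the `np_threshold` helper are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def np_threshold(scores_class0, alpha):
    """Pick a threshold t so that the empirical type-I error
    P(score > t | class 0) is at most alpha."""
    s = np.sort(scores_class0)
    k = int(np.ceil((1 - alpha) * len(s))) - 1   # (1-alpha) empirical quantile
    return s[k]

rng = np.random.default_rng(0)
s0 = rng.normal(0, 1, 1000)        # scores of class-0 (null) samples
t = np_threshold(s0, alpha=0.05)
print((s0 > t).mean() <= 0.05)     # True: empirical type-I error within budget
```

At most a $(1-\alpha)$ fraction of class-0 scores can exceed their own empirical $(1-\alpha)$-quantile, which is why the constraint holds by construction.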
arXiv Detail & Related papers (2024-02-14T20:21:43Z) - On Rate-Optimal Partitioning Classification from Observable and from
Privatised Data [0.0]
We revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions.
The privacy constraints mean that the data $(X_1,Y_1),\dots,(X_n,Y_n)$ cannot be directly observed.
We add Laplace distributed noises to the indicators of all possible locations of the feature vector $X_i$ and to its label $Y_i$.
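The privatisation step can be illustrated with a minimal Laplace-mechanism sketch on per-cell class counts. The cell/label encoding, the noise scale, and the `eps` budget below are hypothetical choices for illustration, not the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

def private_cell_counts(cells, labels, n_cells, eps=1.0):
    """Per-cell class counts with Laplace noise of scale 2/eps added to
    every count: a standard Laplace-mechanism sketch, not the paper's scheme."""
    counts = np.zeros((n_cells, 2))
    for c, y in zip(cells, labels):
        counts[c, y] += 1
    return counts + rng.laplace(scale=2.0 / eps, size=counts.shape)

cells  = np.array([0, 0, 1, 1, 1, 2])   # which partition cell each X_i falls in
labels = np.array([0, 1, 1, 1, 0, 0])
noisy = private_cell_counts(cells, labels, n_cells=3, eps=5.0)
pred = noisy.argmax(axis=1)             # noisy majority vote per cell
print(pred.shape)                       # (3,)
```

The classifier then predicts by noisy majority vote within each cell, trading accuracy for the privacy guarantee as `eps` shrinks.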
arXiv Detail & Related papers (2023-12-22T18:07:18Z) - Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
Ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node).
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
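The penalized formulation admits a short sketch: each agent runs proximal gradient (soft-thresholding) on its local LASSO loss plus a quadratic consensus penalty over the graph edges. The toy graph, step size, and penalty weights below are assumptions for illustration, not the paper's tuned constants:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def distributed_lasso(Xs, ys, edges, lam=0.1, rho=1.0, step=0.01, iters=2000):
    """Proximal gradient on the penalized consensus LASSO: each agent k
    minimizes its local least-squares loss plus lam * ||theta_k||_1, with a
    quadratic penalty (rho/2) * ||theta_i - theta_j||^2 on every graph edge."""
    K, d = len(Xs), Xs[0].shape[1]
    thetas = np.zeros((K, d))
    for _ in range(iters):
        grads = np.zeros_like(thetas)
        for k in range(K):                       # local smooth gradients
            grads[k] = Xs[k].T @ (Xs[k] @ thetas[k] - ys[k])
        for i, j in edges:                       # consensus-penalty gradients
            grads[i] += rho * (thetas[i] - thetas[j])
            grads[j] += rho * (thetas[j] - thetas[i])
        thetas = soft_threshold(thetas - step * grads, step * lam)  # prox step
    return thetas

rng = np.random.default_rng(0)
theta_true = np.array([1.0, 0.0, -0.5])          # sparse ground truth
Xs = [rng.normal(size=(30, 3)) for _ in range(3)]
ys = [X @ theta_true + 0.05 * rng.normal(size=30) for X in Xs]
thetas = distributed_lasso(Xs, ys, edges=[(0, 1), (1, 2)])
print(np.abs(thetas[0] - thetas[2]).max() < 0.5)   # agents near consensus
```

With a finite `rho` the agents agree only up to a tolerance, matching the paper's point that the penalized problem converges to within the order of the centralized statistical error rather than to exact consensus.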
arXiv Detail & Related papers (2021-11-12T01:51:50Z) - Linear Optimal Transport Embedding: Provable Wasserstein classification
for certain rigid transformations and perturbations [79.23797234241471]
Discriminating between distributions is an important problem in a number of scientific fields.
The Linear Optimal Transportation (LOT) framework embeds the space of distributions into an $L^2$-space.
We demonstrate the benefits of LOT on a number of distribution classification problems.
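In one dimension the LOT idea has a closed form, since the optimal transport map is quantile matching. The sketch below embeds each sample set by its empirical quantile function on a fixed grid, so Euclidean distance between embeddings recovers the Wasserstein-2 distance of a pure shift (the `lot_embed` helper and grid size are illustrative assumptions):

```python
import numpy as np

def lot_embed(samples, n_quantiles=100):
    """1-D LOT sketch: embed a distribution by its empirical quantile
    function on a fixed grid (the OT map from a uniform reference)."""
    qs = np.linspace(0, 1, n_quantiles, endpoint=False) + 0.5 / n_quantiles
    return np.quantile(samples, qs)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 5000)
b = a + 2.0                      # the same shape, shifted by 2
d = np.sqrt(np.mean((lot_embed(a) - lot_embed(b)) ** 2))
print(round(d, 2))               # 2.0: the W2 distance of a pure shift
```

Once distributions live in this flat $L^2$-space, ordinary linear classifiers can separate them, which is the provable-classification angle of the paper.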
arXiv Detail & Related papers (2020-08-20T19:09:33Z) - Debiasing Distributed Second Order Optimization with Surrogate Sketching
and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
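The bias that motivates the debiasing technique can be seen directly: averaging inverses of small-sample Hessian estimates does not recover the inverse of the true Hessian. A sketch under Gaussian data, where the inverse-Wishart mean gives the inflation factor in closed form (the sizes below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 5, 10, 20000   # dimension, per-machine sketch size, machine count

# Each machine inverts its own small-sample Hessian estimate; the true
# Hessian here is the identity, but for Gaussian data
# E[(X^T X / n)^{-1}] = n / (n - d - 1) * I, i.e. 10/4 = 2.5 * I, not I.
inv_sum = np.zeros((d, d))
for _ in range(m):
    X = rng.normal(size=(n, d))
    inv_sum += np.linalg.inv(X.T @ X / n)
avg_inv = inv_sum / m
print(np.trace(avg_inv) / d)   # biased upward, close to 2.5 rather than 1.0
```

Plain averaging therefore systematically overshoots; the paper's surrogate sketching and scaled regularization are designed to cancel exactly this kind of inversion bias.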
arXiv Detail & Related papers (2020-07-02T18:08:14Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Optimal Transport for Conditional Domain Matching and Label Shift [0.0]
We address the problem of unsupervised domain adaptation under the setting of generalized target shift.
For good generalization, it is necessary to learn a latent representation in which both marginals and class-conditional distributions are aligned across domains.
We propose a learning problem that minimizes importance weighted loss in the source domain and a Wasserstein distance between weighted marginals.
arXiv Detail & Related papers (2020-06-15T06:45:43Z)
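The importance-weighted loss in the last entry rests on reweighting source samples by the density ratio $dQ/dP$. A minimal sketch with known Gaussian densities (the two distributions are assumptions for illustration, and the Wasserstein alignment term is not included):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), used to form the importance weights."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, 200000)       # source samples ~ P = N(0, 1)
w = gauss_pdf(xs, 1.0, 1.0) / gauss_pdf(xs, 0.0, 1.0)   # dQ/dP for Q = N(1, 1)
est = np.sum(w * xs) / np.sum(w)        # self-normalized importance estimate
print(round(est, 1))                    # 1.0: the mean under the target Q
```

The same reweighting applied to a loss instead of to $x$ itself turns a source-domain empirical risk into an estimate of the target-domain risk, which is the role the weights play in the paper's learning problem.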
This list is automatically generated from the titles and abstracts of the papers in this site.