Robust Statistical Comparison of Random Variables with Locally Varying
Scale of Measurement
- URL: http://arxiv.org/abs/2306.12803v2
- Date: Mon, 4 Mar 2024 16:21:54 GMT
- Title: Robust Statistical Comparison of Random Variables with Locally Varying
Scale of Measurement
- Authors: Christoph Jansen, Georg Schollmeyer, Hannah Blocher, Julian Rodemann,
Thomas Augustin
- Abstract summary: Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are common in statistics and machine learning.
We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces.
This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given.
- Score: 0.562479170374811
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Spaces with locally varying scale of measurement, like multidimensional
structures with differently scaled dimensions, are common in statistics and
machine learning. Nevertheless, it remains an open question how to properly
exploit all the information encoded in them. We address this
problem by considering an order based on (sets of) expectations of random
variables mapping into such non-standard spaces. This order contains stochastic
dominance and expectation order as extreme cases when no, or respectively
perfect, cardinal structure is given. We derive a (regularized) statistical
test for our proposed generalized stochastic dominance (GSD) order,
operationalize it by linear optimization, and robustify it by imprecise
probability models. Our findings are illustrated with data from
multidimensional poverty measurement, finance, and medicine.
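
As a self-contained point of reference, the sketch below implements the classical special case that GSD reduces to when the data are real-valued: a one-sided Kolmogorov-Smirnov-type permutation test for first-order stochastic dominance. It is a hedged stand-in, not the paper's method; the actual GSD test covers locally varying scales, is operationalized by linear optimization, and is robustified with imprecise probability models.

```python
import numpy as np

rng = np.random.default_rng(0)

def sd_statistic(x, y):
    """One-sided sup-distance: max_t (F_x(t) - F_y(t)).
    Values near or below 0 are consistent with x dominating y."""
    grid = np.concatenate([x, y])
    fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.max(fx - fy)

def sd_permutation_test(x, y, n_perm=2000):
    """Permutation p-value for H0: 'x first-order dominates y',
    computed under the least favorable configuration F_x = F_y."""
    obs = sd_statistic(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += sd_statistic(perm[:len(x)], perm[len(x):]) >= obs
    return obs, (count + 1) / (n_perm + 1)

# x is shifted upward, so it should dominate y: expect a large p-value.
x = rng.normal(0.5, 1.0, 200)
y = rng.normal(0.0, 1.0, 200)
print(sd_permutation_test(x, y))
```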
Related papers
- Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered.
arXiv Detail & Related papers (2023-06-19T17:26:34Z)
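
One standard construction with known ground-truth mutual information, which the paper's family generalizes well beyond, is the bivariate Gaussian: for correlation rho, I(X;Y) = -0.5 * log(1 - rho^2). The sketch below compares that closed form against a k-NN estimate; using scikit-learn's mutual_info_regression here is an assumption of convenience, not the paper's benchmark protocol.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
rho, n = 0.8, 5000

# Bivariate Gaussian with correlation rho: the ground-truth mutual
# information is available in closed form, I(X;Y) = -0.5 * ln(1 - rho^2).
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
true_mi = -0.5 * np.log(1.0 - rho**2)

# k-NN based estimate for comparison, also in nats.
est_mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
print(f"ground truth: {true_mi:.3f} nats, estimate: {est_mi:.3f} nats")
```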
Random Smoothing Regularization in Kernel Gradient Descent Learning [24.383121157277007]
We present a framework for random smoothing regularization that can adaptively learn a wide range of ground truth functions belonging to the classical Sobolev spaces.
Our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality.
arXiv Detail & Related papers (2023-05-05T13:37:34Z)
Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z)
Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
Predicting Out-of-Domain Generalization with Neighborhood Invariance [59.05399533508682]
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
arXiv Detail & Related papers (2022-07-05T14:55:16Z)
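
A minimal sketch of what such an invariance measure could look like, assuming Gaussian perturbations as a stand-in for the paper's task-specific transformation neighborhoods and a toy linear classifier. Note that, as the summary says, no true label is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def neighborhood_invariance(predict, x, n_samples=100, scale=0.1):
    """Fraction of perturbed copies of x whose predicted label matches
    the prediction at x itself. Gaussian noise is a stand-in for the
    paper's task-specific transformation neighborhood."""
    base = predict(x[None, :])[0]
    neighbors = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    return np.mean(predict(neighbors) == base)

# Stand-in "classifier": a fixed random linear decision rule.
W = rng.normal(size=(5, 3))  # 5 features -> 3 classes
predict = lambda X: np.argmax(X @ W, axis=1)

x = rng.normal(size=5)
print(neighborhood_invariance(predict, x))  # high value -> locally invariant
```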
Amortized Variational Inference for Simple Hierarchical Models [37.56550107432323]
It is difficult to use subsampling with variational inference in hierarchical models, since the number of local latent variables scales with the dataset size.
This paper suggests an amortized approach where shared parameters simultaneously represent all local distributions.
It is also dramatically faster than using a structured variational distribution.
arXiv Detail & Related papers (2021-11-04T20:29:12Z)
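
A hedged sketch of the amortization idea on a toy Gaussian hierarchical model (the model and encoder are illustrative choices, not the paper's experiments): instead of 2*G free variational parameters for G groups, one small shared encoder maps each group's summary statistics to its local variational parameters.

```python
import math
import torch

torch.manual_seed(0)

# Toy hierarchical model: z_i ~ N(0, 1), y_ij ~ N(z_i, 1) for G groups.
# Mean-field VI would learn one (mu_i, log_sigma_i) per group; the shared
# encoder below keeps the parameter count independent of G.
G, n = 500, 20
z_true = torch.randn(G)
y = z_true[:, None] + torch.randn(G, n)

encoder = torch.nn.Sequential(
    torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2)
)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)
summary = torch.stack([y.mean(1), y.std(1)], dim=1)  # per-group summaries

for step in range(1000):
    mu, log_sigma = encoder(summary).unbind(1)
    zs = mu + log_sigma.exp() * torch.randn(G)       # reparameterization trick
    log_lik = -0.5 * ((y - zs[:, None]) ** 2).sum(1)
    log_prior = -0.5 * zs**2
    entropy = log_sigma + 0.5 * math.log(2 * math.pi * math.e)
    loss = -(log_lik + log_prior + entropy).mean()   # negative ELBO (up to consts)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The exact posterior mean is n * ybar / (n + 1); the mean absolute
# deviation of the encoder's output from it should be small.
print(float((mu - n * y.mean(1) / (n + 1)).abs().mean()))
```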
Interaction Models and Generalized Score Matching for Compositional Data [9.797319790710713]
We propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex.
Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions.
A high-dimensional analysis of our estimation methods shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.
arXiv Detail & Related papers (2021-09-10T05:29:41Z)
Estimating Graph Dimension with Cross-validated Eigenvalues [5.0013150536632995]
In applied statistics, estimating the number of latent dimensions or the number of clusters is a fundamental and recurring problem.
We provide a cross-validated eigenvalues approach to this problem.
We prove that our procedure consistently estimates $k$ in scenarios where all $k$ dimensions can be estimated.
arXiv Detail & Related papers (2021-08-06T23:52:30Z)
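
As a loose analogue only (the paper works with graph adjacency matrices and derives a formal z-test; this sketch substitutes a generic data matrix and an ad-hoc threshold), the cross-validation idea can be illustrated by fitting eigenvectors on half of the samples and scoring them on the held-out half:

```python
import numpy as np

rng = np.random.default_rng(0)

# Spiked data: k_true latent dimensions plus isotropic noise.
k_true, n, p, noise = 3, 2000, 50, 1.0
signal = rng.normal(size=(n, k_true)) @ rng.normal(scale=2.0, size=(k_true, p))
X = signal + rng.normal(scale=noise, size=(n, p))

train, test = X[: n // 2], X[n // 2 :]
train -= train.mean(0)
test -= test.mean(0)

# Eigenvectors estimated from the training half only.
_, _, Vt = np.linalg.svd(train, full_matrices=False)

# Variance each training eigenvector captures on the held-out half.
heldout_var = ((test @ Vt.T) ** 2).mean(0)

# Count directions whose held-out variance clearly exceeds the noise floor.
k_hat = int(np.sum(heldout_var > 1.5 * noise**2))
print(k_hat)  # expected: 3
```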
Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
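
A minimal numerical sketch of that setting, with sizes and stepsize chosen purely for illustration: constant-stepsize SGD with tail averaging on an overparameterized linear regression, compared against the minimum-norm least-squares interpolator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression (d > n): the data can be interpolated.
n, d, steps, lr = 100, 400, 20000, 1e-3
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

w = np.zeros(d)
w_avg = np.zeros(d)
for t in range(steps):
    i = rng.integers(n)                    # one uniformly sampled example
    w -= lr * (X[i] @ w - y[i]) * X[i]     # constant-stepsize SGD update
    if t >= steps // 2:                    # average only the tail iterates
        w_avg += w / (steps - steps // 2)

# From zero initialization the iterates stay in the span of the data, so SGD
# implicitly tracks the minimum-norm solution; averaging damps the noise.
w_mn = np.linalg.pinv(X) @ y

X_test = rng.normal(size=(2000, d))
for name, wh in [("SGD, tail-averaged", w_avg), ("min-norm OLS", w_mn)]:
    print(name, float(np.mean((X_test @ (wh - w_star)) ** 2)))
```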
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
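
A hedged sketch of the classifier construction described in [1], with illustrative choices of projection dimension and ensemble size; the paper's contribution, the consistent misclassification-probability estimator, is not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def rp_lda_ensemble(X_train, y_train, X_test, n_members=25, proj_dim=5):
    """Majority vote over LDA classifiers, each fit on its own random
    projection of the data."""
    votes = []
    for _ in range(n_members):
        R = rng.normal(size=(X_train.shape[1], proj_dim))
        clf = LinearDiscriminantAnalysis().fit(X_train @ R, y_train)
        votes.append(clf.predict(X_test @ R))
    votes = np.stack(votes)
    # Per test point, return the most common predicted label.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy data: two Gaussian classes in 50 dimensions.
n, d = 200, 50
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), rng.normal(0.7, 1.0, (n, d))])
y = np.repeat([0, 1], n)
idx = rng.permutation(2 * n)
tr, te = idx[:300], idx[300:]
pred = rp_lda_ensemble(X[tr], y[tr], X[te])
print("accuracy:", np.mean(pred == y[te]))
```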