Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison
- URL: http://arxiv.org/abs/2311.01537v2
- Date: Tue, 04 Nov 2025 20:01:33 GMT
- Title: Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison
- Authors: Kensuke Mitsuzawa, Motonobu Kanagawa, Stefano Bortoli, Margherita Grossi, Paolo Papotti,
- Abstract summary: Two-sample variable selection identifies variables that discriminate between the distributions of two sets of data vectors.<n>We introduce a mathematical notion of the discriminating set of variables.<n>We propose two methods for two-sample variable selection that assign weights to variables and optimise them.
- Score: 9.73988606052838
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We study two-sample variable selection: identifying variables that discriminate between the distributions of two sets of data vectors. Such variables help scientists understand the mechanisms behind dataset discrepancies. Although domain-specific methods exist (e.g., in medical imaging, genetics, and computational social science), a general framework remains underdeveloped. We make two separate contributions. (i) We introduce a mathematical notion of the discriminating set of variables: the largest subset containing no variables whose marginals are identical across the two distributions and independent of the remaining variables. We prove this set is uniquely defined and establish further properties, making it a suitable ground truth for theory and evaluation. (ii) We propose two methods for two-sample variable selection that assign weights to variables and optimise them to maximise the power of a kernel two-sample test while enforcing sparsity to downweight redundant variables. To select the regularisation parameter - unknown in practice, as it controls the number of selected variables - we develop two data-driven procedures to balance recall and precision. Synthetic experiments show improved performance over baselines, and we illustrate the approach on two applications using datasets from water-pipe and traffic networks.
Related papers
- On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift [58.91436551466064]
This paper investigates the interconnections among three fundamental problems, calibration, and quantification, under dataset shift conditions.<n>We show that access to an oracle for any one of these tasks enables the resolution of the other two.<n>We propose new methods for each problem based on direct adaptations of well-established methods borrowed from the other disciplines.
arXiv Detail & Related papers (2025-05-16T15:42:55Z) - GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts [58.95913531746308]
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training.
We propose to generate multiple layer parameters on the fly during inference by a lightweight meta-learned transformer, which we call textitGeneralizeFormer.
arXiv Detail & Related papers (2025-02-15T10:10:49Z) - Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression [102.24287051757469]
We study self-supervised covariance estimation in deep heteroscedastic regression.<n>We derive an upper bound on the 2-Wasserstein distance between normal distributions.<n>Experiments over a wide range of synthetic and real datasets demonstrate that the proposed 2-Wasserstein bound coupled with pseudo label annotations results in a computationally cheaper yet accurate deep heteroscedastic regression.
arXiv Detail & Related papers (2025-02-14T22:37:11Z) - An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models [86.05414211113627]
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data.
We propose a flexible and computationally tractable methodology that is compatible with any system and parametric family of models.
arXiv Detail & Related papers (2025-01-28T01:57:51Z) - Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers.
Our framework achieves iteration complexity of $tilde O(d2/epsilon)$.
We also provide a detailed analysis of Denoising Implicit Diffusion Models (DDIM)-type samplers.
arXiv Detail & Related papers (2024-10-18T07:37:36Z) - Generative vs. Discriminative modeling under the lens of uncertainty quantification [0.929965561686354]
In this paper, we undertake a comparative analysis of generative and discriminative approaches.
We compare the ability of both approaches to leverage information from various sources in an uncertainty aware inference.
We propose a general sampling scheme enabling supervised learning for both approaches, as well as semi-supervised learning when compatible with the considered modeling approach.
arXiv Detail & Related papers (2024-06-13T14:32:43Z) - Proxy Methods for Domain Adaptation [78.03254010884783]
proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables.
We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings.
arXiv Detail & Related papers (2024-03-12T09:32:41Z) - Data-driven path collective variables [0.0]
We propose a new method for the generation, optimization, and comparison of collective variables.
The resulting collective variable is one-dimensional, interpretable, and differentiable.
We demonstrate the validity of the method on two different applications.
arXiv Detail & Related papers (2023-12-21T14:07:47Z) - Winning Prize Comes from Losing Tickets: Improve Invariant Learning by
Exploring Variant Parameters for Out-of-Distribution Generalization [76.27711056914168]
Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features.
Recent studies based on Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task.
We propose Exploring Variant parameters for Invariant Learning (EVIL) which also leverages the distribution knowledge to find the parameters that are sensitive to distribution shift.
arXiv Detail & Related papers (2023-10-25T06:10:57Z) - Scalable variable selection for two-view learning tasks with projection
operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems.
Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z) - Variable Selection for Kernel Two-Sample Tests [10.768155884359777]
We propose a framework based on the kernel maximum mean discrepancy (MMD)
We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees.
Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.
arXiv Detail & Related papers (2023-02-15T00:39:56Z) - Spectral Regularized Kernel Two-Sample Tests [7.915420897195129]
We show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance.
We propose a modification to the MMD test based on spectral regularization and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test.
Our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples.
arXiv Detail & Related papers (2022-12-19T00:42:21Z) - Variational Autoencoder Kernel Interpretation and Selection for
Classification [59.30734371401315]
This work proposed kernel selection approaches for probabilistic classifiers based on features produced by the convolutional encoder of a variational autoencoder.
In the proposed implementation, each latent variable was sampled from the distribution associated with a single kernel of the last encoder's convolution layer, as an individual distribution was created for each kernel.
choosing relevant features on the sampled latent variables makes it possible to perform kernel selection, filtering the uninformative features and kernels.
arXiv Detail & Related papers (2022-09-10T17:22:53Z) - Two-Stage Robust and Sparse Distributed Statistical Inference for
Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose a two-stage distributed and robust statistical inference procedures coping with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z) - Linear Discriminant Analysis with High-dimensional Mixed Variables [10.774094462083843]
This paper develops a novel approach for classifying high-dimensional observations with mixed variables.
We overcome the challenge of having to split data into exponentially many cells.
Results on the estimation accuracy and the misclassification rates are established.
arXiv Detail & Related papers (2021-12-14T03:57:56Z) - Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z) - A Statistical Analysis of Summarization Evaluation Metrics using
Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - Variational inference with a quantum computer [0.0]
Inference is the task of drawing conclusions about unobserved variables given observations of related variables.
One alternative is variational inference, where a candidate probability distribution is optimized to approximate the posterior distribution over unobserved variables.
In this work, we propose quantum Born machines as variational distributions over discrete variables.
arXiv Detail & Related papers (2021-03-11T15:12:21Z) - Greedy Search Algorithms for Unsupervised Variable Selection: A
Comparative Study [3.4888132404740797]
This paper focuses on unsupervised variable selection based dimensionality reduction.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate for the first time, a lazy implementation of the variance explained based forward selection component analysis (FSCA) algorithm.
arXiv Detail & Related papers (2021-03-03T21:10:26Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm to separate causal variables with a seed variable as priori, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.