Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets
- URL: http://arxiv.org/abs/2203.15234v1
- Date: Tue, 29 Mar 2022 04:54:06 GMT
- Title: Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets
- Authors: Vishnu Suresh Lokhande, Rudrasis Chakraborty, Sathya N. Ravi, Vikas
Singh
- Abstract summary: In this paper, we show how combining recent results on equivariant representation learning (instantiated on structured spaces) with classical results from causal inference provides an effective, practical solution.
We demonstrate how our model can handle more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
- Score: 53.34152466646884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pooling multiple neuroimaging datasets across institutions often enables
improvements in statistical power when evaluating associations (e.g., between
risk factors and disease outcomes) that may otherwise be too weak to detect.
When there is only a *single* source of variability (e.g., different
scanners), domain adaptation and matching the distributions of representations
may suffice in many scenarios. But in the presence of *more than one*
nuisance variable concurrently influencing the measurements, pooling
datasets poses unique challenges, e.g., variations in the data can come from
both the acquisition method and the demographics of participants
(gender, age). Invariant representation learning, by itself, is ill-suited to
fully model the data generation process. In this paper, we show how combining
recent results on equivariant representation learning (for studying symmetries
in neural networks), instantiated on structured spaces, with classical results
from causal inference provides an effective practical solution. In particular,
we demonstrate how our model can handle more than one nuisance variable under
some assumptions and can enable analysis of
pooled scientific datasets in scenarios that would otherwise entail removing a
large portion of the samples.
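
The distinction the abstract turns on is between invariance (discarding nuisance effects) and equivariance (representing nuisance effects so a later analysis stage can undo them explicitly). Below is a minimal NumPy sketch of that distinction, assuming a single known linear "scanner" action and an encoder chosen to commute with it; this illustrates the concept only, not the authors' construction on structured spaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nuisance: a "scanner" acting on measurements as a known
# linear transform A (a gain on one channel, for illustration).
d = 8
A = np.eye(d)
A[0, 0] = 1.5
x_true = rng.normal(size=(100, d))   # latent biological signal
x_scan2 = x_true @ A.T               # same subjects seen through scanner 2

# Equivariant encoder: f(x) = W x with W A = A W, so encoding commutes with
# the nuisance action (here both are diagonal, which guarantees commuting).
W = np.diag(rng.uniform(0.5, 2.0, size=d))
f = lambda x: x @ W.T

# Equivariance: encoding transformed data == transforming the encoding.
print(np.allclose(f(x_scan2), f(x_true) @ A.T))   # True

# Because the action on the representation is known, the analysis stage can
# undo it explicitly instead of dropping mismatched samples.
z_harmonized = f(x_scan2) @ np.linalg.inv(A).T
print(np.allclose(z_harmonized, f(x_true)))       # True
```

With several nuisance variables, the point of the paper is that one such action is needed per variable, which an invariant (nuisance-discarding) representation alone cannot provide.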
Related papers
- Counterfactual Fairness through Transforming Data Orthogonal to Bias [7.109458605736819]
We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB).
OB is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications.
OB is model-agnostic, making it applicable to a wide range of machine learning models and tasks.
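As a rough illustration of the orthogonalization idea: regress each feature on the sensitive variables and keep the residuals, which removes their linear influence. This is a hedged NumPy sketch under a linearity assumption; the actual OB algorithm may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: features X partly driven by continuous sensitive variables S
# (e.g., age and a calibration score -- names are illustrative).
n, d, k = 200, 5, 2
S = rng.normal(size=(n, k))
X = rng.normal(size=(n, d)) + S @ rng.normal(size=(k, d))

# Residualize: remove the least-squares component of X explained by S, so
# the processed features are (empirically) uncorrelated with S.
S1 = np.column_stack([np.ones(n), S])       # add an intercept column
beta, *_ = np.linalg.lstsq(S1, X, rcond=None)
X_ob = X - S1 @ beta

# Max absolute cross-correlation between processed features and S: ~0.
print(np.abs(np.corrcoef(X_ob.T, S.T)[:d, d:]).max())
```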
arXiv Detail & Related papers (2024-03-26T16:40:08Z)
- Pooling Image Datasets With Multiple Covariate Shift and Imbalance [22.53402104452306]
We show how viewing this problem from the perspective of category theory provides a simple and effective solution.
We show the effectiveness of this approach via extensive experiments on real datasets.
arXiv Detail & Related papers (2024-03-05T02:20:33Z)
- Conditional Generative Models are Sufficient to Sample from Any Causal Effect Estimand [9.460857822923842]
Causal inference from observational data plays a critical role in many applications in trustworthy machine learning.
We show how to sample from any identifiable interventional distribution given an arbitrary causal graph.
We also generate high-dimensional interventional samples from the MIMIC-CXR dataset involving text and image variables.
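The underlying recipe is ancestral sampling with the intervened node's mechanism replaced by a constant. A toy sketch on an assumed three-node graph Z -> X -> Y (with Z also confounding Y), where closed-form Gaussians stand in for the learned conditional generative models the paper trains:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy causal graph: Z -> X, Z -> Y, X -> Y. Closed-form Gaussian mechanisms
# stand in for learned conditional generative models.
sample_Z = lambda n: rng.normal(0.0, 1.0, n)
sample_X = lambda z: z + rng.normal(0.0, 0.5, z.shape)
sample_Y = lambda z, x: 2.0 * x - z + rng.normal(0.0, 0.5, z.shape)

def sample_do_X(x_value, n=100_000):
    """Draw from P(Y | do(X = x_value)) by ancestral sampling: keep Z's
    mechanism, replace X's mechanism with a constant (severs Z -> X)."""
    z = sample_Z(n)
    x = np.full(n, float(x_value))
    return sample_Y(z, x)

# Interventional mean E[Y | do(X=1)] = 2 - E[Z] = 2.0, which differs from
# the confounded observational E[Y | X=1] (= 1.2 for this graph).
print(sample_do_X(1.0).mean())
```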
arXiv Detail & Related papers (2024-02-12T05:48:31Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
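A single round of such mixing can be sketched as between-class mixup; the iterative schedule and the paper's exact weighting are omitted here, and the function and parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.75):
    """Hypothetical one-round sketch: synthesize minority-side points as
    convex combinations of a minority and a majority sample."""
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_maj), n_new)
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    lam = np.maximum(lam, 1.0 - lam)   # keep points nearer the minority class
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]

X_min = rng.normal(loc=+2.0, size=(20, 2))    # minority class
X_maj = rng.normal(loc=-2.0, size=(500, 2))   # majority class
X_syn = mix_minority_majority(X_min, X_maj, n_new=480)
print(X_syn.shape)   # (480, 2): balances the classes after augmentation
```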
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Even Small Correlation and Diversity Shifts Pose Dataset-Bias Issues [19.4921353136871]
We study two types of distribution shifts: diversity shifts, which occur when test samples exhibit patterns unseen during training, and correlation shifts, which occur when test data present a different correlation between seen invariant and spurious features.
We propose an integrated protocol to analyze both types of shifts using datasets where they co-exist in a controllable manner.
arXiv Detail & Related papers (2023-05-09T23:40:23Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
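A minimal sketch of this style of stratification, using the average confidence and the aleatoric uncertainty (p(1-p)) of the true-label probability across training checkpoints; the thresholds and group names below are assumptions for illustration, not Data-IQ's published criteria.

```python
import numpy as np

def stratify_examples(probs_over_epochs, conf_cut=0.5, unc_cut=0.15):
    """Stratify examples from per-checkpoint probabilities of the TRUE label.
    probs_over_epochs: array of shape (n_checkpoints, n_examples)."""
    conf = probs_over_epochs.mean(axis=0)                      # avg confidence
    alea = (probs_over_epochs * (1 - probs_over_epochs)).mean(axis=0)
    groups = np.where(alea > unc_cut, "ambiguous",
                      np.where(conf >= conf_cut, "easy", "hard"))
    return conf, alea, groups

rng = np.random.default_rng(4)
probs = rng.uniform(0, 1, size=(10, 6))   # toy checkpoint-by-example matrix
conf, alea, groups = stratify_examples(probs)
print(groups)
```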
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
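For reference, the two strategies in their simplest random form as a NumPy sketch; dedicated libraries such as imbalanced-learn provide tuned variants (SMOTE, Tomek links, etc.).

```python
import numpy as np

rng = np.random.default_rng(5)

def random_oversample(X, y):
    """Duplicate rows (with replacement) until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), counts.max(),
                                     replace=True) for c in classes])
    return X[idx], y[idx]

def random_undersample(X, y):
    """Drop rows until every class matches the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), counts.min(),
                                     replace=False) for c in classes])
    return X[idx], y[idx]

X = rng.normal(size=(110, 3))
y = np.array([0] * 100 + [1] * 10)    # 10:1 imbalance
print(np.bincount(random_oversample(X, y)[1]))    # [100 100]
print(np.bincount(random_undersample(X, y)[1]))   # [10 10]
```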
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
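One plausible form of such an auxiliary objective, sketched with a toy linear scorer whose input gradient is available in closed form: penalize misalignment between the model's input gradient at x and the direction toward its counterfactual. The exact loss in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy linear scorer f(x) = w @ x, so its input gradient is just w.
w = rng.normal(size=3)
x = rng.normal(size=3)                 # original example (label 0)
x_cf = x + np.array([0.0, 1.0, 0.0])   # hypothetical counterfactual (label 1)

grad_fx = w                            # d f(x) / d x for a linear scorer
direction = x_cf - x                   # minimal change that flips the label

# Auxiliary loss: 1 - cosine similarity between gradient and flip direction,
# added to the usual task loss during training.
cos = grad_fx @ direction / (np.linalg.norm(grad_fx) * np.linalg.norm(direction))
aux_loss = 1.0 - cos
print(round(float(aux_loss), 3))
```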
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.