Sanitized Clustering against Confounding Bias
- URL: http://arxiv.org/abs/2311.01252v1
- Date: Thu, 2 Nov 2023 14:10:14 GMT
- Title: Sanitized Clustering against Confounding Bias
- Authors: Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao
- Abstract summary: This paper presents a new clustering framework named Sanitized Clustering Against confounding Bias (SCAB).
SCAB removes the confounding factor in the semantic latent space of complex data through a non-linear dependence measure.
Experiments on complex datasets demonstrate that our SCAB achieves a significant gain in clustering performance.
- Score: 38.928080236294775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world datasets inevitably contain biases that arise from different
sources or conditions during data collection. Consequently, such inconsistency
itself acts as a confounding factor that disturbs the cluster analysis.
Existing methods eliminate the biases by projecting data onto the orthogonal
complement of the subspace spanned by the confounding factor before
clustering. Therein, the clustering factor of interest and the confounding
factor are coarsely considered in the raw feature space, where the correlation
between the data and the confounding factor is ideally assumed to be linear for
convenient solutions. These approaches are thus limited in scope as the data in
real applications is usually complex and non-linearly correlated with the
confounding factor. This paper presents a new clustering framework named
Sanitized Clustering Against confounding Bias (SCAB), which removes the
confounding factor in the semantic latent space of complex data through a
non-linear dependence measure. To be specific, we eliminate the bias
information in the latent space by minimizing the mutual information between
the confounding factor and the latent representation delivered by Variational
Auto-Encoder (VAE). Meanwhile, a clustering module is introduced to cluster
over the purified latent representations. Extensive experiments on complex
datasets demonstrate that our SCAB achieves a significant gain in clustering
performance by removing the confounding bias. The code is available at
https://github.com/EvaFlower/SCAB.
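The linear baseline that the abstract contrasts SCAB with (projecting data onto the orthogonal complement of the confounder's subspace) can be sketched as follows. This is a minimal illustration and not the SCAB method or the authors' code: the function name `project_out_confounder`, the added intercept column, and the synthetic data are all assumptions made for the example. The second half shows a case where the confounder changes the scale of the data rather than shifting it, so a linear projection leaves the dependence in place, which is the limitation the abstract points to.

```python
import numpy as np

def project_out_confounder(X, C):
    """Hypothetical helper: remove the part of X linearly explained by C.

    X: (n, d) data matrix. C: (n,) or (n, k) confounder values.
    Returns X projected onto the orthogonal complement of span([1, C]).
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float).reshape(len(X), -1)
    # Assumption: include an intercept so mean shifts are removed as well.
    A = np.hstack([np.ones((len(X), 1)), C])
    # Least-squares fit of X on A; A @ coef is the projection onto span(A).
    coef, *_ = np.linalg.lstsq(A, X, rcond=None)
    return X - A @ coef

rng = np.random.default_rng(0)
n = 200
c = rng.integers(0, 2, size=n).astype(float)  # binary confounder, e.g. data source
signal = rng.normal(size=(n, 3))

# Case 1: linear confounding (each source shifts the features by a constant).
X_lin = signal + 2.0 * c[:, None]
X_lin_clean = project_out_confounder(X_lin, c)
# Inner product between the centered confounder and each cleaned feature.
max_linear_dep = np.abs((c - c.mean()) @ X_lin_clean).max()

# Case 2: non-linear confounding (each source rescales the features).
X_nl = signal * (1.0 + 2.0 * c[:, None])
X_nl_clean = project_out_confounder(X_nl, c)
# The squared norm of the cleaned data still tracks the confounder.
residual_dep = np.corrcoef(c, (X_nl_clean ** 2).sum(axis=1))[0, 1]

print(f"linear case, max |<c, x>| after projection: {max_linear_dep:.2e}")
print(f"non-linear case, corr(c, ||x||^2) after projection: {residual_dep:.2f}")
```

In the first case the cleaned features are orthogonal to the confounder up to floating-point error. In the second case the correlation between the confounder and the squared norm of the cleaned data remains clearly positive, which is why a non-linear dependence measure such as mutual information is needed for that kind of bias.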
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Inv-SENnet: Invariant Self Expression Network for clustering under
biased data [17.25929452126843]
We propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces.
Our experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-11-13T01:19:06Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine the manifold learning method UMAP, which infers the topological structure, with the density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z) - Learning Bias-Invariant Representation by Cross-Sample Mutual
Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - Decorrelated Clustering with Data Selection Bias [55.91842043124102]
We propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias.
Our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias.
arXiv Detail & Related papers (2020-06-29T08:55:50Z) - Robust Self-Supervised Convolutional Neural Network for Subspace
Clustering and Classification [0.10152838128195464]
This paper proposes the robust formulation of the self-supervised convolutional subspace clustering network ($S^2$ConvSCN).
In a truly unsupervised training environment, Robust $S^2$ConvSCN outperforms its baseline version by a significant amount for both seen and unseen data on four well-known datasets.
arXiv Detail & Related papers (2020-04-03T16:07:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.