Sparse Convex Biclustering
- URL: http://arxiv.org/abs/2601.01757v1
- Date: Mon, 05 Jan 2026 03:15:52 GMT
- Title: Sparse Convex Biclustering
- Authors: Jiakun Jiang, Dewei Xiang, Chenliang Gu, Wei Liu, Binhuan Wang,
- Abstract summary: Biclustering robustness is an essential machine learning technique for simultaneously clustering rows and columns of a data matrix.<n>We propose a novel method that penalizes noise during the biclustering process to improve both accuracy and stability.
- Score: 3.067019303674385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biclustering is an essential unsupervised machine learning technique for simultaneously clustering rows and columns of a data matrix, with widespread applications in genomics, transcriptomics, and other high-dimensional omics data. Despite its importance, existing biclustering methods struggle to meet the demands of modern large-scale datasets. The challenges stem from the accumulation of noise in high-dimensional features, the limitations of non-convex optimization formulations, and the computational complexity of identifying meaningful biclusters. These issues often result in reduced accuracy and stability as the size of the dataset increases. To overcome these challenges, we propose Sparse Convex Biclustering (SpaCoBi), a novel method that penalizes noise during the biclustering process to improve both accuracy and robustness. By adopting a convex optimization framework and introducing a stability-based tuning criterion, SpaCoBi achieves an optimal balance between cluster fidelity and sparsity. Comprehensive numerical studies, including simulations and an application to mouse olfactory bulb data, demonstrate that SpaCoBi significantly outperforms state-of-the-art methods in accuracy. These results highlight SpaCoBi as a robust and efficient solution for biclustering in high-dimensional and large-scale datasets.
Related papers
- Convex Clustering Redefined: Robust Learning with the Median of Means Estimator [22.614296433333106]
We introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator.<n>Our method enhances both performance and efficiency, especially on large-scale datasets.
arXiv Detail & Related papers (2025-11-12T21:16:53Z) - scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration [53.683726781791385]
We introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration.<n>Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation.
arXiv Detail & Related papers (2025-10-28T21:28:39Z) - Exact and Heuristic Algorithms for Constrained Biclustering [0.0]
Biclustering, also known as co-clustering or two-way clustering, simultaneously partitions the rows and columns of a data matrix to reveal submatrices with coherent patterns.<n>We study constrained biclustering with pairwise constraints, namely must-link and cannot-link constraints, which specify whether objects should belong to the same or different biclusters.
arXiv Detail & Related papers (2025-08-07T15:29:22Z) - Geometric Median Matching for Robust k-Subset Selection from Noisy Data [75.86423267723728]
We propose a novel k-subset selection strategy that leverages Geometric Median -- a robust estimator with an optimal breakdown point of 1/2.<n>Our method iteratively selects a k-subset such that the mean of the subset approximates the GM of the (potentially) noisy dataset, ensuring robustness even under arbitrary corruption.
arXiv Detail & Related papers (2025-04-01T09:22:05Z) - Strong bounds for large-scale Minimum Sum-of-Squares Clustering [0.45880283710344055]
Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used clustering methods.<n>MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids.<n>We introduce a novel method to validate MSSC solutions through optimality gaps.
arXiv Detail & Related papers (2025-02-12T13:40:00Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Clustering (GCC) method to incorporate feature learning and augmentation into clustering procedure.
First, we develop a discrimirative feature alignment mechanism to discover intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy textitK-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - Toward Efficient and Incremental Spectral Clustering via Parametric
Spectral Clustering [2.44755919161855]
Spectral clustering is a popular method for effectively clustering nonlinearly separable data.
This paper introduces a novel approach called parametric spectral clustering (PSC)
PSC addresses the challenges associated with big data and real-time scenarios.
arXiv Detail & Related papers (2023-11-14T01:26:20Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Fast and Interpretable Consensus Clustering via Minipatch Learning [0.0]
We develop IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering.
We develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings.
Results show that our approach yields more accurate and interpretable cluster solutions.
arXiv Detail & Related papers (2021-10-05T22:39:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.