Related papers: Scalable unsupervised feature selection via weight stability

Scalable unsupervised feature selection via weight stability

URL: http://arxiv.org/abs/2506.06114v3
Date: Fri, 13 Jun 2025 14:11:37 GMT
Title: Scalable unsupervised feature selection via weight stability
Authors: Xudong Zhang, Renato Cordeiro de Amorim,
Abstract summary: Unsampling feature selection is critical for improving clustering performance in high-dimensional data.<n>We introduce the Minkowski weighted $k$-means++, a novel initialisation strategy for the Minkowski weighted $k$-means.<n>We propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents.
Score: 4.839191514733483
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce the Minkowski weighted $k$-means++, a novel initialisation strategy for the Minkowski Weighted $k$-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical guarantee under mild assumptions and extensive experiments showing that our methods consistently outperform existing alternatives. Our software can be found at https://github.com/xzhang4-ops1/FSMWK.

Related papers

Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces [0.7864304771129751]
We introduce the first k-means variant in the literature that operates within a probabilistic metric space.<n>By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework.<n>Our proposed random normed k-means (RNKM) algorithm exhibits a remarkable ability to identify nonlinearly separable structures.
arXiv Detail & Related papers (2025-04-04T20:48:43Z)
Enhancing Unsupervised Feature Selection via Double Sparsity Constrained Optimization [6.342485512772862]
Unsupervised single feature selection (UFS) is widely applied in machine learning and pattern recognition.<n>Most of the existing methods only consider sparsity, which makes it difficult to select subsets and discriminative from the original.<n>In this paper, we propose a new method called DSCOFS to consider a subset and discriminative from the original.
arXiv Detail & Related papers (2025-01-01T05:05:46Z)
K-means Derived Unsupervised Feature Selection using Improved ADMM [25.145984747164256]
This paper presents a novel method called K-means Derived Unsupervised Feature Selection (K-means UFS) Unlike most existing spectral analysis based unsupervised feature selection methods, we select features using the objective of K-means. Experiments on real datasets show that our K-means UFS is more effective than the baselines in selecting features for clustering.
arXiv Detail & Related papers (2024-11-19T18:05:02Z)
Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport [84.51977664336056]
We propose a novel scalable approach for estimating the robust continuous barycenter.<n>Our method is framed as a min-max optimization problem and is adaptable to general cost functions.
arXiv Detail & Related papers (2024-10-04T23:27:33Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete. We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. This paper proposes a novel Fuzzy textitK-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS) We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution. To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples. We show that our SSN method achieves a global convergence rate of $O (1/sqrtk)$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z)
Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation. Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions. We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Area under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z)
Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially. Identifying the most characterizing features that minimizes the variance without jeopardizing the bias of our models is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z)
Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$ clustering is a powerful tool for unsupervised learning of non-linear data. In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions. Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z)
Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $tau_mathsfmix$. We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
Distributionally Robust Weighted $k$-Nearest Neighbors [21.537952410507483]
Learning a robust classifier from a few samples remains a key challenge in machine learning. In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors. We develop an algorithm, textttDr.k-NN, that efficiently solves this functional optimization problem.
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance. We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.