Refining Filter Global Feature Weighting for Fully-Unsupervised Clustering
- URL: http://arxiv.org/abs/2503.11706v1
- Date: Wed, 12 Mar 2025 13:14:09 GMT
- Title: Refining Filter Global Feature Weighting for Fully-Unsupervised Clustering
- Authors: Fabian Galis, Darian Onchis,
- Abstract summary: In unsupervised learning, effective clustering plays a vital role in revealing patterns and insights from unlabeled data.<n>This paper explores feature weighting for clustering and presents new weighting strategies, including methods based on SHAP (SHapley Additive exPlanations)<n>Our empirical evaluations demonstrate that feature weighting based on SHAP can enhance unsupervised clustering quality, achieving up to a 22.69% improvement over other weighting methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of unsupervised learning, effective clustering plays a vital role in revealing patterns and insights from unlabeled data. However, the success of clustering algorithms often depends on the relevance and contribution of features, which can differ between various datasets. This paper explores feature weighting for clustering and presents new weighting strategies, including methods based on SHAP (SHapley Additive exPlanations), a technique commonly used for providing explainability in various supervised machine learning tasks. By taking advantage of SHAP values in a way other than just to gain explainability, we use them to weight features and ultimately improve the clustering process itself in unsupervised scenarios. Our empirical evaluations across five benchmark datasets and clustering methods demonstrate that feature weighting based on SHAP can enhance unsupervised clustering quality, achieving up to a 22.69\% improvement over other weighting methods (from 0.586 to 0.719 in terms of the Adjusted Rand Index). Additionally, these situations where the weighted data boosts the results are highlighted and thoroughly explored, offering insight for practical applications.
Related papers
- Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios.<n>Existing SHGL methods encounter two significant limitations.<n>We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number [12.926206811876174]
We introduce a novel self-supervised deep clustering approach tailored for unstructured data, termed Adaptive Self-supervised Robust Clustering (ASRC)
ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information.
ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.
arXiv Detail & Related papers (2024-07-29T15:51:09Z) - End-to-end Learnable Clustering for Intent Learning in Recommendation [54.157784572994316]
We propose a novel intent learning method termed underlineELCRec.
It unifies behavior representation learning into an underlineEnd-to-end underlineLearnable underlineClustering framework.
We deploy this method on the industrial recommendation system with 130 million page views and achieve promising results.
arXiv Detail & Related papers (2024-01-11T15:22:55Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
Method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Mitigating shortage of labeled data using clustering-based active
learning with diversity exploration [3.312798619476657]
We propose a clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling.
A bi-cluster boundary-based sample query procedure is introduced to improve the learning performance for classifying highly overlapped classes.
arXiv Detail & Related papers (2022-07-06T20:53:28Z) - Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z) - Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performances, in term of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieve competitive performances in comparison to the state-of-the-art learning.
arXiv Detail & Related papers (2021-06-16T16:14:01Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs)
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Variable feature weighted fuzzy k-means algorithm for high dimensional data [30.828627752648767]
In real-world applications, cluster-dependent feature weights help in partitioning the data set into more meaningful clusters.
This paper presents a novel fuzzy k-means clustering algorithm by modifying the objective function of the fuzzy k-means using two different entropy terms.
The method is validated using both supervised and unsupervised performance measures on real-world and synthetic datasets.
arXiv Detail & Related papers (2019-12-24T04:58:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.