Related papers: A Generalized Framework for Predictive Clustering and Optimization

A Generalized Framework for Predictive Clustering and Optimization

URL: http://arxiv.org/abs/2305.04364v1
Date: Sun, 7 May 2023 19:56:51 GMT
Title: A Generalized Framework for Predictive Clustering and Optimization
Authors: Aravinth Chembu, Scott Sanner
Abstract summary: Clustering is a powerful and extensively used data science tool. In this article, we define a generalized optimization framework for predictive clustering. We also present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization.
Score: 18.06697544912383
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clustering is a powerful and extensively used data science tool. While clustering is generally thought of as an unsupervised learning technique, there are also supervised variations such as Spath's clusterwise regression that attempt to find clusters of data that yield low regression error on a supervised target. We believe that clusterwise regression is just a single vertex of a largely unexplored design space of supervised clustering models. In this article, we define a generalized optimization framework for predictive clustering that admits different cluster definitions (arbitrary point assignment, closest center, and bounding box) and both regression and classification objectives. We then present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization in this generalized framework. To alleviate scalability concerns for large datasets, we also provide highly scalable greedy algorithms inspired by the Majorization-Minimization (MM) framework. Finally, we demonstrate the ability of our models to uncover different interpretable discrete cluster structures in data by experimenting with four real-world datasets.

Related papers

Graph Probability Aggregation Clustering [5.377020739388736]
We propose a graph-based fuzzy clustering algorithm that unifies the global clustering objective function with a local clustering constraint. The entire GPAC framework is formulated as a multi-constrained optimization problem, which can be solved using the Lagrangian method. Experiments conducted on synthetic, real-world, and deep learning datasets demonstrate that GPAC not only exceeds existing state-of-the-art methods in clustering performance but also excels in computational efficiency.
arXiv Detail & Related papers (2025-02-27T09:11:32Z)
AdaptiveMDL-GenClust: A Robust Clustering Framework Integrating Normalized Mutual Information and Evolutionary Algorithms [0.0]
We introduce a robust clustering framework that integrates the Minimum Description Length (MDL) principle with a genetic optimization algorithm. The framework begins with an ensemble clustering approach to generate an initial clustering solution, which is refined using MDL-guided evaluation functions and optimized through a genetic algorithm. Experimental results demonstrate that our approach consistently outperforms traditional clustering methods, yielding higher accuracy, improved stability, and reduced bias.
arXiv Detail & Related papers (2024-11-26T20:26:14Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm. In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z)
Transferable Deep Clustering Model [14.073783373395196]
We propose a novel transferable deep clustering model that can automatically adapt the cluster centroids according to the distribution of data samples. Our approach introduces a novel attention-based module that can adapt the centroids by measuring their relationship with samples. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of our proposed transfer learning framework.
arXiv Detail & Related papers (2023-10-07T23:35:17Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
Multi-objective Semi-supervised Clustering for Finding Predictive Clusters [0.5371337604556311]
This study focuses on clustering problems and aims to find compact clusters that are informative regarding the outcome variable. The main goal is partitioning data points so that observations in each cluster are similar and the outcome variable can be predicated using these clusters simultaneously.
arXiv Detail & Related papers (2022-01-26T06:24:38Z)
Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering. Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data. We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.