Flow-based Algorithms for Improving Clusters: A Unifying Framework,
Software, and Performance
- URL: http://arxiv.org/abs/2004.09608v3
- Date: Wed, 2 Feb 2022 14:15:46 GMT
- Title: Flow-based Algorithms for Improving Clusters: A Unifying Framework,
Software, and Performance
- Authors: K. Fountoulakis, M. Liu, D. F. Gleich, and M. W. Mahoney
- Abstract summary: Clustering points in a vector space or nodes in a graph is a ubiquitous primitive in statistical data analysis.
We focus on principled algorithms for this cluster improvement problem.
We develop efficient implementations of these algorithms in our LocalGraphClustering Python package.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering points in a vector space or nodes in a graph is a ubiquitous
primitive in statistical data analysis, and it is commonly used for exploratory
data analysis. In practice, it is often of interest to "refine" or "improve" a
given cluster that has been obtained by some other method. In this survey, we
focus on principled algorithms for this cluster improvement problem. Many such
cluster improvement algorithms are flow-based methods, by which we mean that
operationally they require the solution of a sequence of maximum flow problems
on a (typically implicitly) modified data graph. These cluster improvement
algorithms are powerful, both in theory and in practice, but they have not been
widely adopted for problems such as community detection, local graph
clustering, semi-supervised learning, etc. Possible reasons for this are: the
steep learning curve for these algorithms; the lack of efficient and easy to
use software; and the lack of detailed numerical experiments on real-world data
that demonstrate their usefulness. Our objective here is to address these
issues. To do so, we guide the reader through the whole process of
understanding how to implement and apply these powerful algorithms. We present
a unifying fractional programming optimization framework that permits us to
distill, in a simple way, the crucial components of all these algorithms. It
also makes apparent similarities and differences between related methods.
Viewing these cluster improvement algorithms via a fractional programming
framework suggests directions for future algorithm development. Finally, we
develop efficient implementations of these algorithms in our
LocalGraphClustering Python package, and we perform extensive numerical
experiments to demonstrate the performance of these methods on social networks
and image-based data graphs.
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Improved Graph-based semi-supervised learning Schemes [0.0]
In this work, we improve the accuracy of several known algorithms to address the classification of large datasets when few labels are available.
Our framework lies in the realm of graph-based semi-supervised learning.
arXiv Detail & Related papers (2024-06-30T16:50:08Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Influence of Swarm Intelligence in Data Clustering Mechanisms [0.0]
Nature inspired Swarm based algorithms are used for data clustering to cope with larger datasets with lack and inconsistency of data.
This paper reviews the performances of these new approaches and compares which is best for certain problematic situation.
arXiv Detail & Related papers (2023-05-07T08:40:50Z) - Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems in algorithmic learning allows for better learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - A Framework and Benchmarking Study for Counterfactual Generating Methods
on Tabular Data [0.0]
Counterfactual explanations are viewed as an effective way to explain machine learning predictions.
There are already dozens of algorithms aiming to generate such explanations.
benchmarking study and framework can help practitioners in determining which technique and building blocks most suit their context.
arXiv Detail & Related papers (2021-07-09T21:06:03Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type.
The algorithm is capable of detecting outliers and clusters of relatively lower density values.
We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.