Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees
- URL: http://arxiv.org/abs/2006.14444v3
- Date: Sun, 6 Nov 2022 22:11:38 GMT
- Title: Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees
- Authors: Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip,
Luca Rendsburg, Maximilian Teegen, Ulrike von Luxburg
- Abstract summary: In this paper, we showcase the practical potential of tangles in machine learning applications.
Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure.
We construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases.
- Score: 10.992467680364962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Originally, tangles were invented as an abstract tool in mathematical graph
theory to prove the famous graph minor theorem. In this paper, we showcase the
practical potential of tangles in machine learning applications. Given a
collection of cuts of any dataset, tangles aggregate these cuts to point in the
direction of a dense structure. As a result, a cluster is softly characterized
by a set of consistent pointers. This highly flexible approach can solve
clustering problems in various setups, ranging from questionnaires over
community detection in graphs to clustering points in metric spaces. The output
of our proposed framework is hierarchical and induces the notion of a soft
dendrogram, which can help explore the cluster structure of a dataset. The
computational complexity of aggregating the cuts is linear in the number of
data points. Thus the bottleneck of the tangle approach is to generate the
cuts, for which simple and fast algorithms form a sufficient basis. In our
paper we construct the algorithmic framework for clustering with tangles, prove
theoretical guarantees in various settings, and provide extensive simulations
and use cases. Python code is available on GitHub.
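The abstract's core idea can be made concrete with a small sketch: a cut is a bipartition of the data points, an orientation picks one side of each cut, and a tangle is an orientation in which the chosen sides consistently overlap. The following is a minimal illustrative sketch, not the authors' GitHub implementation; the agreement parameter `a`, the triple-overlap consistency test, and the brute-force enumeration are assumptions made here for illustration (the paper's framework aggregates cuts in time linear in the number of data points).

```python
# Illustrative sketch of tangles over a collection of cuts.
# Assumption: consistency means every (multi)set of three chosen sides
# shares at least `a` points, with `a` an agreement parameter.
from itertools import combinations_with_replacement, product

def side(cut, orientation):
    """Return the chosen side of a bipartition as a set of point indices."""
    return {i for i, b in enumerate(cut) if b == orientation}

def is_consistent(cuts, orientation, a):
    """Check the tangle condition: every triple of chosen sides
    (with repetition) intersects in at least `a` points."""
    sides = [side(c, o) for c, o in zip(cuts, orientation)]
    return all(len(s1 & s2 & s3) >= a
               for s1, s2, s3 in combinations_with_replacement(sides, 3))

def tangles(cuts, a):
    """Brute-force enumeration of all consistent orientations.
    Exponential in the number of cuts -- for illustration only."""
    return [o for o in product([True, False], repeat=len(cuts))
            if is_consistent(cuts, o, a)]

# Toy dataset: 6 points in two dense groups {0,1,2} and {3,4,5},
# described by one clean cut and two noisy ones.
cuts = [
    [True, True, True, False, False, False],   # separates the two groups
    [True, True, True, True, False, False],    # noisy version of the cut
    [False, True, True, False, False, False],  # another noisy cut
]
print(tangles(cuts, a=2))
```

On this toy input, exactly two tangles survive, one per dense group: each tangle orients every cut toward the same cluster, softly "pointing at" it in the sense of the abstract.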
Related papers
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifolds, called soft manifolds, that can address this issue.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z)
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
- ClusterFuG: Clustering Fully connected Graphs by Multicut [20.254912065749956]
In dense multicut, the clustering objective is given in a factorized form as inner products of node feature vectors.
We show how to rewrite classical greedy algorithms for multicut in our dense setting and how to modify them for greater efficiency and solution quality.
arXiv Detail & Related papers (2023-01-28T11:10:50Z)
- flow-based clustering and spectral clustering: a comparison [0.688204255655161]
We study a novel graph clustering method for data with an intrinsic network structure.
We exploit an intrinsic network structure of data to construct Euclidean feature vectors.
Our results indicate that our clustering methods can cope with certain graph structures.
arXiv Detail & Related papers (2022-06-20T21:49:52Z)
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
In flow-based clusterings, the edges between clusters tend to be oriented in one direction; such structures have been found in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
- Graph Neural Networks with Composite Kernels [60.81504431653264]
We re-interpret node aggregation from the perspective of kernel weighting.
We present a framework to consider feature similarity in an aggregation scheme.
We propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.
arXiv Detail & Related papers (2020-05-16T04:44:29Z)
- Clustering by Constructing Hyper-Planes [0.0]
We present a clustering algorithm by finding hyper-planes to distinguish data points.
It relies on the marginal space between the points to determine the centers and the number of clusters.
Because the algorithm is based on linear structures, it can approximate the distribution of datasets accurately and flexibly.
arXiv Detail & Related papers (2020-04-25T08:52:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.