Dependent Cluster Mapping (DCMAP): Optimal clustering of directed
acyclic graphs for statistical inference
- URL: http://arxiv.org/abs/2308.03970v3
- Date: Wed, 7 Feb 2024 23:18:38 GMT
- Title: Dependent Cluster Mapping (DCMAP): Optimal clustering of directed
acyclic graphs for statistical inference
- Authors: Paul Pao-Yen Wu, Fabrizio Ruggeri, Kerrie Mengersen
- Abstract summary: A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to support inference.
We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters.
For a 25- and 50-node DBN, the search space size was $9.91\times 10^9$ and $1.51\times 10^{21}$ possible cluster mappings, respectively, and the first optimal solution was found at iteration 934 $(\text{95\% CI } 926, 971)$.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A Directed Acyclic Graph (DAG) can be partitioned or mapped into clusters to
support inference and make it more computationally efficient in Bayesian
Network (BN), Markov process and other models. However, optimal partitioning
with an arbitrary cost function is challenging, especially in statistical
inference, as the local cluster cost depends both on the nodes within a
cluster and on the mapping of clusters connected via parent and/or child
nodes, which we call dependent clusters. We propose a novel algorithm called
DCMAP for optimal
cluster mapping with dependent clusters. Given an arbitrarily defined, positive
cost function based on the DAG, we show that DCMAP converges to find all
optimal clusters, and returns near-optimal solutions along the way.
Empirically, we find that the algorithm is time-efficient for a Dynamic BN
(DBN) model of a seagrass complex system using a computation cost function. For
a 25 and 50-node DBN, the search space size was $9.91\times 10^9$ and
$1.51\times10^{21}$ possible cluster mappings, and the first optimal solution
was found at iteration 934 $(\text{95\% CI } 926,971)$, and 2256 $(2150,2271)$
with a cost that was 4\% and 0.2\% of the naive heuristic cost, respectively.
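To make the notion of a dependent-cluster cost concrete, here is a minimal Python sketch. This is not the DCMAP algorithm itself; the toy DAG, the cost weights, and the brute-force enumeration are hypothetical illustrations of the problem structure, in which each cluster's cost depends on its own nodes and on how parent nodes in other clusters are mapped:

```python
# Illustrative sketch only: brute-force search over cluster mappings of a
# tiny DAG with a "dependent cluster" cost. NOT the DCMAP algorithm; the
# DAG and cost function below are hypothetical.
from itertools import product

# Toy DAG given as child -> parents (a small BN-like structure).
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
nodes = list(parents)

def cluster_cost(mapping):
    """Positive cost where each cluster's cost depends on its own nodes AND
    on how neighbouring (parent) nodes are clustered."""
    clusters = {}
    for node, cid in mapping.items():
        clusters.setdefault(cid, set()).add(node)
    total = 0.0
    for cid, members in clusters.items():
        total += len(members) ** 2        # local within-cluster cost
        for node in members:              # dependency cost: edges that
            for p in parents[node]:       # cross into another cluster
                if mapping[p] != cid:
                    total += 1.5
    return total

# Exhaustive enumeration is feasible only for tiny DAGs; the paper reports
# ~10^9 and ~10^21 possible mappings for 25- and 50-node DBNs.
best = min(
    (dict(zip(nodes, a)) for a in product(range(len(nodes)), repeat=len(nodes))),
    key=cluster_cost,
)
print("optimal mapping:", best, "cost:", cluster_cost(best))
```

The exponential growth of the enumeration above is exactly why a convergent search algorithm such as DCMAP is needed for realistic model sizes.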
Related papers
- A Unified Framework for Gradient-based Clustering of Distributed Data [51.904327888475606]
We develop a family of distributed clustering algorithms that work over networks of users.
DGC-$\mathcal{F}_\rho$ is specialized to popular clustering losses like $K$-means and Huber loss.
We show that consensus fixed points of DGC-$\mathcal{F}_\rho$ are equivalent to fixed points of gradient clustering over the full data.
arXiv Detail & Related papers (2024-02-02T10:44:42Z)
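As a hedged illustration of the underlying idea (centralized gradient steps on the $K$-means loss, not the paper's distributed DGC-$\mathcal{F}_\rho$ scheme):

```python
# Minimal sketch of gradient-based clustering on the K-means loss,
# single machine. Only illustrates "centers updated by gradient steps";
# the paper's distributed variant over networks of users is different.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
K, lr = 2, 0.1
centers = X[rng.choice(len(X), K, replace=False)].copy()

for _ in range(100):
    # Assignment step: nearest center per point.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    z = d.argmin(1)
    # Gradient step on 0.5 * sum_i ||x_i - c_{z_i}||^2 w.r.t. each center.
    for k in range(K):
        pts = X[z == k]
        if len(pts):
            grad = (centers[k] - pts).sum(0) / len(X)
            centers[k] -= lr * grad
print(centers)
```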
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance the cohesion within clusters and the separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
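The paper's exact reward is not reproduced here; as a hedged illustration, a cohesion-minus-separation reward over node embeddings might look like:

```python
# Hypothetical sketch of a clustering-oriented reward: mean intra-cluster
# similarity minus mean inter-cluster similarity of node embeddings.
# The reward in the paper may differ; this is illustrative only.
import numpy as np

def clustering_reward(emb: np.ndarray, labels: np.ndarray) -> float:
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # cosine-normalize
    sim = emb @ emb.T
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)          # ignore self-similarity
    diff = ~same
    np.fill_diagonal(diff, False)
    return sim[same].mean() - sim[diff].mean()

emb = np.random.default_rng(1).normal(size=(10, 4))
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
print(clustering_reward(emb, labels))
```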
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating whether nodes have been corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset, which has 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z)
- Refining a $k$-nearest neighbor graph for a computationally efficient spectral clustering [1.5469452301122175]
Approximate spectral clustering (ASC) uses sampling or quantization to select data representatives.
We propose a refined version of the $k$-nearest neighbor graph, in which we keep all data points and aggressively reduce the number of edges.
Compared to ASC methods, the proposed method delivered consistent performance despite a significant reduction in the number of edges.
arXiv Detail & Related papers (2023-02-22T11:31:32Z)
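As a hedged illustration of the general recipe (not the paper's specific refinement rule), one can keep every data point and sparsify the $k$-NN graph, here by retaining only mutual neighbors, before spectral clustering:

```python
# Sketch: spectral clustering on a sparsified k-NN graph. The pruning rule
# here (keep only mutual k-NN edges) is one simple way to "keep all data
# points but aggressively reduce edges"; the paper's refinement differs.
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Directed k-NN connectivity graph (sparse 0/1 matrix).
knn = kneighbors_graph(X, n_neighbors=10, mode="connectivity")

# Mutual k-NN: keep edge (i, j) only if both i->j and j->i exist.
mutual = knn.multiply(knn.T)
print(f"edges: {knn.nnz} -> {mutual.nnz}")

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(mutual)
```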
- Hybridization of K-means with improved firefly algorithm for automatic clustering in high dimension [0.0]
We have implemented the Silhouette and Elbow methods with PCA to find an optimal number of clusters.
In the Firefly algorithm, the entire population is automatically subdivided into sub-populations, which decreases the convergence speed and the trapping in local minima.
Our study proposes an enhanced firefly algorithm, i.e., K-means hybridized with an ODFA model, for automatic clustering.
arXiv Detail & Related papers (2023-02-09T18:43:10Z)
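The silhouette-based selection of the number of clusters mentioned above is standard; a minimal scikit-learn sketch follows (the firefly/ODFA hybrid itself is not reproduced here):

```python
# Sketch: choose the number of clusters k by the silhouette score after PCA.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, n_features=10, random_state=0)
X2 = PCA(n_components=2).fit_transform(X)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X2)
    scores[k] = silhouette_score(X2, labels)
best_k = max(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```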
- Swarm Intelligence for Self-Organized Clustering [6.85316573653194]
A swarm system called Databionic swarm (DBS) is introduced which is able to adapt itself to structures of high-dimensional data.
By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering.
arXiv Detail & Related papers (2021-06-10T06:21:48Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
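For context, the standard fuzzy ($c$-means) objective that generalizes $k$-means is (notation ours, not taken from this paper):

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{\,m}\,\lVert x_i - c_j \rVert^2, \qquad \sum_{j=1}^{k} u_{ij} = 1,\; u_{ij} \ge 0,\; m > 1,$$

where the memberships $u_{ij}$ are soft assignments; as $m \to 1$ the solution approaches hard $k$-means.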
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles.
We suggest two schemes: fixed $\tau$ clustering and adaptive $\tau$ clustering.
arXiv Detail & Related papers (2021-03-16T21:14:56Z)
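For context, the $\tau$-expectile is the minimizer of an asymmetrically weighted squared loss (standard definition, not the paper's notation):

$$e_\tau(X) = \arg\min_{\theta}\, \mathbb{E}\big[\,\lvert \tau - \mathbf{1}\{X \le \theta\}\rvert\,(X - \theta)^2\,\big].$$

Since $\tau = 1/2$ recovers the mean, fixed-$\tau$ clustering with $\tau = 1/2$ reduces to $k$-means.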
- Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
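The tree-based setup that last entry describes can be sketched as follows; note the greedy CART-style explainer used here is the kind of top-down baseline the paper shows can incur arbitrarily large cost, not the paper's own algorithm:

```python
# Sketch: fit k-means, then fit a decision tree with exactly k leaves to
# mimic the cluster assignments, giving one axis-aligned rule per cluster.
# A greedy CART explainer like this is the baseline the paper improves on.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

k = 3
X, _ = make_blobs(n_samples=300, centers=k, random_state=0)
km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

tree = DecisionTreeClassifier(max_leaf_nodes=k, random_state=0)
tree.fit(X, km_labels)
print(export_text(tree, feature_names=["x0", "x1"]))
print("agreement with k-means:", (tree.predict(X) == km_labels).mean())
```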