$G$-Mapper: Learning a Cover in the Mapper Construction
- URL: http://arxiv.org/abs/2309.06634v2
- Date: Mon, 4 Mar 2024 12:08:53 GMT
- Title: $G$-Mapper: Learning a Cover in the Mapper Construction
- Authors: Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh
Palande, Sarah Percival, Emilie Purvine
- Abstract summary: The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset.
We present an algorithm that optimize the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality.
Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test.
- Score: 0.7852714805965528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Mapper algorithm is a visualization technique in topological data
analysis (TDA) that outputs a graph reflecting the structure of a given
dataset. However, the Mapper algorithm requires tuning several parameters in
order to generate a ``nice" Mapper graph. This paper focuses on selecting the
cover parameter. We present an algorithm that optimizes the cover of a Mapper
graph by splitting a cover repeatedly according to a statistical test for
normality. Our algorithm is based on $G$-means clustering which searches for
the optimal number of clusters in $k$-means by iteratively applying the
Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model
to carefully choose the cover according to the distribution of the given data.
Experiments for synthetic and real-world datasets demonstrate that our
algorithm generates covers so that the Mapper graphs retain the essence of the
datasets, while also running significantly fast.
Related papers
- A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC.
It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters.
GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z) - A Mapper Algorithm with implicit intervals and its optimization [0.3683202928838613]
The Mapper algorithm is an essential tool for visualizing complex, high dimensional data in topology data analysis.
The need for manual parameter tuning and fixed intervals, along with fixed overlapping ratios may impede the performance of the standard Mapper algorithm.
We introduce a novel framework that implicitly represents intervals through a hidden assignment matrix.
arXiv Detail & Related papers (2024-12-16T10:16:54Z) - Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy [0.0]
We propose a pair of data-driven algorithms for unsupervised classification and dimension reduction.
In our experiments, our clustering algorithm outperforms $k$-means clustering on data sets with non-trivial geometry and topology.
arXiv Detail & Related papers (2024-11-29T18:04:11Z) - A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient ($epsilon,$delta$)-DP algorithm tailored specifically for such graphs.
Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z) - Differentiable Mapper For Topological Optimization Of Data
Representation [33.33724208084121]
We build on a recently proposed framework incorporating topology to provide the first filter optimization scheme for Mapper graphs.
We demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets.
arXiv Detail & Related papers (2024-02-20T09:33:22Z) - A distribution-guided Mapper algorithm [0.3683202928838613]
We introduce a distribution guided Mapper algorithm named D-Mapper.
Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones.
Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios.
arXiv Detail & Related papers (2024-01-19T17:07:05Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z) - Reinforcement Learning Based Query Vertex Ordering Model for Subgraph
Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G.
matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms.
In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area.
Recent studies about graph convolution neural networks have achieved impressive success on graph type data.
We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.