$G$-Mapper: Learning a Cover in the Mapper Construction
- URL: http://arxiv.org/abs/2309.06634v2
- Date: Mon, 4 Mar 2024 12:08:53 GMT
- Title: $G$-Mapper: Learning a Cover in the Mapper Construction
- Authors: Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh
Palande, Sarah Percival, Emilie Purvine
- Abstract summary: The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset.
We present an algorithm that optimize the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality.
Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test.
- Score: 0.7852714805965528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Mapper algorithm is a visualization technique in topological data
analysis (TDA) that outputs a graph reflecting the structure of a given
dataset. However, the Mapper algorithm requires tuning several parameters in
order to generate a ``nice" Mapper graph. This paper focuses on selecting the
cover parameter. We present an algorithm that optimizes the cover of a Mapper
graph by splitting a cover repeatedly according to a statistical test for
normality. Our algorithm is based on $G$-means clustering which searches for
the optimal number of clusters in $k$-means by iteratively applying the
Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model
to carefully choose the cover according to the distribution of the given data.
Experiments for synthetic and real-world datasets demonstrate that our
algorithm generates covers so that the Mapper graphs retain the essence of the
datasets, while also running significantly fast.
Related papers
- A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient ($epsilon,$delta$)-DP algorithm tailored specifically for such graphs.
Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z) - Differentiable Mapper For Topological Optimization Of Data
Representation [33.33724208084121]
We build on a recently proposed framework incorporating topology to provide the first filter optimization scheme for Mapper graphs.
We demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets.
arXiv Detail & Related papers (2024-02-20T09:33:22Z) - A distribution-guided Mapper algorithm [0.3683202928838613]
We introduce a distribution guided Mapper algorithm named D-Mapper.
Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones.
Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios.
arXiv Detail & Related papers (2024-01-19T17:07:05Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a generalization-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain a that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z) - Reinforcement Learning Based Query Vertex Ordering Model for Subgraph
Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G.
matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms.
In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Pseudoinverse Graph Convolutional Networks: Fast Filters Tailored for
Large Eigengaps of Dense Graphs and Hypergraphs [0.0]
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised classification on graph-based datasets.
We propose a new GCN variant whose three-part filter space is targeted at dense graphs.
arXiv Detail & Related papers (2020-08-03T08:48:41Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area.
Recent studies about graph convolution neural networks have achieved impressive success on graph type data.
We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z) - ShapeVis: High-dimensional Data Visualization at Scale [10.007129417823858]
We present ShapeVis, a scalable visualization technique for point cloud data inspired from topological data analysis.
Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation.
arXiv Detail & Related papers (2020-01-15T07:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.