Related papers: $G$-Mapper: Learning a Cover in the Mapper Construction

$G$-Mapper: Learning a Cover in the Mapper Construction

URL: http://arxiv.org/abs/2309.06634v2
Date: Mon, 4 Mar 2024 12:08:53 GMT
Title: $G$-Mapper: Learning a Cover in the Mapper Construction
Authors: Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh Palande, Sarah Percival, Emilie Purvine
Abstract summary: The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. We present an algorithm that optimize the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test.
Score: 0.7852714805965528
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a ``nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running significantly fast.

Related papers

A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters. GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
A Mapper Algorithm with implicit intervals and its optimization [0.3683202928838613]
The Mapper algorithm is an essential tool for visualizing complex, high dimensional data in topology data analysis. The need for manual parameter tuning and fixed intervals, along with fixed overlapping ratios may impede the performance of the standard Mapper algorithm. We introduce a novel framework that implicitly represents intervals through a hidden assignment matrix.
arXiv Detail & Related papers (2024-12-16T10:16:54Z)
Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy [0.0]
We propose a pair of data-driven algorithms for unsupervised classification and dimension reduction. In our experiments, our clustering algorithm outperforms $k$-means clustering on data sets with non-trivial geometry and topology.
arXiv Detail & Related papers (2024-11-29T18:04:11Z)
A Differentially Private Clustering Algorithm for Well-Clustered Graphs [6.523602840064548]
We provide an efficient ($epsilon,$delta$)-DP algorithm tailored specifically for such graphs. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters.
arXiv Detail & Related papers (2024-03-21T11:57:16Z)
Differentiable Mapper For Topological Optimization Of Data Representation [33.33724208084121]
We build on a recently proposed framework incorporating topology to provide the first filter optimization scheme for Mapper graphs. We demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets.
arXiv Detail & Related papers (2024-02-20T09:33:22Z)
A distribution-guided Mapper algorithm [0.3683202928838613]
We introduce a distribution guided Mapper algorithm named D-Mapper. Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones. Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios.
arXiv Detail & Related papers (2024-01-19T17:07:05Z)
Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data. The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z)
Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a generalization-time algorithm that finds the RUM that best approximates the given distribution on average. Our theoretical result can also be made practical: we obtain a that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z)
Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G. matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms. In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z)
Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization. We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
Pseudoinverse Graph Convolutional Networks: Fast Filters Tailored for Large Eigengaps of Dense Graphs and Hypergraphs [0.0]
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised classification on graph-based datasets. We propose a new GCN variant whose three-part filter space is targeted at dense graphs.
arXiv Detail & Related papers (2020-08-03T08:48:41Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Adaptive Graph Auto-Encoder for General Data Clustering [90.8576971748142]
Graph-based clustering plays an important role in the clustering area. Recent studies about graph convolution neural networks have achieved impressive success on graph type data. We propose a graph auto-encoder for general data clustering, which constructs the graph adaptively according to the generative perspective of graphs.
arXiv Detail & Related papers (2020-02-20T10:11:28Z)
ShapeVis: High-dimensional Data Visualization at Scale [10.007129417823858]
We present ShapeVis, a scalable visualization technique for point cloud data inspired from topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation.
arXiv Detail & Related papers (2020-01-15T07:59:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.