A distribution-guided Mapper algorithm
- URL: http://arxiv.org/abs/2401.12237v2
- Date: Wed, 14 Aug 2024 08:37:37 GMT
- Title: A distribution-guided Mapper algorithm
- Authors: Yuyang Tao, Shufei Ge,
- Abstract summary: We introduce a distribution guided Mapper algorithm named D-Mapper.
Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones.
Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios.
- Score: 0.3683202928838613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: The Mapper algorithm is an essential tool to explore shape of data in topology data analysis. With a dataset as an input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a reeb graph of data. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of data, especially when the underlying structure is complex. Results: In this work, we introduce a distribution guided Mapper algorithm named D-Mapper, that utilizes the property of the probability model and data intrinsic characteristics to generate density guided covers and provides enhanced topological features. Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones. Moreover, we introduce a metric accounting for both the quality of overlap clustering and extended persistence homology to measure the performance of Mapper type algorithm. Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios. We also apply the D-Mapper to a SARS-COV-2 coronavirus RNA sequences dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both vertical and horizontal evolution processes of the viruses. Availability: Our package is available at https://github.com/ShufeiGe/D-Mapper.
Related papers
- Differentiable Mapper For Topological Optimization Of Data
Representation [33.33724208084121]
We build on a recently proposed framework incorporating topology to provide the first filter optimization scheme for Mapper graphs.
We demonstrate the usefulness of our approach by optimizing Mapper graph representations on several datasets.
arXiv Detail & Related papers (2024-02-20T09:33:22Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can solve this situation.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z) - $G$-Mapper: Learning a Cover in the Mapper Construction [0.7852714805965528]
The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset.
We present an algorithm that optimize the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality.
Our algorithm is based on $G$-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test.
arXiv Detail & Related papers (2023-09-12T22:51:16Z) - Asynchronously Trained Distributed Topographic Maps [0.0]
We present an algorithm that uses $N$ autonomous units to generate a feature map by distributed training.
Unit autonomy is achieved by sparse interaction in time & space through the combination of a distributed search, and a cascade-driven weight updating scheme.
arXiv Detail & Related papers (2023-01-20T01:15:56Z) - Tree Mover's Distance: Bridging Graph Metrics and Stability of Graph
Neural Networks [54.225220638606814]
We propose a pseudometric for attributed graphs, the Tree Mover's Distance (TMD), and study its relation to generalization.
First, we show that TMD captures properties relevant to graph classification; a simple TMD-SVM performs competitively with standard GNNs.
Second, we relate TMD to generalization of GNNs under distribution shifts, and show that it correlates well with performance drop under such shifts.
arXiv Detail & Related papers (2022-10-04T21:03:52Z) - Bayesian Structure Learning with Generative Flow Networks [85.84396514570373]
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) from data.
Recently, a class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling.
We show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs.
arXiv Detail & Related papers (2022-02-28T15:53:10Z) - Comprehensive Graph-conditional Similarity Preserving Network for
Unsupervised Cross-modal Hashing [97.44152794234405]
Unsupervised cross-modal hashing (UCMH) has become a hot topic recently.
In this paper, we devise a deep graph-neighbor coherence preserving network (DGCPN)
DGCPN regulates comprehensive similarity preserving losses by exploiting three types of data similarities.
arXiv Detail & Related papers (2020-12-25T07:40:59Z) - Hotspot identification for Mapper graphs [2.330913682033217]
Mapper algorithm can be used to build graph-based representations of high-dimensional data capturing structurally interesting features such as loops, flares or clusters.
In many applications, such as precision medicine, Mapper graph has been used to identify unknown compactly localized subareas.
We propose a new algorithm for detecting hotspots in Mapper graphs. It allows automatizing of the hotspot detection process.
arXiv Detail & Related papers (2020-12-03T12:22:25Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z) - ShapeVis: High-dimensional Data Visualization at Scale [10.007129417823858]
We present ShapeVis, a scalable visualization technique for point cloud data inspired from topological data analysis.
Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation.
arXiv Detail & Related papers (2020-01-15T07:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.