Extracting the main trend in a dataset: the Sequencer algorithm
- URL: http://arxiv.org/abs/2006.13948v1
- Date: Wed, 24 Jun 2020 18:00:03 GMT
- Title: Extracting the main trend in a dataset: the Sequencer algorithm
- Authors: Dalya Baron and Brice Ménard
- Abstract summary: One-dimensional trends, often called sequences, provide insights into simple phenomena.
We present the Sequencer, an algorithm designed to generically identify the main trend in a dataset.
We show that, in a number of cases, it outperforms the popular t-SNE and UMAP dimensionality reduction techniques.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientists aim to extract simplicity from observations of the complex world.
An important component of this process is the exploration of data in search of
trends. In practice, however, this tends to be more of an art than a science.
Among all trends existing in the natural world, one-dimensional trends, often
called sequences, are of particular interest as they provide insights into
simple phenomena. However, some are challenging to detect as they may be
expressed in complex manners. We present the Sequencer, an algorithm designed
to generically identify the main trend in a dataset. It does so by constructing
graphs describing the similarities between pairs of observations, computed with
a set of metrics and scales. Using the fact that continuous trends lead to more
elongated graphs, the algorithm can identify which aspects of the data are
relevant in establishing a global sequence. Such an approach can be used beyond
the proposed algorithm and can optimize the parameters of any dimensionality
reduction technique. We demonstrate the power of the Sequencer using real-world
data from astronomy and geology, as well as images from the natural world. We show
that, in a number of cases, it outperforms the popular t-SNE and UMAP
dimensionality reduction techniques. This approach to exploratory data analysis,
which requires neither training nor parameter tuning, has the potential to enable
discoveries in a wide range of scientific domains. The source code is available
on GitHub and we provide an online interface at http://sequencer.org.
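As a rough illustration of the mechanism described in the abstract, here is a minimal sketch in Python (numpy/scipy): build a minimum spanning tree over pairwise distances, score its elongation, and read the sequence off a breadth-first walk. The elongation proxy and the single-metric setup are simplifications; the released implementation aggregates many metrics and scales.

```python
import numpy as np
from scipy.sparse.csgraph import breadth_first_order, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def elongation_and_sequence(X, metric="euclidean"):
    """Build a minimum spanning tree (MST) over pairwise distances,
    measure how elongated it is, and order the rows of X by a
    breadth-first walk from one extremity of the tree."""
    D = squareform(pdist(X, metric=metric))  # pairwise distance matrix
    mst = minimum_spanning_tree(D)           # sparse MST of the distance graph
    g = mst + mst.T                          # symmetrize for undirected traversal

    # BFS from an arbitrary node ends at one extremity of the tree;
    # a second BFS from that extremity spans the tree end to end.
    order, _ = breadth_first_order(g, i_start=0, directed=False)
    extremity = order[-1]
    order, preds = breadth_first_order(g, i_start=extremity, directed=False)

    # Depth of each node in BFS hops from the starting extremity.
    depth = np.zeros(D.shape[0], dtype=int)
    for node in order[1:]:
        depth[node] = depth[preds[node]] + 1

    # One possible elongation proxy: mean depth over mean level width.
    # Elongated trees (strong 1-D trends) have deep, narrow levels.
    _, level_widths = np.unique(depth, return_counts=True)
    return depth.mean() / level_widths.mean(), order

# Toy usage: rows generated along a hidden 1-D parameter.
X = np.sort(np.random.rand(200))[:, None] + 0.05 * np.random.randn(200, 10)
eta, seq = elongation_and_sequence(X)
```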
Related papers
- Range-aware Positional Encoding via High-order Pretraining: Theory and Practice [14.521929085104441]
Unsupervised pre-training on vast amounts of graph data is critical in real-world applications wherein labeled data is limited.
We propose a novel pre-training strategy on graphs that focuses on modeling their multi-resolution structural information.
Because our approach relies solely on the graph structure, it is domain-agnostic and adaptable to datasets from various domains.
arXiv Detail & Related papers (2024-09-27T19:53:10Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can address this problem.
Using soft manifolds for graph embedding, we can provide continuous spaces suitable for any downstream data analysis task on complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Scoring Anomalous Vertices Through Quantum Walks [0.26013878609420266]
For unlabeled data, anomaly detection on graphs is a method to determine which data points do not possess the latent characteristics present in most of the other data.
We propose a first quantum algorithm that calculates the anomaly score of each node by continuously traversing the graph, starting from a uniform superposition over all nodes.
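The summary suggests a continuous-time quantum walk started in a uniform superposition. A hedged sketch of that primitive follows; the deviation-from-uniform scoring rule here is an illustrative stand-in, not the paper's actual score.

```python
import numpy as np
from scipy.linalg import expm

def ctqw_anomaly_scores(A, t=1.0):
    """Evolve a uniform superposition under a continuous-time quantum
    walk U = exp(-iAt) on adjacency matrix A, then score each node by
    how far its measurement probability drifts from uniform."""
    n = A.shape[0]
    psi0 = np.full(n, 1.0 / np.sqrt(n), dtype=complex)  # uniform superposition
    psi = expm(-1j * t * A) @ psi0                      # walk for time t
    prob = np.abs(psi) ** 2                             # measurement probabilities
    return np.abs(prob - 1.0 / n)                       # deviation from uniform
```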
arXiv Detail & Related papers (2023-11-16T12:32:13Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
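As a toy illustration of the normalization trick mentioned above (a sketch, not the paper's exact dynamics): after every gradient update, the weight vector is projected back onto the unit sphere.

```python
import numpy as np

def normalized_gradient_step(w, grad, lr=0.1):
    """One gradient step followed by projection back onto the unit
    sphere; a toy stand-in for the weight normalization above."""
    w = w - lr * grad
    return w / np.linalg.norm(w)  # renormalize so that ||w|| = 1
```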
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Learning Graph Search Heuristics [48.83557172525969]
We present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data.
Our heuristic function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph size, and can easily be incorporated into an algorithm such as A* at test time.
Experiments show that, on benchmark datasets, PHIL reduces the number of explored nodes by 58.5% on average compared to state-of-the-art methods.
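The plug-in point is easy to picture: below is a minimal A* skeleton (independent of the paper's code) in which the heuristic `h` is an arbitrary callable, so a learned estimator like PHIL's could be swapped in at test time.

```python
import heapq
from itertools import count

def a_star(neighbors, start, goal, h):
    """A* search where h(node, goal) is any callable heuristic;
    neighbors(node) yields (successor, edge_weight) pairs."""
    tie = count()  # tie-breaker so heap entries never compare nodes
    open_heap = [(h(start, goal), next(tie), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    while open_heap:
        _, _, g, node = heapq.heappop(open_heap)
        if node == goal:  # reconstruct the path by walking parents back
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, w in neighbors(node):
            ng = g + w
            if ng < g_cost.get(nxt, float("inf")):  # found a cheaper route
                g_cost[nxt] = ng
                parent[nxt] = node
                heapq.heappush(open_heap, (ng + h(nxt, goal), next(tie), ng, nxt))
    return None  # goal unreachable
```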
arXiv Detail & Related papers (2022-12-07T22:28:00Z)
- Expander Graph Propagation [0.0]
We propose an elegant approach, Expander Graph Propagation (EGP), based on propagating information over expander graphs.
We show that EGP is able to address concerns such as oversquashing while requiring minimal effort to set up.
We believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs.
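A hedged sketch of the basic operation: one feature-propagation step over a random d-regular graph, which is an expander with high probability (the paper constructs specific expander families; the random graph here is just a convenient stand-in).

```python
import numpy as np
import networkx as nx

def expander_propagation_step(X, d=4, seed=0):
    """Propagate node features X (n x f) one step over a random
    d-regular graph on n nodes, used as a stand-in expander."""
    n = X.shape[0]
    G = nx.random_regular_graph(d, n, seed=seed)
    A = nx.to_numpy_array(G)
    return (A / d) @ X  # degree-normalized message passing (degree = d)
```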
arXiv Detail & Related papers (2022-10-06T15:36:37Z)
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only a single step, without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
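The core objective can be sketched in a few lines of PyTorch. This is a generic gradient-matching loss over tensor batches, hedged: the paper condenses graph data with GNNs, and details such as per-class matching are omitted here.

```python
import torch

def one_step_gradient_matching_loss(model, loss_fn, real_batch, syn_batch):
    """Compare the gradients induced by real vs. synthetic data at the
    current (e.g., freshly initialized) weights, with no weight update.
    Minimizing this w.r.t. the synthetic batch drives condensation."""
    params = list(model.parameters())
    x_real, y_real = real_batch
    x_syn, y_syn = syn_batch  # synthetic tensors are the trainable inputs
    g_real = torch.autograd.grad(loss_fn(model(x_real), y_real), params)
    g_syn = torch.autograd.grad(loss_fn(model(x_syn), y_syn), params,
                                create_graph=True)  # keep graph for outer step
    return sum(((gr.detach() - gs) ** 2).sum()
               for gr, gs in zip(g_real, g_syn))
```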
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
- Generating a Doppelganger Graph: Resembling but Distinct [5.618335078130568]
We propose an approach to generating a doppelganger graph that resembles a given one in many graph properties.
The approach is an orchestration of graph representation learning, generative adversarial networks, and graph realization algorithms.
arXiv Detail & Related papers (2021-01-23T22:08:27Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose the Semantic Graph Convolutional Network (SGCN), which explores the implicit semantics by learning latent semantic paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
- Neural Architecture Search in Graph Neural Networks [1.2881413375147996]
This paper compares two NAS methods for optimizing Graph Neural Networks (GNNs).
Results cover seven datasets over two search spaces and show that both methods obtain accuracies similar to those of a random search.
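The random-search baseline referenced above is simple to state; a minimal sketch, where `evaluate` is a user-supplied function (e.g., train a GNN with the sampled configuration and return validation accuracy):

```python
import random

def random_search(space, evaluate, n_trials=20, seed=0):
    """Sample configurations uniformly from a discrete search space
    and keep the best according to evaluate(config)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(options) for name, options in space.items()}
        score = evaluate(cfg)  # e.g., validation accuracy of a trained GNN
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```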
arXiv Detail & Related papers (2020-07-31T21:04:24Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximation framework for such non-trivial ERGs that results in dyadically independent (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
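Dyadic (edge) independence is what makes such models cheap to sample: every potential edge is an independent Bernoulli draw. A minimal sketch, assuming a precomputed matrix P of edge probabilities:

```python
import numpy as np

def sample_dyadic_independent_graph(P, seed=0):
    """Sample an undirected graph in which each edge (i, j) is an
    independent Bernoulli draw with probability P[i, j]."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    coins = rng.random((n, n)) < P     # independent coin flips per pair
    A = np.triu(coins, k=1)            # upper triangle: no self-loops
    return (A | A.T).astype(int)       # symmetrize into an adjacency matrix
```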
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.