Extracting the main trend in a dataset: the Sequencer algorithm
- URL: http://arxiv.org/abs/2006.13948v1
- Date: Wed, 24 Jun 2020 18:00:03 GMT
- Title: Extracting the main trend in a dataset: the Sequencer algorithm
- Authors: Dalya Baron and Brice Ménard
- Abstract summary: One-dimensional trends, often called sequences, provide insights into simple phenomena.
We present the Sequencer, an algorithm designed to generically identify the main trend in a dataset.
We show that, in a number of cases, it outperforms the popular t-SNE and UMAP dimensionality reduction techniques.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientists aim to extract simplicity from observations of the complex world.
An important component of this process is the exploration of data in search of
trends. In practice, however, this tends to be more of an art than a science.
Among all trends existing in the natural world, one-dimensional trends, often
called sequences, are of particular interest as they provide insights into
simple phenomena. However, some are challenging to detect as they may be
expressed in complex manners. We present the Sequencer, an algorithm designed
to generically identify the main trend in a dataset. It does so by constructing
graphs describing the similarities between pairs of observations, computed with
a set of metrics and scales. Using the fact that continuous trends lead to more
elongated graphs, the algorithm can identify which aspects of the data are
relevant in establishing a global sequence. Such an approach can be used beyond
the proposed algorithm and can optimize the parameters of any dimensionality
reduction technique. We demonstrate the power of the Sequencer using real-world
data from astronomy and geology, as well as images from the natural world. We show
that, in a number of cases, it outperforms the popular t-SNE and UMAP
dimensionality reduction techniques. This approach to exploratory data analysis,
which requires neither training nor parameter tuning, has the potential to enable
discoveries in a wide range of scientific domains. The source code is available
on GitHub and we provide an online interface at http://sequencer.org.
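As a rough illustration of the mechanism described in the abstract, here is a minimal sketch in Python (numpy/scipy): build a minimum spanning tree over pairwise distances, score its elongation, and read the sequence off a breadth-first walk. The elongation proxy and the single-metric setup are simplifications; the released implementation aggregates many metrics and scales.

```python
import numpy as np
from scipy.sparse.csgraph import breadth_first_order, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def elongation_and_sequence(X, metric="euclidean"):
    """Build a minimum spanning tree (MST) over pairwise distances,
    measure how elongated it is, and order the rows of X by a
    breadth-first walk from one extremity of the tree."""
    D = squareform(pdist(X, metric=metric))  # pairwise distance matrix
    mst = minimum_spanning_tree(D)           # sparse MST of the distance graph
    g = mst + mst.T                          # symmetrize for undirected traversal

    # BFS from an arbitrary node ends at one extremity of the tree;
    # a second BFS from that extremity spans the tree end to end.
    order, _ = breadth_first_order(g, i_start=0, directed=False)
    extremity = order[-1]
    order, preds = breadth_first_order(g, i_start=extremity, directed=False)

    # Depth of each node in BFS hops from the starting extremity.
    depth = np.zeros(D.shape[0], dtype=int)
    for node in order[1:]:
        depth[node] = depth[preds[node]] + 1

    # One possible elongation proxy: mean depth over mean level width.
    # Elongated trees (strong 1-D trends) have deep, narrow levels.
    _, level_widths = np.unique(depth, return_counts=True)
    return depth.mean() / level_widths.mean(), order

# Toy usage: rows generated along a hidden 1-D parameter.
X = np.sort(np.random.rand(200))[:, None] + 0.05 * np.random.randn(200, 10)
eta, seq = elongation_and_sequence(X)
```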
Related papers
- Range-aware Positional Encoding via High-order Pretraining: Theory and Practice [14.521929085104441]
Unsupervised pre-training on vast amounts of graph data is critical in real-world applications wherein labeled data is limited.
We propose a novel pre-training strategy on graphs that focuses on modeling their multi-resolution structural information.
Because our approach relies solely on the graph structure, it is domain-agnostic and adaptable to datasets from various domains.
arXiv Detail & Related papers (2024-09-27T19:53:10Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can address this problem.
Using soft manifolds for graph embedding, we can provide continuous spaces suitable for any downstream data analysis task on complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Scoring Anomalous Vertices Through Quantum Walks [0.26013878609420266]
For unlabeled data, anomaly detection on graphs is a method to determine which data points do not possess the latent characteristics present in most of the other data.
We propose a first quantum algorithm that calculates the anomaly score of each node by continuously traversing the graph, starting from a uniform superposition over all nodes.
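The summary suggests a continuous-time quantum walk started in a uniform superposition. A hedged sketch of that primitive follows; the deviation-from-uniform scoring rule here is an illustrative stand-in, not the paper's actual score.

```python
import numpy as np
from scipy.linalg import expm

def ctqw_anomaly_scores(A, t=1.0):
    """Evolve a uniform superposition under a continuous-time quantum
    walk U = exp(-iAt) on adjacency matrix A, then score each node by
    how far its measurement probability drifts from uniform."""
    n = A.shape[0]
    psi0 = np.full(n, 1.0 / np.sqrt(n), dtype=complex)  # uniform superposition
    psi = expm(-1j * t * A) @ psi0                      # walk for time t
    prob = np.abs(psi) ** 2                             # measurement probabilities
    return np.abs(prob - 1.0 / n)                       # deviation from uniform
```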
arXiv Detail & Related papers (2023-11-16T12:32:13Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
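As a toy illustration of the normalization trick mentioned above (a sketch, not the paper's exact dynamics): after every gradient update, the weight vector is projected back onto the unit sphere.

```python
import numpy as np

def normalized_gradient_step(w, grad, lr=0.1):
    """One gradient step followed by projection back onto the unit
    sphere; a toy stand-in for the weight normalization above."""
    w = w - lr * grad
    return w / np.linalg.norm(w)  # renormalize so that ||w|| = 1
```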
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Learning Graph Search Heuristics [48.83557172525969]
We present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data.
Our heuristic function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph size, and can easily be incorporated into an algorithm such as A* at test time.
Experiments show that, on benchmark datasets, PHIL reduces the number of explored nodes by 58.5% on average compared to state-of-the-art methods.
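The plug-in point is easy to picture: below is a minimal A* skeleton (independent of the paper's code) in which the heuristic `h` is an arbitrary callable, so a learned estimator like PHIL's could be swapped in at test time.

```python
import heapq
from itertools import count

def a_star(neighbors, start, goal, h):
    """A* search where h(node, goal) is any callable heuristic;
    neighbors(node) yields (successor, edge_weight) pairs."""
    tie = count()  # tie-breaker so heap entries never compare nodes
    open_heap = [(h(start, goal), next(tie), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    while open_heap:
        _, _, g, node = heapq.heappop(open_heap)
        if node == goal:  # reconstruct the path by walking parents back
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, w in neighbors(node):
            ng = g + w
            if ng < g_cost.get(nxt, float("inf")):  # found a cheaper route
                g_cost[nxt] = ng
                parent[nxt] = node
                heapq.heappush(open_heap, (ng + h(nxt, goal), next(tie), ng, nxt))
    return None  # goal unreachable
```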
arXiv Detail & Related papers (2022-12-07T22:28:00Z)
- Expander Graph Propagation [0.0]
We propose an elegant approach, Expander Graph Propagation (EGP), based on propagating information over expander graphs.
We show that EGP is able to address concerns such as oversquashing while requiring minimal effort to set up.
We believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs.
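A hedged sketch of the basic operation: one feature-propagation step over a random d-regular graph, which is an expander with high probability (the paper constructs specific expander families; the random graph here is just a convenient stand-in).

```python
import numpy as np
import networkx as nx

def expander_propagation_step(X, d=4, seed=0):
    """Propagate node features X (n x f) one step over a random
    d-regular graph on n nodes, used as a stand-in expander."""
    n = X.shape[0]
    G = nx.random_regular_graph(d, n, seed=seed)
    A = nx.to_numpy_array(G)
    return (A / d) @ X  # degree-normalized message passing (degree = d)
```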
arXiv Detail & Related papers (2022-10-06T15:36:37Z)
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only a single step, without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
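The core objective can be sketched in a few lines of PyTorch. This is a generic gradient-matching loss over tensor batches, hedged: the paper condenses graph data with GNNs, and details such as per-class matching are omitted here.

```python
import torch

def one_step_gradient_matching_loss(model, loss_fn, real_batch, syn_batch):
    """Compare the gradients induced by real vs. synthetic data at the
    current (e.g., freshly initialized) weights, with no weight update.
    Minimizing this w.r.t. the synthetic batch drives condensation."""
    params = list(model.parameters())
    x_real, y_real = real_batch
    x_syn, y_syn = syn_batch  # synthetic tensors are the trainable inputs
    g_real = torch.autograd.grad(loss_fn(model(x_real), y_real), params)
    g_syn = torch.autograd.grad(loss_fn(model(x_syn), y_syn), params,
                                create_graph=True)  # keep graph for outer step
    return sum(((gr.detach() - gs) ** 2).sum()
               for gr, gs in zip(g_real, g_syn))
```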
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
- Generating a Doppelganger Graph: Resembling but Distinct [5.618335078130568]
We propose an approach to generating a doppelganger graph that resembles a given one in many graph properties.
The approach is an orchestration of graph representation learning, generative adversarial networks, and graph realization algorithms.
arXiv Detail & Related papers (2021-01-23T22:08:27Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose the Semantic Graph Convolutional Network (SGCN), which explores the implicit semantics by learning latent semantic paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
- Neural Architecture Search in Graph Neural Networks [1.2881413375147996]
This paper compares two NAS methods for optimizing Graph Neural Networks (GNNs).
Results cover seven datasets over two search spaces and show that both methods obtain accuracies similar to those of a random search.
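The random-search baseline referenced above is simple to state; a minimal sketch, where `evaluate` is a user-supplied function (e.g., train a GNN with the sampled configuration and return validation accuracy):

```python
import random

def random_search(space, evaluate, n_trials=20, seed=0):
    """Sample configurations uniformly from a discrete search space
    and keep the best according to evaluate(config)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(options) for name, options in space.items()}
        score = evaluate(cfg)  # e.g., validation accuracy of a trained GNN
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```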
arXiv Detail & Related papers (2020-07-31T21:04:24Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximation framework for such non-trivial ERGs that results in dyadically independent (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
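Dyadic (edge) independence is what makes such models cheap to sample: every potential edge is an independent Bernoulli draw. A minimal sketch, assuming a precomputed matrix P of edge probabilities:

```python
import numpy as np

def sample_dyadic_independent_graph(P, seed=0):
    """Sample an undirected graph in which each edge (i, j) is an
    independent Bernoulli draw with probability P[i, j]."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    coins = rng.random((n, n)) < P     # independent coin flips per pair
    A = np.triu(coins, k=1)            # upper triangle: no self-loops
    return (A | A.T).astype(int)       # symmetrize into an adjacency matrix
```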
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.