A Practical Index Structure Supporting Fréchet Proximity Queries Among Trajectories
- URL: http://arxiv.org/abs/2005.13773v1
- Date: Thu, 28 May 2020 04:12:43 GMT
- Title: A Practical Index Structure Supporting Fréchet Proximity Queries Among Trajectories
- Authors: Joachim Gudmundsson, Michael Horton, John Pfeifer, Martin P. Seybold
- Abstract summary: We present a scalable approach for range and $k$ nearest neighbor queries under computationally expensive metrics.
Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories.
We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world data sets.
- Score: 1.9335262420787858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a scalable approach for range and $k$ nearest neighbor queries
under computationally expensive metrics, like the continuous Fréchet distance
on trajectory data. Based on clustering for metric indexes, we obtain a dynamic
tree structure whose size is linear in the number of trajectories, regardless
of the trajectories' individual sizes or the spatial dimension, which allows
one to exploit the low 'intrinsic dimensionality' of data sets for effective
search space pruning.
Since the distance computation is expensive, generic metric indexing methods
are rendered impractical. We present strategies that (i) improve on known upper
and lower bound computations, (ii) build cluster trees with few or no
distance calls, and (iii) search using bounds for metric pruning, interval
orderings for reduction, and randomized pivoting for reporting the final
results.
We analyze the efficiency and effectiveness of our methods with extensive
experiments on diverse synthetic and real-world data sets. The results show
improvement over state-of-the-art methods for exact queries, and even further
speed-ups are achieved for queries that may return approximate results.
Surprisingly, the majority of exact nearest-neighbor queries on real data sets
are answered without any distance computations.
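As a rough illustration of the general approach (and not the authors' exact data structure), the sketch below shows $k$-NN search over a metric cluster tree in which cheap lower bounds prune items and whole subtrees before the expensive distance is evaluated. The node layout and the `exact_dist`/`lower_bound` callables are assumptions; the paper's upper bounds, interval orderings, and randomized pivoting are omitted here.

```python
import heapq
from itertools import count

# Hypothetical sketch of bound-assisted k-NN search over a metric cluster tree.
# `exact_dist` is the expensive metric (e.g. continuous Fréchet distance) and
# `lower_bound` a cheap lower bound on it; both are assumed to be supplied by
# the caller. This illustrates the pruning idea, not the paper's structure.

class ClusterNode:
    def __init__(self, pivot, radius, children=(), items=()):
        self.pivot = pivot          # representative trajectory of the cluster
        self.radius = radius        # max exact_dist(pivot, t) over the subtree
        self.children = list(children)
        self.items = list(items)    # trajectories stored at a leaf

def knn(root, query, k, exact_dist, lower_bound):
    heap, tie = [], count()         # max-heap (negated distances) of size <= k
    kth = float("inf")              # distance of the current k-th best candidate

    def push(dist, item):
        nonlocal kth
        heapq.heappush(heap, (-dist, next(tie), item))
        if len(heap) > k:
            heapq.heappop(heap)
        if len(heap) == k:
            kth = -heap[0][0]

    def visit(node):
        # Triangle inequality: every item t in this subtree satisfies
        # exact_dist(query, t) >= lower_bound(query, pivot) - radius.
        if lower_bound(query, node.pivot) - node.radius > kth:
            return                                  # prune the whole cluster
        for item in node.items:
            if lower_bound(query, item) > kth:
                continue                            # pruned without an exact call
            d = exact_dist(query, item)
            if len(heap) < k or d < kth:
                push(d, item)
        # Visit the most promising child clusters first to shrink `kth` early.
        for child in sorted(node.children,
                            key=lambda c: lower_bound(query, c.pivot) - c.radius):
            visit(child)

    visit(root)
    return [item for _, _, item in sorted(heap, key=lambda e: -e[0])]
```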
Related papers
- A Bi-metric Framework for Fast Similarity Search [23.254885582600775]
We propose a new "bi-metric" framework for designing nearest neighbor data structures.
Our framework assumes two dissimilarity functions: a ground-truth metric that is accurate but expensive to compute, and a proxy metric that is cheaper but less accurate.
We show how to construct data structures using only the proxy metric, while making only a limited number of calls to both metrics.
arXiv Detail & Related papers (2024-06-05T03:17:48Z)
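Read naively, the bi-metric recipe is "shortlist under the cheap proxy, then re-rank a few candidates with the expensive metric". The sketch below illustrates that reading with placeholder `proxy_dist`/`true_dist` callables and a linear scan standing in for a real proxy index; it is not the paper's actual construction.

```python
# Hedged bi-metric k-NN sketch: shortlist under a cheap proxy metric, then spend
# a small budget of expensive ground-truth evaluations on the shortlist only.
# `proxy_dist` and `true_dist` are assumed to be supplied by the caller.

def bi_metric_knn(query, database, k, proxy_dist, true_dist, shortlist_size=50):
    # Stage 1: rank candidates with the cheap proxy (a real implementation would
    # use a proxy-metric index instead of this linear scan).
    shortlist = sorted(database, key=lambda x: proxy_dist(query, x))[:shortlist_size]
    # Stage 2: re-rank only the shortlist with the expensive ground-truth metric.
    return sorted(shortlist, key=lambda x: true_dist(query, x))[:k]
```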
- Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection [2.3814052021083354]
This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem.
Our method efficiently marks each item in a database as neighbor or non-neighbor of a query point, based on a cosine distance threshold without exhaustive search.
We show that, using softmax-based features, our method achieves a more than ten-fold speed-up over exhaustive search with no loss of accuracy.
arXiv Detail & Related papers (2023-11-05T06:12:03Z)
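One toy reading of the group-testing idea (not the paper's actual scheme): with non-negative, softmax-style features, the query's inner product with a pool's summed feature vector upper-bounds its inner product with every pool member, so a single pooled test can clear many non-neighbors at once; inner-product similarity stands in here for the cosine score.

```python
import numpy as np

# Toy adaptive group test for thresholded near-neighbor search. Assumes the
# query and all features are non-negative, so q @ sum(pool) >= q @ x for every
# x in the pool; a pool whose pooled score falls below the threshold is cleared
# by one test. Illustrative only -- not the paper's algorithm.

def find_neighbors(query, features, threshold, indices=None):
    """Return all i with query @ features[i] >= threshold."""
    if indices is None:
        indices = np.arange(len(features))
    if len(indices) == 0:
        return []
    if query @ features[indices].sum(axis=0) < threshold:
        return []                               # whole pool cleared with one test
    if len(indices) == 1:
        return [int(indices[0])]                # surviving single item is a neighbor
    mid = len(indices) // 2                     # otherwise split the pool and recurse
    return (find_neighbors(query, features, threshold, indices[:mid]) +
            find_neighbors(query, features, threshold, indices[mid:]))
```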
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search [96.20505710087392]
We propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search.
We show that our method outperforms the state-of-the-art methods by a considerable margin with light search cost.
arXiv Detail & Related papers (2022-06-20T14:41:49Z)
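Shapley values are typically estimated by Monte Carlo sampling over permutations; the sketch below shows that generic estimator with a hypothetical `evaluate(ops)` returning the validation score of a sub-architecture restricted to the given operations. It illustrates the attribution principle, not Shapley-NAS's own estimator.

```python
import random

# Generic Monte Carlo Shapley estimator over a set of candidate operations.
# `evaluate(ops)` is a hypothetical black box (e.g. supernet validation accuracy
# with only `ops` enabled); it is an assumption, not the paper's API.

def shapley_values(operations, evaluate, num_samples=200, seed=0):
    rng = random.Random(seed)
    contrib = {op: 0.0 for op in operations}
    for _ in range(num_samples):
        order = list(operations)
        rng.shuffle(order)                       # random coalition-building order
        included = set()
        prev = evaluate(frozenset(included))
        for op in order:
            included.add(op)
            score = evaluate(frozenset(included))
            contrib[op] += score - prev          # marginal contribution of `op`
            prev = score
    return {op: total / num_samples for op, total in contrib.items()}
```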
- Efficient Wind Speed Nowcasting with GPU-Accelerated Nearest Neighbors Algorithm [0.0]
This paper proposes a simple yet efficient high-altitude wind nowcasting pipeline.
It efficiently processes a vast amount of live data recorded by airplanes over the whole airspace.
It creates a unique context for each point in the dataset and then extrapolates from it.
arXiv Detail & Related papers (2021-12-20T09:15:27Z)
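At its core this reads as nearest-neighbour extrapolation over recent aircraft reports. The column layout and the space/time scaling weights below are assumptions made for illustration, not the paper's pipeline; a GPU version would swap NumPy for a GPU array library.

```python
import numpy as np

# Toy nearest-neighbour nowcast: average the wind of the k closest (in space and
# time) aircraft reports around a query location. Columns and scaling weights
# are assumptions for this sketch.

def nowcast_wind(query, reports, winds, k=8, scale=(1.0, 1.0, 0.001, 0.01)):
    """reports: (n, 4) rows of (lat, lon, altitude_m, age_s); winds: (n, 2) u/v."""
    diffs = (np.asarray(reports) - np.asarray(query)) * np.asarray(scale)
    dists = np.linalg.norm(diffs, axis=1)            # weighted space-time distance
    nearest = np.argsort(dists)[:k]                  # indices of the k nearest reports
    return np.asarray(winds)[nearest].mean(axis=0)   # extrapolated (u, v) wind
```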
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
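As background, the leverage score of row i of a full-column-rank matrix A is the squared i-th row norm of an orthonormal basis of A's column space. The sketched variant below uses a generic Gaussian sketch before orthogonalisation; it is a common randomized approximation, not the authors' rank-revealing construction.

```python
import numpy as np

# Exact leverage scores: squared row norms of an orthonormal basis of range(A).
def leverage_exact(A):
    Q, _ = np.linalg.qr(A, mode="reduced")       # assumes A has full column rank
    return np.sum(Q * Q, axis=1)

# Generic sketch-based approximation (a common randomized recipe, not the
# paper's rank-revealing method): orthogonalise A against the R factor of a
# small random sketch S @ A instead of against A itself.
def leverage_sketched(A, oversample=4, seed=0):
    n, d = A.shape
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((oversample * d, n)) / np.sqrt(oversample * d)
    _, R = np.linalg.qr(S @ A, mode="reduced")   # R is d x d (full rank w.h.p.)
    Z = np.linalg.solve(R.T, A.T).T              # Z = A @ R^{-1}, approx. orthonormal
    return np.sum(Z * Z, axis=1)
```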
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension.
We also show that refinement using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
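"Machine-learned search within a partition" usually means predicting a key's position with a small model and then correcting locally within a verified error bound. The linear-model sketch below shows that generic pattern; it is an assumption-laden illustration, not Flood's actual layout.

```python
import numpy as np

# Generic learned-index sketch for search inside a sorted partition: a linear
# model predicts the rank of a key, and a verified worst-case model error bounds
# the local correction window. Illustrates the pattern only, not Flood itself.

class LearnedPartition:
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=float))
        ranks = np.arange(len(self.keys))
        # Fit rank ~ slope * key + intercept (assumes keys are not all identical).
        self.slope, self.intercept = np.polyfit(self.keys, ranks, deg=1)
        preds = self.slope * self.keys + self.intercept
        self.err = int(np.ceil(np.max(np.abs(preds - ranks))))   # max rank error

    def lower_bound(self, key):
        """Index of the first stored key that is >= `key`."""
        guess = int(self.slope * key + self.intercept)
        lo = min(len(self.keys), max(0, guess - self.err))
        hi = min(len(self.keys), max(lo, guess + self.err + 2))
        return lo + int(np.searchsorted(self.keys[lo:hi], key, side="left"))

# Example: a point lookup inside one partition.
part = LearnedPartition(np.random.default_rng(0).uniform(0, 1000, size=10000))
print(part.lower_bound(123.4), part.err)
```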
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
- Metric Learning for Ordered Labeled Trees with pq-grams [11.284638114256712]
We propose a new metric learning approach for tree-structured data with pq-grams.
The pq-gram distance is a distance for ordered labeled trees, and has much lower computation cost than the tree edit distance.
We empirically show that the proposed approach achieves competitive results with the state-of-the-art edit distance-based methods.
arXiv Detail & Related papers (2020-03-09T08:04:47Z)
- Scalable Distributed Approximation of Internal Measures for Clustering Evaluation [5.144809478361603]
A widely used internal measure for clustering evaluation is the silhouette coefficient, whose computation requires a quadratic number of distance calculations.
We present the first scalable algorithm to compute such a rigorous approximation for the evaluation of clusterings based on any metric distances.
We also prove that the algorithm can be adapted to obtain rigorous approximations of other internal measures of clustering quality, such as cohesion and separation.
arXiv Detail & Related papers (2020-03-03T10:28:14Z)
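For context, the exact silhouette coefficient averages, per point, its mean intra-cluster distance a and the smallest mean distance b to another cluster, which is where the quadratic number of distance calls comes from. The naive subsampling variant below only illustrates where an approximation saves distance computations; it is not the paper's (provably accurate) estimator.

```python
import numpy as np

# Both functions assume at least two clusters and a symmetric distance `dist`.

# Exact silhouette coefficient: needs all O(n^2) pairwise distances.
def silhouette_exact(points, labels, dist):
    n, labels = len(points), np.asarray(labels)
    D = np.array([[dist(points[i], points[j]) for j in range(n)] for i in range(n)])
    scores = []
    for i in range(n):
        own = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        if len(own) == 0:
            scores.append(0.0)                   # convention for singleton clusters
            continue
        a = D[i, own].mean()                     # mean distance inside own cluster
        b = min(D[i, np.flatnonzero(labels == c)].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))

# Naive subsampled estimate: per point, estimate each per-cluster mean from at
# most m sampled members. Only shows where the savings come from; the paper's
# estimator uses a more careful sampling scheme with rigorous guarantees.
def silhouette_sampled(points, labels, dist, m=32, seed=0):
    rng, labels = np.random.default_rng(seed), np.asarray(labels)
    clusters = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    scores = []
    for i in range(len(points)):
        est = {}
        for c, idx in clusters.items():
            idx = idx[idx != i]                  # never measure a point against itself
            if len(idx) == 0:
                continue
            sample = rng.choice(idx, size=min(m, len(idx)), replace=False)
            est[c] = float(np.mean([dist(points[i], points[j]) for j in sample]))
        if labels[i] not in est:                 # point is alone in its cluster
            scores.append(0.0)
            continue
        a = est[labels[i]]
        b = min(v for c, v in est.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))
```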