Learned spatial data partitioning
- URL: http://arxiv.org/abs/2306.04846v2
- Date: Mon, 19 Jun 2023 04:15:12 GMT
- Title: Learned spatial data partitioning
- Authors: Keizo Hori, Yuya Sasaki, Daichi Amagata, Yuki Murosaki, Makoto Onizuka
- Abstract summary: We first study learned spatial data partitioning, which effectively assigns groups of big spatial data to computers based on locations of data.
We formalize spatial data partitioning in the context of reinforcement learning and develop a novel deep reinforcement learning algorithm.
Our method efficiently finds partitions for accelerating distance join queries and reduces the workload run time by up to 59.4%.
- Score: 7.342228103959199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the significant increase in the size of spatial data, it is essential
to use distributed parallel processing systems to efficiently analyze spatial
data. In this paper, we first study learned spatial data partitioning, which
effectively assigns groups of big spatial data to computers based on locations
of data by using machine learning techniques. We formalize spatial data
partitioning in the context of reinforcement learning and develop a novel deep
reinforcement learning algorithm. Our learning algorithm leverages features of
spatial data partitioning and prunes ineffective learning processes to find
optimal partitions efficiently. Our experimental study, which uses Apache
Sedona and real-world spatial data, demonstrates that our method efficiently
finds partitions for accelerating distance join queries and reduces the
workload run time by up to 59.4%.
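The abstract casts partitioning as sequential decision making. As a rough illustration only (not the paper's algorithm; all names here are invented), the sketch below repeatedly splits a 2D region, with a greedy policy standing in for the learned RL policy and load skew across partitions standing in for the query-cost reward:

```python
import random

# Hypothetical sketch: spatial partitioning as sequential decisions.
# Each step splits one axis-aligned partition along x or y; a greedy
# policy stands in for the learned RL policy, and load skew across
# partitions stands in for the query-cost reward.

def count_points(box, points):
    x0, y0, x1, y1 = box
    return sum(1 for x, y in points if x0 <= x < x1 and y0 <= y < y1)

def split(box, axis):
    x0, y0, x1, y1 = box
    if axis == 0:  # split along x at the midpoint
        mx = (x0 + x1) / 2
        return [(x0, y0, mx, y1), (mx, y0, x1, y1)]
    my = (y0 + y1) / 2  # split along y at the midpoint
    return [(x0, y0, x1, my), (x0, my, x1, y1)]

def imbalance(parts, points):
    counts = [count_points(b, points) for b in parts]
    return max(counts) - min(counts)  # proxy for worker load skew

def greedy_partition(points, n_parts):
    parts = [(0.0, 0.0, 1.0, 1.0)]
    while len(parts) < n_parts:
        best = None
        for i, box in enumerate(parts):
            for axis in (0, 1):
                cand = parts[:i] + split(box, axis) + parts[i + 1:]
                cost = imbalance(cand, points)
                if best is None or cost < best[0]:
                    best = (cost, cand)
        parts = best[1]
    return parts

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(1000)]
parts = greedy_partition(pts, 4)
```

A learned approach replaces the exhaustive greedy choice with a trained policy, which is what lets the paper prune ineffective learning processes instead of enumerating splits.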
Related papers
- Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures [0.6990493129893112]
The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data.
Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization.
We propose three innovative techniques designed to quantify and strategically reduce data space partition overlap.
arXiv Detail & Related papers (2024-08-28T16:16:55Z)
- Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning.
We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z)
- TrueDeep: A systematic approach of crack detection with less data [0.0]
We show that by incorporating domain knowledge along with deep learning architectures, we can achieve similar performance with less data.
Our algorithms, developed with 23% of the overall data, have a similar performance on the test data and significantly better performance on multiple blind datasets.
arXiv Detail & Related papers (2023-05-30T14:51:58Z)
- Block size estimation for data partitioning in HPC applications using machine learning techniques [38.063905789566746]
This paper describes a methodology, namely BLEST-ML (BLock size ESTimation through Machine Learning), for block size estimation.
The proposed methodology was evaluated by designing an implementation tailored to dislib, a distributed computing library.
The results we obtained show the ability of BLEST-ML to efficiently determine a suitable way to split a given dataset.
arXiv Detail & Related papers (2022-11-19T23:04:14Z)
- ExClus: Explainable Clustering on Low-dimensional Data Representations [9.496898312608307]
Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable.
We propose a new method to compute an interpretable clustering automatically, where the explanation is in the original high-dimensional space and the clustering is coherent in the low-dimensional projection.
arXiv Detail & Related papers (2021-11-04T21:24:01Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Switch Spaces: Learning Product Spaces with Sparse Gating [48.591045282317424]
We propose Switch Spaces, a data-driven approach for learning representations in product space.
We introduce sparse gating mechanisms that learn to choose, combine and switch spaces.
Experiments on knowledge graph completion and item recommendation show that the proposed switch spaces achieve new state-of-the-art performance.
arXiv Detail & Related papers (2021-02-17T11:06:59Z)
- The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, and (ii) refining using machine learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
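"Machine learned search within a partition" follows the usual learned-index recipe: fit a model from key to position, then correct the prediction within a bounded window. A minimal one-dimensional sketch under that assumption (Flood's actual models and multi-dimensional layout are more involved):

```python
import bisect

# Hedged sketch of learned search: a linear model predicts each key's
# position in the sorted array; a bounded local search corrects it.

def fit_linear(keys):
    n, lo, hi = len(keys), keys[0], keys[-1]
    slope = (n - 1) / (hi - lo) if hi > lo else 0.0
    # worst-case prediction error over the keys, used as search radius
    err = max(abs(i - (k - lo) * slope) for i, k in enumerate(keys))
    return slope, lo, int(err) + 1

def learned_search(keys, slope, lo, err, target):
    guess = int((target - lo) * slope)
    left = max(0, guess - err)
    right = min(len(keys), guess + err + 1)
    i = bisect.bisect_left(keys, target, left, right)
    return i if i < len(keys) and keys[i] == target else -1

keys = sorted({(x * x) % 9973 for x in range(2000)})
model = fit_linear(keys)
```

The speed-up over plain binary search comes from the correction window being much smaller than the whole partition when the key distribution is close to the model.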
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
- Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose FDSKL, a federated doubly stochastic kernel learning algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
- Auxiliary-task learning for geographic data with autoregressive embeddings [1.4823143667165382]
We propose SXL, a method for embedding information on the autoregressive nature of spatial data directly into the learning process.
We utilize the local Moran's I, a popular measure of local spatial autocorrelation, to "nudge" the model to learn the direction and magnitude of local spatial effects.
We highlight how our method consistently improves the training of neural networks in unsupervised and supervised learning tasks.
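The local Moran's I statistic used for the "nudge" has a simple closed form, I_i = ((x_i - mean) / m2) * sum_j w_ij (x_j - mean), where m2 is the average squared deviation. A minimal sketch (the sites and weights below are illustrative, not from the paper):

```python
# Local Moran's I: positive where a value and its neighbours deviate
# from the mean in the same direction (local spatial autocorrelation).

def local_morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    return [
        ((values[i] - mean) / m2)
        * sum(weights[i][j] * (values[j] - mean) for j in range(n))
        for i in range(n)
    ]

# Four sites on a line with adjacent-neighbour weights; high values
# cluster on the left and low on the right, so every site sees
# positive local spatial autocorrelation.
vals = [10.0, 9.0, 2.0, 1.0]
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ii = local_morans_i(vals, w)
```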
arXiv Detail & Related papers (2020-06-18T12:16:08Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
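The "parametric function" idea in the last entry can be illustrated with a toy stand-in (not the paper's network): route each point to the candidate 1-D subspace it is most aligned with, so clustering costs one function evaluation per point rather than an n-by-n affinity matrix.

```python
import math

# Toy parametric partition function: assign each point to the unit
# direction (1-D subspace) with the largest |cosine| similarity.

def normalize(v):
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]

def assign(point, bases):
    p = normalize(point)
    # |cosine| similarity to each candidate subspace direction
    scores = [abs(sum(a * b for a, b in zip(p, d))) for d in bases]
    return max(range(len(bases)), key=lambda i: scores[i])

bases = [normalize([1.0, 0.1]), normalize([0.1, 1.0])]
# points sampled on each subspace should route back to it
pts = ([([t, 0.1 * t], 0) for t in (1.0, 2.0, 3.0)]
       + [([0.1 * t, t], 1) for t in (1.0, 2.0, 3.0)])
labels = [assign(p, bases) for p, _ in pts]
```

In the paper the assignment function is learned rather than fixed, which is what makes clustering millions of points tractable.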
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.