MIX-RS: A Multi-indexing System based on HDFS for Remote Sensing Data Storage
- URL: http://arxiv.org/abs/2208.02987v1
- Date: Fri, 5 Aug 2022 05:11:12 GMT
- Title: MIX-RS: A Multi-indexing System based on HDFS for Remote Sensing Data Storage
- Authors: Jiashu Wu, Jingpan Xiong, Hao Dai, Yang Wang, Chengzhong Xu
- Abstract summary: A large volume of remote sensing (RS) data has been generated with the deployment of satellite technologies.
The characteristics of RS data (e.g., enormous volume, large single-file size and demanding requirement of fault tolerance) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage.
To use RS data, one of the most important techniques is geospatial indexing.
We propose a framework called Multi-IndeXing-RS (MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS.
- Score: 21.033380514644616
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A large volume of remote sensing (RS) data has been generated with the
deployment of satellite technologies. The data facilitates research in areas
such as ecological monitoring, land management and desertification. The
characteristics of RS data (e.g., enormous volume, large single-file size and
demanding requirement of fault tolerance) make the Hadoop Distributed File
System (HDFS) an ideal choice for RS data storage as it is efficient, scalable
and equipped with a data replication mechanism for failure resilience. To use
RS data, one of the most important techniques is geospatial indexing. However,
the large data volume makes it time-consuming to construct and leverage
geospatial indices efficiently. Considering that most modern geospatial data
centres are equipped with HDFS-based big data processing infrastructures,
deploying multiple geospatial indices becomes a natural way to optimise
efficacy. Moreover, because of the reliability introduced by high-quality
hardware and the infrequently modified nature of RS data, the use of
multi-indexing will not cause large overhead. Therefore, we design a framework called Multi-IndeXing-RS
(MIX-RS) that unifies the multi-indexing mechanism on top of the HDFS with data
replication enabled for both fault tolerance and geospatial indexing
efficiency. Given the fault tolerance provided by the HDFS, RS data is
structurally stored inside for faster geospatial indexing. Additionally,
multi-indexing enhances efficiency. The proposed technique naturally sits on
top of the HDFS to form a holistic framework without incurring severe overhead
or sophisticated system implementation efforts. The MIX-RS framework is
implemented and evaluated using real remote sensing data provided by the
Chinese Academy of Sciences, demonstrating excellent geospatial indexing
performance.
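The abstract does not say which index structures MIX-RS combines or how a query is routed between them, so the following is only a minimal sketch of the general multi-indexing idea, not the authors' implementation. It assumes scene footprints are axis-aligned bounding boxes held in memory, and it leaves out HDFS entirely. The class names `GridIndex`, `SortedLonIndex` and `MultiIndex`, the 10-degree cell size, and the 16-cell routing threshold are all hypothetical choices made for illustration.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Bounding box as (min_lon, min_lat, max_lon, max_lat); an assumed footprint format.
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Uniform grid: each cell lists the scenes whose footprint touches it."""

    def __init__(self, scenes: Dict[str, BBox], cell_deg: float = 10.0):
        self.scenes = scenes
        self.cell = cell_deg
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        for name, box in scenes.items():
            for key in self._cells(box):
                self.cells[key].add(name)

    def _cells(self, box: BBox) -> List[Tuple[int, int]]:
        x0, y0 = int(box[0] // self.cell), int(box[1] // self.cell)
        x1, y1 = int(box[2] // self.cell), int(box[3] // self.cell)
        return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

    def query(self, box: BBox) -> Set[str]:
        candidates: Set[str] = set()
        for key in self._cells(box):
            candidates |= self.cells.get(key, set())
        # Cells only give candidates; confirm with an exact box test.
        return {n for n in candidates if intersects(self.scenes[n], box)}


class SortedLonIndex:
    """Scenes sorted by minimum longitude; binary search prunes the scan."""

    def __init__(self, scenes: Dict[str, BBox]):
        self.scenes = scenes
        self.names = sorted(scenes, key=lambda n: scenes[n][0])
        self.min_lons = [scenes[n][0] for n in self.names]

    def query(self, box: BBox) -> Set[str]:
        # A scene starting east of the query's max_lon cannot intersect it.
        hi = bisect.bisect_right(self.min_lons, box[2])
        return {n for n in self.names[:hi] if intersects(self.scenes[n], box)}


class MultiIndex:
    """Keeps both indices over the same data and picks one per query."""

    def __init__(self, scenes: Dict[str, BBox], cell_deg: float = 10.0):
        self.cell = cell_deg
        self.grid = GridIndex(scenes, cell_deg)
        self.sorted_lon = SortedLonIndex(scenes)

    def choose(self, box: BBox):
        # Hypothetical routing rule: small queries touch few grid cells,
        # large ones are cheaper to answer with the sorted scan.
        approx_cells = ((box[2] - box[0]) / self.cell + 1) * (
            (box[3] - box[1]) / self.cell + 1
        )
        return self.grid if approx_cells <= 16 else self.sorted_lon

    def query(self, box: BBox) -> Set[str]:
        return self.choose(box).query(box)


# Made-up footprints, for illustration only.
scenes = {
    "scene_a": (100.0, 30.0, 102.0, 32.0),
    "scene_b": (110.0, 35.0, 112.0, 37.0),
    "scene_c": (101.0, 31.0, 103.0, 33.0),
}
mix = MultiIndex(scenes)
small_hits = mix.query((101.5, 31.5, 101.8, 31.8))
large_hits = mix.query((90.0, 20.0, 130.0, 50.0))
print(sorted(small_hits), sorted(large_hits))
```

Both indices return the same answer for any query; only the cost differs, which is why holding several of them is attractive when the data is rarely modified and index maintenance is therefore cheap.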
Related papers
- TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval [10.268774281394261]
Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage.
Modern RAG pipelines rely on large datastores, leading to system challenges in latency-sensitive deployments.
We propose TeleRAG, an efficient inference system that reduces RAG latency with minimal GPU memory requirements.
arXiv Detail & Related papers (2025-02-28T11:32:22Z)
- Evaluating Fault Tolerance and Scalability in Distributed File Systems: A Case Study of GFS, HDFS, and MinIO [0.9307293959047378]
Distributed File Systems (DFS) are essential for managing vast datasets across multiple servers, offering benefits in scalability, fault tolerance, and data accessibility.
This paper presents a comprehensive evaluation of three prominent DFSs - Google File System (GFS), Hadoop Distributed File System (HDFS), and MinIO.
Through detailed analysis, it assesses how these systems handle data redundancy, server failures, and client access protocols to ensure reliability in dynamic, large-scale environments.
arXiv Detail & Related papers (2025-02-04T03:52:45Z)
- Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures [0.6990493129893112]
The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data.
Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization.
We propose three innovatives designed to quantify and strategically reduce data space partition overlap.
arXiv Detail & Related papers (2024-08-28T16:16:55Z)
- DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems [79.76519917171261]
This paper addresses the computational overhead and resource inefficiency prevalent in Sequential Recommender Systems (SRSs).
We introduce an innovative approach combining pruning methods with advanced model designs.
Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec).
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
- Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
- HyP$^2$ Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval [20.53316810731414]
We propose a novel metric learning framework with Hybrid Proxy-Pair Loss (HyP$^2$ Loss).
The proposed HyP$^2$ Loss focuses on optimizing the hypersphere space by learnable proxies and excavating data-to-data correlations of irrelevant pairs.
arXiv Detail & Related papers (2022-08-14T15:06:27Z)
- Automating DBSCAN via Deep Reinforcement Learning [73.82740568765279]
We propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameters search framework, namely DRL-DBSCAN.
The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process.
The framework consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively.
arXiv Detail & Related papers (2022-08-09T04:40:11Z)
- UDRN: Unified Dimensional Reduction Neural Network for Feature Selection and Feature Projection [37.03465340777392]
Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space with minimized defined optimization objectives.
Feature selection (FS) focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure).
Feature projection (FP) combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity.
We develop a unified framework, Unified Dimensional Reduction Neural-network (UDRN), that integrates FS and FP in a compatible, end-to-end way.
arXiv Detail & Related papers (2022-07-08T10:30:20Z)
- A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z)
- Generalizing Few-Shot NAS with Gradient Matching [165.5690495295074]
One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets.
It significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
arXiv Detail & Related papers (2022-03-29T03:06:16Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple-but-efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS searching process.
Experimental results on NAS-Bench-201 show that our proposed method can help to stabilize the searching process and make the searched network more transferable across different datasets.
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture [81.82173855071312]
We propose an end-to-end solution that integrates the AutoML components and returns a ready-to-use model at the end of the search.
DHA achieves state-of-the-art (SOTA) results on various datasets, notably 77.4% accuracy on ImageNet with a cell-based search space.
arXiv Detail & Related papers (2021-09-13T08:12:50Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)