On Scalable Integrity Checking for Secure Cloud Disks
- URL: http://arxiv.org/abs/2405.03830v3
- Date: Wed, 29 Jan 2025 15:16:35 GMT
- Title: On Scalable Integrity Checking for Secure Cloud Disks
- Authors: Quinn Burke, Ryan Sheatsley, Rachel King, Owen Hines, Michael Swift, Patrick McDaniel
- Abstract summary: Merkle hash trees are the standard method to protect the integrity and freshness of stored data. In this paper, we quantify performance overheads of storage-level hash trees in realistic settings. We then design an optimized tree structure called Dynamic Merkle Trees (DMTs) based on an analysis of root causes of overheads.
- Score: 2.2768179587677104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Merkle hash trees are the standard method to protect the integrity and freshness of stored data. However, hash trees introduce additional compute and I/O costs on the I/O critical path, and prior efforts have not fully characterized these costs. In this paper, we quantify performance overheads of storage-level hash trees in realistic settings. We then design an optimized tree structure called Dynamic Merkle Trees (DMTs) based on an analysis of root causes of overheads. DMTs exploit patterns in workloads to deliver up to a 2.2x throughput and latency improvement over the state of the art. Our novel approach provides a promising new direction to achieve integrity guarantees in storage efficiently and at scale.
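To make the measured baseline concrete, here is a minimal Python sketch of a storage-level Merkle tree: build the tree over fixed-size disk blocks, then verify one block against the root via its authentication path. The block size and helper names are illustrative, not taken from the DMT implementation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Hash each block, then hash sibling pairs upward; levels[-1][0] is the root."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate the last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof(levels, index):
    """Collect sibling digests on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:                      # pad the same way build_tree did
            level = level + [level[-1]]
        sibling = index ^ 1                     # sibling flips the lowest bit
        path.append((sibling < index, level[sibling]))
        index //= 2
    return path

def verify(block, path, root):
    digest = h(block)
    for sibling_is_left, sibling in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root

blocks = [bytes([i]) * 4096 for i in range(8)]  # eight 4 KiB "disk blocks"
levels = build_tree(blocks)
assert verify(blocks[5], proof(levels, 5), levels[-1][0])
```

Every read must recompute a root path like this, and every write must refresh it, which is exactly the critical-path compute and I/O cost the paper quantifies.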
Related papers
- DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees [4.077820670802213]
DobLIX is a dual-objective learned index specifically designed for Log-Structured Merge (LSM) tree-based key-value stores.
We show that DobLIX reduces indexing overhead and improves throughput by 1.19 to 2.21 times compared to state-of-the-art methods within RocksDB.
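For context, the sketch below shows the single-segment learned-index primitive that systems like DobLIX build on: a linear model predicts a sorted key's position, and a bounded local search corrects the prediction. This is generic background only; DobLIX's dual-objective training and LSM integration are not modeled here.

```python
import bisect

class LearnedIndex:
    """Minimal single-segment learned index: a linear model predicts a key's
    array position, and a bounded local search corrects the prediction."""
    def __init__(self, keys):
        self.keys = keys                          # sorted list of distinct keys
        n = len(keys)
        self.slope = (n - 1) / (keys[-1] - keys[0])
        self.intercept = -self.slope * keys[0]
        # worst-case prediction error defines the local search window
        self.err = max(abs(round(self.slope * k + self.intercept) - i)
                       for i, k in enumerate(keys))

    def lookup(self, key):
        guess = round(self.slope * key + self.intercept)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

idx = LearnedIndex(sorted([3, 17, 42, 99, 256, 1024, 4096, 65536]))
assert idx.lookup(256) == 4
assert idx.lookup(7) is None
```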
arXiv Detail & Related papers (2025-02-07T22:48:14Z) - TreeKV: Smooth Key-Value Cache Compression with Tree Structures [19.06842704338332]
TreeKV is a training-free method that employs a tree structure for smooth cache compression.
It consistently surpasses all baseline models in language modeling tasks on PG19 and OpenWebText2.
arXiv Detail & Related papers (2025-01-09T06:00:27Z) - FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training [51.39495282347475]
We introduce $\texttt{FRUGAL}$ ($\textbf{F}$ull-$\textbf{R}$ank $\textbf{U}$pdates with $\textbf{G}$r$\textbf{A}$dient sp$\textbf{L}$itting), a new memory-efficient optimization framework.
Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam.
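A rough sketch of what gradient splitting can look like, under the assumption (only the abstract is available here) that the low-rank subspace receives a stateful Adam-style update while the full-rank residual receives a state-free sign update; the names and the omitted bias correction are simplifications.

```python
import torch

def frugal_style_step(param, grad, proj, adam_state, lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative gradient-splitting step: optimizer state is kept only
    for the r subspace coordinates, not the full d-dimensional parameter."""
    c = proj.T @ grad                       # subspace coordinates of the gradient
    g_res = grad - proj @ c                 # full-rank residual, kept state-free
    adam_state["m"] = beta1 * adam_state["m"] + (1 - beta1) * c
    adam_state["v"] = beta2 * adam_state["v"] + (1 - beta2) * c * c
    update_low = proj @ (adam_state["m"] / (adam_state["v"].sqrt() + eps))
    param -= lr * (update_low + g_res.sign())   # sign update needs no state

d, r = 1024, 16
param = torch.randn(d)
proj, _ = torch.linalg.qr(torch.randn(d, r))    # orthonormal subspace basis, d x r
state = {"m": torch.zeros(r), "v": torch.zeros(r)}
frugal_style_step(param, torch.randn(d), proj, state)
```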
arXiv Detail & Related papers (2024-11-12T14:41:07Z) - CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information [33.01180010689081]
We introduce an efficient structured pruning framework named CFSP.
We first allocate the sparsity budget across blocks based on their importance and then retain important weights within each block.
Results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets.
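The two-stage recipe reads naturally as code; the importance proxy (mean absolute weight) and the budget-allocation rule below are placeholders for the sketch, not CFSP's actual coarse-to-fine activation criterion.

```python
import numpy as np

def coarse_to_fine_prune(blocks, global_sparsity):
    """Illustrative two-stage pruning: less important blocks receive a larger
    share of the sparsity budget, then weights are pruned within each block
    by magnitude."""
    # Coarse stage: block importance = mean |weight| (an assumed proxy)
    importance = np.array([np.abs(b).mean() for b in blocks])
    inv = importance.max() - importance + 1e-12
    budget = global_sparsity * len(blocks) * inv / inv.sum()  # per-block sparsity
    budget = np.clip(budget, 0.0, 0.99)
    # Fine stage: keep the largest-magnitude weights inside each block
    pruned = []
    for b, s in zip(blocks, budget):
        k = int(round(s * b.size))                 # number of weights to remove
        if k:
            thresh = np.partition(np.abs(b).ravel(), k - 1)[k - 1]
            b = np.where(np.abs(b) > thresh, b, 0.0)
        pruned.append(b)
    return pruned

layers = [np.random.randn(64, 64) for _ in range(4)]
sparse = coarse_to_fine_prune(layers, global_sparsity=0.5)
```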
arXiv Detail & Related papers (2024-09-20T04:03:27Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
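Mask learning with policy gradients instead of back-propagation can be sketched with a plain REINFORCE estimator; the estimator, baseline, and toy objective below are generic illustrations, not the paper's exact formulation.

```python
import torch

def policy_gradient_mask_step(probs, loss_fn, lr=0.05, samples=8):
    """One REINFORCE step: sample binary masks, score each by the pruned
    model's loss (forward passes only), and push mask probabilities toward
    lower-loss samples -- no gradient flows through the network itself."""
    masks = torch.bernoulli(probs.expand(samples, -1))
    losses = torch.tensor([loss_fn(m) for m in masks])
    advantage = losses - losses.mean()            # variance-reducing baseline
    # grad of log Bernoulli likelihood wrt probs: (m - p) / (p * (1 - p))
    score = (masks - probs) / (probs * (1 - probs)).clamp_min(1e-6)
    grad = (advantage.unsqueeze(1) * score).mean(0)
    return (probs - lr * grad).clamp(0.01, 0.99)

# Toy objective: prefer keeping the first half of 16 prunable units
target = torch.cat([torch.ones(8), torch.zeros(8)])
loss = lambda m: torch.sum((m - target) ** 2).item()
p = torch.full((16,), 0.5)
for _ in range(200):
    p = policy_gradient_mask_step(p, loss)
```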
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - FOBNN: Fast Oblivious Binarized Neural Network Inference [12.587981899648419]
We develop a fast oblivious binarized neural network inference framework, FOBNN.
Specifically, we customize binarized convolutional neural networks to enhance oblivious inference, design two fast algorithms for binarized convolutions, and optimize network structures experimentally under constrained costs.
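As background for the binarized-convolution building block, the sketch below uses the standard XNOR-popcount identity for ±1 vectors encoded as {0,1} bits, dot(a, b) = n - 2·popcount(a XOR b); FOBNN's oblivious (secure two-party) machinery is out of scope here.

```python
import numpy as np

def binarized_conv1d_valid(x_bits, w_bits):
    """Binarized sliding dot products via XOR + popcount: for +/-1 vectors
    encoded as {0,1} bits, dot(a, b) = n - 2 * popcount(a XOR b)."""
    n = len(w_bits)
    out = []
    for i in range(len(x_bits) - n + 1):
        window = x_bits[i:i + n]
        out.append(n - 2 * int(np.count_nonzero(window ^ w_bits)))
    return np.array(out)

rng = np.random.default_rng(1)
x = rng.integers(0, 2, 16).astype(np.uint8)     # encodes +/-1 inputs
w = rng.integers(0, 2, 4).astype(np.uint8)      # encodes +/-1 weights
# Cross-check against the +/-1 arithmetic done directly
ref = np.convolve(2 * x.astype(int) - 1, (2 * w.astype(int) - 1)[::-1], "valid")
assert np.array_equal(binarized_conv1d_valid(x, w), ref)
```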
arXiv Detail & Related papers (2024-05-06T03:12:36Z) - NeuroHash: A Hyperdimensional Neuro-Symbolic Framework for Spatially-Aware Image Hashing and Retrieval [5.0923114224599555]
We introduce NeuroHash, a novel neuro-symbolic framework leveraging Hyperdimensional Computing (HDC) to enable highly customizable, spatially-aware image retrieval.
NeuroHash combines pre-trained deep neural network models with HDC-based symbolic models, allowing for flexible manipulation of hash values to support conditional image retrieval.
We evaluate NeuroHash on two benchmark datasets, demonstrating superior performance compared to state-of-the-art hashing methods.
arXiv Detail & Related papers (2024-04-17T03:01:47Z) - An Efficient and Scalable Auditing Scheme for Cloud Data Storage using an Enhanced B-tree [0.6773121102591492]
We present a novel dynamic auditing scheme for centralized cloud environments leveraging an enhanced version of the B-tree.
Unlike static auditing schemes, our scheme supports dynamic insert, update, and delete operations.
Also, by leveraging an enhanced B-tree, our scheme maintains a balanced tree after any alteration to a file, improving performance significantly.
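The property any such dynamic scheme must preserve is cheap re-authentication after edits. The toy below stands in for the enhanced B-tree with a perfect binary hash tree in a flat array (an assumption made for brevity): an update touches only the O(log n) digests on its root path, rather than rehashing the whole file.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class HashTree:
    """Perfect binary hash tree over 2**k blocks, stored heap-style in a flat
    array (node i has children 2i and 2i+1)."""
    def __init__(self, blocks):
        n = len(blocks)                           # must be a power of two here
        self.n = n
        self.nodes = [b""] * n + [h(b) for b in blocks]
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])

    def update(self, index, new_block):
        i = self.n + index
        self.nodes[i] = h(new_block)
        i //= 2
        while i:                                  # refresh only the root path
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
            i //= 2

    def root(self):
        return self.nodes[1]

t = HashTree([bytes([i]) * 4096 for i in range(8)])
before = t.root()
t.update(3, b"\xff" * 4096)
assert t.root() != before
```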
arXiv Detail & Related papers (2024-01-17T04:01:18Z) - Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
On datasets with over twenty million targets, our approach halves the error relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z) - A Lower Bound of Hash Codes' Performance [122.88252443695492]
In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance.
We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization.
Testing a range of hashing models, we obtain performance improvements on all of them, with up to a $26.5\%$ increase in mean Average Precision and up to a $20.5\%$ increase in accuracy.
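Under the plain definitions assumed here (the paper's formal ones may differ), both quantities tied to the lower bound can be measured directly on a set of codes:

```python
import numpy as np

def hamming(a, b):
    return np.count_nonzero(a != b)

def code_quality(codes, labels):
    """Intra-class compactness (mean Hamming distance within a class, lower is
    better) and inter-class distinctiveness (mean Hamming distance across
    classes, higher is better)."""
    intra, inter = [], []
    n = len(codes)
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming(codes[i], codes[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return np.mean(intra), np.mean(inter)

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(100, 32))      # 100 random 32-bit codes
labels = rng.integers(0, 5, size=100)
compact, distinct = code_quality(codes, labels)
print(f"intra={compact:.1f}  inter={distinct:.1f}")  # both ~16 for random codes
```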
arXiv Detail & Related papers (2022-10-12T03:30:56Z) - DVHN: A Deep Hashing Framework for Large-scale Vehicle Re-identification [5.407157027628579]
We propose a deep hash-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and improves retrieval efficiency.
DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module.
$\textbf{DVHN}$ of $2048$ bits can achieve 13.94% and 10.21% accuracy improvements in terms of $\textbf{mAP}$ and $\textbf{Rank@1}$ for the $\textbf{VehicleID}$ (800) dataset.
arXiv Detail & Related papers (2021-12-09T14:11:27Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching those of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries for the MNIST and ImageNet datasets, respectively.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment [4.314299343332365]
We present methods to serialize and deserialize tree ensembles that optimize inference latency when models are not already loaded into memory.
Our packed serialized trees (PACSET) encode reference locality in the layout of a tree ensemble using principles from external memory algorithms.
The result is that each I/O yields a higher fraction of useful data, leading to a 2-6 times reduction in classification latency for interactive workloads.
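One simple locality-friendly layout in this spirit is breadth-first serialization, sketched below on a toy dict-based tree; PACSET itself additionally interleaves the trees of an ensemble and packs to the page size, which this sketch does not attempt.

```python
from collections import deque

def breadth_first_layout(root):
    """Serialize a binary decision tree in breadth-first order: levels near
    the root, which every inference touches, end up contiguous on disk."""
    order, index = [], {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        index[id(node)] = len(order)
        order.append(node)
        for child in (node.get("left"), node.get("right")):
            if child is not None:
                queue.append(child)
    # flat records: (feature, threshold, left_pos, right_pos, leaf_value)
    return [(node.get("feature", -1), node.get("threshold", 0.0),
             index.get(id(node.get("left")), -1),
             index.get(id(node.get("right")), -1),
             node.get("value", 0.0)) for node in order]

leaf = lambda v: {"value": v}
tree = {"feature": 0, "threshold": 0.5,
        "left": leaf(1.0),
        "right": {"feature": 1, "threshold": 2.0,
                  "left": leaf(0.0), "right": leaf(1.0)}}
flat = breadth_first_layout(tree)
```

Since every inference visits the root first, the earliest page read now contains the nodes most likely to be needed next.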
arXiv Detail & Related papers (2020-11-10T20:32:11Z) - ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval [43.41089241581596]
We study the novel topic of fine-grained hashing, which aims to generate compact binary codes for fine-grained images.
We propose a unified end-to-end trainable network, termed ExchNet.
Our proposal consistently outperforms state-of-the-art generic hashing methods on five fine-grained datasets.
arXiv Detail & Related papers (2020-08-04T07:01:32Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Generative Semantic Hashing Enhanced via Boltzmann Machines [61.688380278649056]
Existing generative-hashing methods mostly assume a factorized form for the posterior distribution.
We propose to employ the distribution of a Boltzmann machine as the variational posterior.
We show that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
arXiv Detail & Related papers (2020-06-16T01:23:39Z) - Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods perform poorly when retrieving with extremely short hash codes.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In RSLH, mutual reconstruction between the hash representation and semantic labels preserves semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z) - ENTMOOT: A Framework for Optimization over Ensemble Tree Models [57.98561336670884]
ENTMOOT is a framework for integrating tree models into larger optimization problems.
We show how ENTMOOT allows a simple integration of tree models into decision-making and black-box optimization.
arXiv Detail & Related papers (2020-03-10T14:34:07Z)