Hammer: Towards Efficient Hot-Cold Data Identification via Online Learning
- URL: http://arxiv.org/abs/2411.14759v1
- Date: Fri, 22 Nov 2024 06:46:16 GMT
- Title: Hammer: Towards Efficient Hot-Cold Data Identification via Online Learning
- Authors: Kai Lu, Siqi Zhao, Jiguang Wan,
- Abstract summary: Management of storage resources in big data and cloud computing environments requires accurate identification of data's "cold" and "hot" states.
Traditional methods, such as rule-based algorithms and early AI techniques, often struggle with dynamic workloads.
We propose a novel solution based on online learning strategies, achieving higher accuracy and lower operational costs.
- Score: 12.60117987781744
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Efficient management of storage resources in big data and cloud computing environments requires accurate identification of data's "cold" and "hot" states. Traditional methods, such as rule-based algorithms and early AI techniques, often struggle with dynamic workloads, leading to low accuracy, poor adaptability, and high operational overhead. To address these issues, we propose a novel solution based on online learning strategies. Our approach dynamically adapts to changing data access patterns, achieving higher accuracy and lower operational costs. Rigorous testing with both synthetic and real-world datasets demonstrates a significant improvement, achieving a 90% accuracy rate in hot-cold classification. Additionally, the computational and storage overheads are considerably reduced.
Related papers
- Selective Embedding for Deep Learning [0.4499833362998489]
Deep learning algorithms are sensitive to input data, and performance often deteriorates under nonstationary conditions.<n>This study introduces selective embedding, a novel data loading strategy, which alternates short segments of data from multiple sources within a single input channel.
arXiv Detail & Related papers (2025-07-16T15:45:01Z) - ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search [53.40810298627443]
ReGUIDE is a framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism.<n>Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks.
arXiv Detail & Related papers (2025-05-21T08:36:18Z) - What Really Matters for Learning-based LiDAR-Camera Calibration [50.2608502974106]
This paper revisits the development of learning-based LiDAR-Camera calibration.
We identify the critical limitations of regression-based methods with the widely used data generation pipeline.
We also investigate how the input data format and preprocessing operations impact network performance.
arXiv Detail & Related papers (2025-01-28T14:12:32Z) - A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data Segmentation [0.562479170374811]
In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift.
Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data.
We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness.
arXiv Detail & Related papers (2024-11-23T17:35:23Z) - Towards using Reinforcement Learning for Scaling and Data Replication in Cloud Systems [0.49157446832511503]
Reinforcement learning is used in many areas related to the Cloud Computing, and it is a promising field to get automatic data replication strategies.
In this work, we survey data replication strategies and data scaling based on reinforcement learning (RL)
arXiv Detail & Related papers (2024-10-07T11:32:35Z) - Layer Attack Unlearning: Fast and Accurate Machine Unlearning via Layer
Level Attack and Knowledge Distillation [21.587358050012032]
We propose a fast and novel machine unlearning paradigm at the layer level called layer attack unlearning.
In this work, we introduce the Partial-PGD algorithm to locate the samples to forget efficiently.
We also use Knowledge Distillation (KD) to reliably learn the decision boundaries from the teacher.
arXiv Detail & Related papers (2023-12-28T04:38:06Z) - PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly
Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Segmentation-guided Domain Adaptation for Efficient Depth Completion [3.441021278275805]
We propose an efficient depth completion model based on a vgg05-like CNN architecture and a semi-supervised domain adaptation approach.
In order to boost spatial coherence, we guide the learning process using segmentations as additional source of information.
Our approach improves on previous efficient and low parameter state of the art approaches while having a noticeably lower computational footprint.
arXiv Detail & Related papers (2022-10-14T13:01:25Z) - Augmented Bilinear Network for Incremental Multi-Stock Time-Series
Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
arXiv Detail & Related papers (2022-07-23T18:54:10Z) - Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU)
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyper parameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z) - Semantic Perturbations with Normalizing Flows for Improved
Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z) - Quasi-Global Momentum: Accelerating Decentralized Deep Learning on
Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.