Learning to Rank Graph-based Application Objects on Heterogeneous
Memories
- URL: http://arxiv.org/abs/2211.02195v1
- Date: Fri, 4 Nov 2022 00:20:31 GMT
- Title: Learning to Rank Graph-based Application Objects on Heterogeneous
Memories
- Authors: Diego Moura, Vinicius Petrucci and Daniel Mosse
- Abstract summary: This paper describes a methodology for identifying and characterizing application objects that have the most influence on the application's performance.
By performing data placement using our predictive model, we can reduce execution time degradation by 12% (average) and 30% (max) compared to the baseline approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Persistent Memory (PMEM), also known as Non-Volatile Memory (NVM), can
deliver higher density and lower cost per bit when compared with DRAM. Its main
drawback is that it is typically slower than DRAM. On the other hand, DRAM has
scalability problems due to its cost and energy consumption. Soon, PMEM will
likely coexist with DRAM in computer systems, but the biggest challenge is to
know which data to allocate on each type of memory. This paper describes a
methodology for identifying and characterizing application objects that have
the most influence on the application's performance using Intel Optane DC
Persistent Memory. In the first part of our work, we built a tool that
automates the profiling and analysis of application objects. In the second
part, we built a machine learning model to predict the most critical object
within large-scale graph-based applications. Our results show that using
isolated features does not bring the same benefit as using a carefully chosen
set of features. By performing data placement using our predictive model, we
can reduce execution time degradation by 12% (average) and 30% (max) compared
to the baseline approach based on the LLC-misses indicator.
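The placement step described in the abstract can be illustrated with a minimal sketch: rank application objects by a predicted criticality score and greedily fill a DRAM budget with the top-ranked objects, spilling the rest to PMEM. Everything below is an illustrative assumption; ObjectProfile, place_objects, and toy_score are invented names, and the toy scoring function merely stands in for the learned ranking model over a carefully chosen feature set that the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical per-object profile; the paper's tool collects features per
# application object, but this schema is illustrative, not the paper's.
@dataclass
class ObjectProfile:
    name: str
    size_bytes: int
    llc_misses: float    # last-level cache misses attributed to the object
    access_count: float  # loads/stores observed during profiling

def place_objects(
    objects: List[ObjectProfile],
    dram_budget_bytes: int,
    score: Callable[[ObjectProfile], float],
) -> Tuple[List[str], List[str]]:
    """Greedy placement: rank objects by predicted criticality and fill DRAM
    with the highest-ranked ones; everything else goes to PMEM."""
    dram, pmem = [], []
    remaining = dram_budget_bytes
    for obj in sorted(objects, key=score, reverse=True):
        if obj.size_bytes <= remaining:
            dram.append(obj.name)
            remaining -= obj.size_bytes
        else:
            pmem.append(obj.name)
    return dram, pmem

# Stand-in scoring function; the paper instead uses a learned model over a
# chosen set of features rather than a single indicator such as LLC misses.
def toy_score(obj: ObjectProfile) -> float:
    return 0.7 * obj.llc_misses + 0.3 * obj.access_count

if __name__ == "__main__":
    objs = [
        ObjectProfile("edge_array", 512 << 20, llc_misses=9e6, access_count=4e7),
        ObjectProfile("vertex_index", 64 << 20, llc_misses=2e6, access_count=6e7),
        ObjectProfile("scratch_buf", 256 << 20, llc_misses=1e5, access_count=1e6),
    ]
    dram, pmem = place_objects(objs, dram_budget_bytes=600 << 20, score=toy_score)
    print("DRAM:", dram, "PMEM:", pmem)
```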
Related papers
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z) - S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs [7.816840847892339]
Speculative decoding (SD) has attracted a significant amount of research attention due to the substantial speedup it can achieve for LLM inference.
We propose Skippy Simultaneous Speculative Decoding (or S3D), a cost-effective self-speculative SD method based on simultaneous multi-token decoding and mid-layer skipping.
Our method has achieved one of the top performance-memory ratios while requiring minimal architecture changes and training data.
arXiv Detail & Related papers (2024-05-30T17:54:35Z) - LLM in a flash: Efficient Large Language Model Inference with Limited Memory [19.668719251238176]
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks.
This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity.
Our method involves constructing an inference cost model that takes into account the characteristics of flash memory.
arXiv Detail & Related papers (2023-12-12T18:57:08Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Pex: Memory-efficient Microcontroller Deep Learning through Partial
Execution [11.336229510791481]
We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time.
arXiv Detail & Related papers (2022-11-30T18:47:30Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental
Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget.
We show that when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes rarely appearing content in TSG tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z) - PIM-DRAM:Accelerating Machine Learning Workloads using Processing in
Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
arXiv Detail & Related papers (2021-05-08T16:39:24Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z) - Diagonal Memory Optimisation for Machine Learning on Micro-controllers [21.222568055417717]
Microcontrollers and low-power CPUs are increasingly being used to perform inference with machine learning models.
The small amount of RAM available on these targets limits the size of the models that can be executed.
A diagonal memory optimisation technique is described and shown to achieve memory savings of up to 34.5% when applied to eleven common models.
arXiv Detail & Related papers (2020-10-04T19:45:55Z)
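The Pex entry above describes operators that can consume and produce a fraction of their input/output at a time, so full intermediate buffers never have to be materialised. The sketch below shows that partial-execution idea for a chain of elementwise operators; it is a simplified illustration with invented names and a fixed tile size, not the Pex implementation.

```python
from typing import Callable, List, Sequence

def partial_execute(
    data: Sequence[float],
    ops: List[Callable[[float], float]],
    tile_size: int = 64,
) -> List[float]:
    """Apply a chain of elementwise operators tile by tile, so only one small
    tile of intermediate data is live at a time instead of a full-size buffer
    per operator (the final output is still collected here for illustration)."""
    out: List[float] = []
    for start in range(0, len(data), tile_size):
        tile = list(data[start:start + tile_size])  # peak intermediate ~ one tile
        for op in ops:
            tile = [op(x) for x in tile]            # each op consumes/produces a tile
        out.extend(tile)
    return out

if __name__ == "__main__":
    def relu(x: float) -> float:
        return max(0.0, x)

    def half(x: float) -> float:
        return 0.5 * x

    print(partial_execute(range(-8, 8), [relu, half], tile_size=4))
```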
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.