Towards Faster Reasoners By Using Transparent Huge Pages
- URL: http://arxiv.org/abs/2004.14378v1
- Date: Wed, 29 Apr 2020 17:57:19 GMT
- Title: Towards Faster Reasoners By Using Transparent Huge Pages
- Authors: Johannes K. Fichte, Norbert Manthey, Julian Stecklina, André Schidler
- Abstract summary: In this work, we present an approach that reduces the runtime of AR tools by 10% on average and by up to 20% for long-running tasks.
Our improvement addresses the high memory usage that comes with the data structures used in AR tools, which are based on conflict-driven no-good learning.
- Score: 0.491574468325115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various state-of-the-art automated reasoning (AR) tools are widely used as
backend tools in research of knowledge representation and reasoning as well as
in industrial applications. In testing and verification, those tools often run
continuously or nightly. In this work, we present an approach to reduce the
runtime of AR tools by 10% on average and up to 20% for long-running tasks. Our
improvement addresses the high memory usage that comes with the data structures
used in AR tools, which are based on conflict-driven no-good learning. We
establish a general way to enable faster memory access by using the memory
cache line of modern hardware more effectively. To this end, we extend the
standard C library (glibc) so that it can dynamically use a memory management
feature called huge pages. Huge pages reduce the overhead that is required to
translate memory addresses between the virtual memory of the operating system
and the physical memory of the hardware. In that way, we can reduce the
runtime, costs, and energy consumption of AR tools and of applications with
similar memory access patterns, simply by linking the tool against this new
glibc library at compile time. In everyday industrial applications, this is an
easy way to make computation more eco-friendly. To back up the claimed
speed-up, we present experimental results for tools that are commonly used in
the AR community, including the domains ASP, BMC, MaxSAT, SAT, and SMT.
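The translation overhead in question comes from the TLB: with standard 4 KiB pages, a 1 GiB working set needs 262,144 page-table entries, while 2 MiB huge pages cover the same set with just 512. The sketch below (a hand-written illustration of the underlying Linux system call, not the authors' glibc extension) shows how a single allocation can opt into transparent huge pages:

```c
/*
 * Minimal sketch of the mechanism (not the authors' glibc extension):
 * on Linux, a program can ask the kernel to back a large allocation
 * with transparent huge pages via madvise(MADV_HUGEPAGE). The modified
 * glibc described above presumably issues a similar hint from inside
 * the allocator, so applications benefit without source changes.
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* 64 MiB: large enough to be backed by 2 MiB huge pages. */
    size_t len = 64UL * 1024 * 1024;

    /* Reserve an anonymous mapping; by default the kernel backs it
     * with ordinary 4 KiB pages. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Hint that the region should use transparent huge pages, cutting
     * the number of TLB entries needed to cover it by up to 512x. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)"); /* non-fatal: THP may be off */

    /* Touch the memory so pages are actually faulted in (and, where
     * possible, assembled into huge pages by the kernel). */
    memset(buf, 0, len);

    munmap(buf, len);
    return 0;
}
```

The appeal of the glibc-based approach described in the abstract is that such a hint is applied inside the allocator itself, so existing AR tools only need to be relinked rather than rewritten.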
Related papers
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Needle in the Haystack for Memory Based Large Language Models [31.885539843977472]
Current large language models (LLMs) often perform poorly on simple fact retrieval tasks.
We investigate if coupling a dynamically adaptable external memory to an LLM can alleviate this problem.
We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training.
arXiv Detail & Related papers (2024-07-01T16:32:16Z) - Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv Detail & Related papers (2023-06-12T15:13:39Z) - Learning to Rank Graph-based Application Objects on Heterogeneous
Memories [0.0]
This paper describes a methodology for identifying and characterizing application objects that have the most influence on the application's performance.
By performing data placement using our predictive model, we can reduce the execution-time degradation by 12% (average) and 30% (max) compared to the baseline approach.
arXiv Detail & Related papers (2022-11-04T00:20:31Z) - NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - Carousel Memory: Rethinking the Design of Episodic Memory for Continual
Learning [19.260402028696916]
Continual Learning (CL) aims to learn from a continuous stream of tasks without forgetting knowledge learned from the previous tasks.
Previous studies exploit episodic memory (EM), which stores a subset of the past observed samples while learning from new non-i.i.d. data.
We propose to exploit the abundant storage to preserve past experiences and alleviate the forgetting by allowing CL to efficiently migrate samples between memory and storage.
arXiv Detail & Related papers (2021-10-14T11:27:45Z) - Programmable FPGA-based Memory Controller [9.013666207570749]
This paper introduces a modular and programmable memory controller that can be configured for different target applications on available hardware resources.
The proposed memory controller efficiently supports cache-line accesses along with bulk memory transfers.
We show improvements in overall memory access time of up to 58% on CNN and GCN workloads compared with commercial memory controller IPs.
arXiv Detail & Related papers (2021-08-21T23:53:12Z) - Hands-off Model Integration in Spatial Index Structures [8.710716183434918]
In this paper we explore the opportunity to use light-weight machine learning models to accelerate queries on spatial indexes.
We do so by exploring the potential of using such models and similar techniques on the R-tree, arguably the most broadly used spatial index.
As we show in our analysis, the query execution time can be reduced by up to 60% while simultaneously shrinking the index's memory footprint by over 90%.
arXiv Detail & Related papers (2020-06-29T22:05:28Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL
primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, combining the compiler with minimal library use, results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.