CHERI Performance Enhancement for a Bytecode Interpreter
- URL: http://arxiv.org/abs/2308.05076v2
- Date: Tue, 12 Sep 2023 20:19:43 GMT
- Title: CHERI Performance Enhancement for a Bytecode Interpreter
- Authors: Duncan Lowther, Dejice Jacob, Jeremy Singer
- Abstract summary: We show that it is possible to eliminate certain kinds of software-induced runtime overhead that occur due to the larger size of CHERI capabilities (128 bits) relative to native pointers (generally 64 bits). The worst-case slowdowns are greatly improved, from 100x (before optimization) to 2x (after optimization).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During our port of the MicroPython bytecode interpreter to the CHERI-based
Arm Morello platform, we encountered a number of serious performance
degradations. This paper explores several of these performance issues in
detail; in each case we characterize the cause of the problem, the fix, and the
corresponding interpreter performance improvement over a set of standard Python
benchmarks.
While we recognize that Morello is a prototypical physical instantiation of
the CHERI concept, we show that it is possible to eliminate certain kinds of
software-induced runtime overhead that occur due to the larger size of CHERI
capabilities (128 bits) relative to native pointers (generally 64 bits). In our
case, we reduce a geometric mean benchmark slowdown from 5x (before
optimization) to 1.7x (after optimization) relative to AArch64, non-capability,
execution. The worst-case slowdowns are greatly improved, from 100x (before
optimization) to 2x (after optimization).
The key insight is that implicit pointer size presuppositions pervade systems
code; whereas previous CHERI porting projects highlighted compile-time and
execution-time errors exposed by pointer size assumptions, we instead focus on
the performance implications of such assumptions.
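To make this concrete, the following is a minimal C sketch (illustrative, not taken from the MicroPython sources) of the kind of implicit pointer-size presupposition at issue: a structure that hard-codes a 64-bit slot for a pointer versus one that lets the compiler size the slot for the platform, which under CHERI keeps 128-bit capabilities aligned and tag-preserving.

```c
#include <stdint.h>

/* BAD: hard-codes the historical "pointers are 64 bits" assumption.
 * On CHERI, a capability is 128 bits plus a validity tag, so forcing
 * one into a uint64_t (via casts or a byte-by-byte copy) loses the
 * tag and pushes later dereferences onto slow recovery paths. */
typedef struct {
    uint64_t value_or_ptr;   /* sized for a classic 64-bit pointer */
} bad_cell_t;

/* BETTER: uintptr_t is pointer-width on every target, 64 bits on
 * plain AArch64 and 128 bits (a capability type) under CHERI, so
 * loads and stores keep capability tags and alignment intact. */
typedef struct {
    uintptr_t value_or_ptr;
} good_cell_t;

void cell_store(good_cell_t *cell, void *p) {
    /* Assigning through a pointer-width type is a tag-preserving
     * capability store; copying the same bytes into plain integer
     * storage would silently strip the tag. */
    cell->value_or_ptr = (uintptr_t)p;
}
```

On conventional AArch64 the two layouts behave identically; under CHERI only the second keeps capability loads and stores on the fast path, which is the distinction behind the slowdowns reported above.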
Related papers
- An Effectively $\Omega(c)$ Language and Runtime [0.0]
Good performance of an application is conceptually more of a binary function.
Our vision is to create a language and runtime that is designed to be $\Omega(c)$ in its performance.
arXiv Detail & Related papers (2024-09-30T16:57:45Z)
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
- Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers [0.0]
Large Language Models (LLMs) raise intriguing questions about the potential for AI-driven approaches to revolutionize code optimization methodologies.
This paper presents a comparative analysis between two state-of-the-art Large Language Models, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers.
arXiv Detail & Related papers (2024-06-17T23:26:41Z)
- Optimization of Armv9 architecture general large language model inference performance based on Llama.cpp [0.3749861135832073]
This article optimizes the inference performance of the Qwen-1.8B model by performing Int8 quantization, vectorizing some operators in llama.cpp, and modifying the compilation script.
On the Yitian 710 experimental platform, the prefill performance is increased by 1.6 times, the decoding performance is increased by 24 times, the memory usage is reduced to 1/5 of the original, and the accuracy loss is almost negligible.
arXiv Detail & Related papers (2024-06-16T06:46:25Z)
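As a rough sketch of the Int8 quantization step described in the entry above, the following C function performs symmetric per-block weight quantization in the general spirit of llama.cpp's quantized formats; the block size and names are illustrative, not llama.cpp's actual API.

```c
#include <math.h>
#include <stdint.h>

#define BLOCK 32  /* illustrative block size */

/* Symmetric int8 quantization of one block of float weights:
 * choose a scale so the largest magnitude maps to 127, then
 * round each weight to the nearest int8 value. Dequantization
 * is simply w[i] ~= scale * q[i]. */
float quantize_block_q8(const float w[BLOCK], int8_t q[BLOCK]) {
    float amax = 0.0f;
    for (int i = 0; i < BLOCK; i++) {
        float a = fabsf(w[i]);
        if (a > amax) amax = a;
    }
    float scale = amax / 127.0f;
    float inv = (scale != 0.0f) ? 1.0f / scale : 0.0f;
    for (int i = 0; i < BLOCK; i++)
        q[i] = (int8_t)lroundf(w[i] * inv);
    return scale;  /* stored alongside q[] for dequantization */
}
```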
- BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models [77.0501668780182]
Retrieval augmentation addresses many critical problems in large language models.
Running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of retrieved text.
We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages.
arXiv Detail & Related papers (2023-10-02T16:48:47Z)
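The entry above lends itself to a toy illustration: binarize each dimension of a token vector by sign, then compare vectors with XOR plus popcount, which is the arithmetic that makes 1-bit representations cheap. This is a hedged sketch of the concept, not BTR's actual implementation.

```c
#include <stdint.h>

/* Binarize a 64-dimensional token vector: one bit per dimension,
 * set when the activation is positive. */
uint64_t binarize64(const float v[64]) {
    uint64_t bits = 0;
    for (int i = 0; i < 64; i++)
        if (v[i] > 0.0f)
            bits |= (uint64_t)1 << i;
    return bits;
}

/* Hamming distance between two binarized tokens: one XOR plus a
 * population count (GCC/Clang builtin), versus 64 float
 * multiply-adds for the full-precision dot product. */
int hamming64(uint64_t a, uint64_t b) {
    return __builtin_popcountll(a ^ b);
}
```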
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 pairs of competitive C++ programming submissions, capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
- POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning [0.0]
We present a reinforcement learning based solution to the phase ordering problem.
We propose two approaches to model the sequences: one by manual ordering, and the other based on a graph called the Oz Dependence Graph (ODG).
arXiv Detail & Related papers (2022-07-27T08:32:23Z)
- Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization [8.404254529115835]
A common pain point in differentially private machine learning is the significant runtime overhead incurred when executing Differentially Private Stochastic Gradient Descent (DPSGD).
We demonstrate that by exploiting powerful language primitives, one can dramatically reduce these overheads, in many cases nearly matching the best non-private running times.
arXiv Detail & Related papers (2020-10-18T18:45:04Z)
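To ground the overhead being discussed, here is a small self-contained C sketch of the per-example work that dominates a DPSGD step: clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise, and average. Vectorizing and JIT-compiling exactly this loop, rather than iterating per example in interpreted code, is the kind of speedup the entry refers to; all names here are illustrative.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy Box-Muller Gaussian sampler over rand(); a real DPSGD
 * implementation would use a cryptographically secure RNG. */
static double gaussian(double sigma) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sigma * sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
}

/* One DPSGD aggregation step over a batch of n per-example
 * gradients g[i] of dimension dim: clip each to L2 norm at most
 * clip_c, sum, add noise scaled by sigma * clip_c, then average. */
void dpsgd_step(double **g, size_t n, size_t dim,
                double clip_c, double sigma, double *out) {
    for (size_t d = 0; d < dim; d++) out[d] = 0.0;
    for (size_t i = 0; i < n; i++) {
        double norm = 0.0;
        for (size_t d = 0; d < dim; d++) norm += g[i][d] * g[i][d];
        norm = sqrt(norm);
        double s = (norm > clip_c) ? clip_c / norm : 1.0;  /* clip factor */
        for (size_t d = 0; d < dim; d++) out[d] += s * g[i][d];
    }
    for (size_t d = 0; d < dim; d++)
        out[d] = (out[d] + gaussian(sigma * clip_c)) / (double)n;
}
```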
- Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best model structure of BERT for a given computation size to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z)