EPSO: A Caching-Based Efficient Superoptimizer for BPF Bytecode
- URL: http://arxiv.org/abs/2511.15589v1
- Date: Wed, 19 Nov 2025 16:21:20 GMT
- Title: EPSO: A Caching-Based Efficient Superoptimizer for BPF Bytecode
- Authors: Qian Zhu, Yuxuan Liu, Ziyuan Zhu, Shangqing Liu, Lei Bu,
- Abstract summary: We propose EPSO, a caching-based superoptimizer that discovers rewrite rules via offline superoptimization and reuses them to achieve high-quality optimizations with minimal runtime overhead.<n>We evaluate EPSO on benchmarks from the Linux kernel and several eBPF-based projects, including Cilium, Katran, hXDP, Sysdig, Tetragon, and Tracee.
- Score: 12.043472068899733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extended Berkeley Packet Filter (eBPF) allows developers to extend Linux kernel functionality without modifying its source code. To ensure system safety, an in-kernel safety checker, the verifier, enforces strict safety constraints (for example, a limited program size) on eBPF programs loaded into the kernel. These constraints, combined with eBPF's performance-critical use cases, make effective optimization essential. However, existing compilers (such as Clang) offer limited optimization support, and many semantics-preserving transformations are rejected by the verifier, which makes handcrafted optimization rule design both challenging and limited in effectiveness. Superoptimization overcomes the limitations of rule-based methods by automatically discovering optimal transformations, but its high computational cost limits scalability. To address this, we propose EPSO, a caching-based superoptimizer that discovers rewrite rules via offline superoptimization and reuses them to achieve high-quality optimizations with minimal runtime overhead. We evaluate EPSO on benchmarks from the Linux kernel and several eBPF-based projects, including Cilium, Katran, hXDP, Sysdig, Tetragon, and Tracee. EPSO discovers 795 rewrite rules and achieves up to 68.87 percent (average 24.37 percent) reduction in program size compared to Clang's output, outperforming the state-of-the-art BPF optimizer K2 on all benchmarks and Merlin on 92.68 percent of them. Additionally, EPSO reduces program runtime by an average of 6.60 percent, improving throughput and lowering latency in network applications.
Related papers
- A Two-Stage GPU Kernel Tuner Combining Semantic Refactoring and Search-Based Optimization [9.49293344824955]
This paper introduces a template-based rewriting layer on top of an agent-driven iterative loop.<n>The proposed method can be extended to deliver automated performance optimization for real production workloads.
arXiv Detail & Related papers (2026-01-19T03:40:12Z) - Bridging the Gap: Empowering Small Models in Reliable OpenACC-based Parallelization via GEPA-Optimized Prompting [0.0]
We present a systematic prompt optimization approach to enhance OpenACC pragma generation.<n>We observe an increase in compilation success rates for programs annotated with OpenACC pragma.
arXiv Detail & Related papers (2026-01-12T23:54:08Z) - EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models [27.430839306140157]
Large Language Models (LLMs) for automating kernel optimization promise promise.<n>General-purpose LLM code evolution methods cannot meet strict correctness requirements of kernel optimization.<n>EvoEngineer provides guidance for designing and adapting optimization strategies to achieve a balance between performance and correctness.<n>Our method achieves a maximum speedup of textbf36.75$times among all operations over PyTorch kernels and delivers the highest speedup on textbf28 (textbf56.0%) of 50 operations that achieve over textbf2times$ acceleration.
arXiv Detail & Related papers (2025-10-04T10:00:25Z) - Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization [25.135006275638172]
We introduce robust-kbench, a new benchmark for rigorous evaluation of kernel performance and correctness across varied scenarios.<n>We also present a comprehensive agentic framework that translates torch code to kernels and iteratively improve their runtime setting.<n>Our approach produces kernels outperforming torch implementations for practical applications, including forward and backward passes.
arXiv Detail & Related papers (2025-09-16T11:08:30Z) - FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method to prune large language models (LLMs) that selectively prunes model blocks based on an importance score.<n>We propose a principled metric to replace each pruned block using a weight-sharing mechanism.<n> Empirical evaluations demonstrate substantial performance gains over existing methods.
arXiv Detail & Related papers (2025-01-24T18:46:37Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks.<n>We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions [0.07696728525672149]
Linux's extended Berkeley Packet Filter (BPF) avoids user-/ kernel transitions by just-in-time compiling user-provided bytecode.<n>To mitigate the Spectre vulnerabilities disclosed in 2018, defenses which reject potentially-dangerous programs had to be deployed.<n>We propose VeriFence, an enhancement to the kernel's Spectre defenses that reduces the number of BPF application programs rejected from 54% to zero.
arXiv Detail & Related papers (2024-04-30T12:34:23Z) - Tuning Particle Accelerators with Safety Constraints using Bayesian
Optimization [73.94660141019764]
tuning machine parameters of particle accelerators is a repetitive and time-consuming task.
We propose and evaluate a step size-limited variant of safe Bayesian optimization.
arXiv Detail & Related papers (2022-03-26T02:21:03Z) - Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs)<n>We propose the underlineperiodically underlinerestarted underlineoptimistic underlinepolicy underlineoptimization algorithm (PROPO)<n>PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.