BenchPress: A Deep Active Benchmark Generator
- URL: http://arxiv.org/abs/2208.06555v2
- Date: Tue, 16 Aug 2022 00:40:44 GMT
- Title: BenchPress: A Deep Active Benchmark Generator
- Authors: Foivos Tsimpourlas, Pavlos Petoumenos, Min Xu, Chris Cummins, Kim
Hazelwood, Ajitha Rajan and Hugh Leather
- Abstract summary: We develop BenchPress, the first ML benchmark generator for compilers that is steerable within feature space representations of source code.
BenchPress synthesizes compiling functions by adding new code in any part of an empty or existing sequence.
It produces 10x more unique, compiling OpenCL benchmarks than CLgen, and these benchmarks are significantly larger and more feature-diverse.
- Score: 7.194212461947882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We develop BenchPress, the first ML benchmark generator for
compilers that is steerable within feature space representations of source
code. BenchPress synthesizes compiling functions by adding new code in any
part of an empty or existing sequence, jointly observing its left and right
context, and achieves an excellent compilation rate. BenchPress steers
benchmark generation towards desired target features that have been
impossible for state-of-the-art synthesizers (or indeed humans) to reach. It
targets the features of Rodinia benchmarks in 3 different feature spaces more
closely than (a) CLgen, a state-of-the-art ML synthesizer, (b) the CLSmith
fuzzer, (c) the SRCIROR mutator, or even (d) human-written code from GitHub.
BenchPress is the first generator to search the feature space with active
learning in order to generate benchmarks that will improve a downstream task.
We show that Grewe et al.'s CPU vs GPU heuristic model obtains a higher
speedup when trained on BenchPress's benchmarks than when trained on those
produced by other techniques. BenchPress is a powerful code generator: its
samples compile at a rate of 86%, compared to CLgen's 2.33%. Starting from an
empty fixed input, BenchPress produces 10x more unique, compiling OpenCL
benchmarks than CLgen, and these benchmarks are significantly larger and more
feature-diverse.
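As a rough illustration of the steering idea, the sketch below greedily keeps whichever compiling infill lands closest to a target feature vector. The helpers `generate_infill`, `extract_features`, and `compiles` are hypothetical stand-ins, not names from the paper:

```python
import numpy as np

def steer(generate_infill, extract_features, compiles, target,
          seed="kernel void A() {}", steps=100):
    """Greedy feature-space steering: keep the compiling candidate
    whose feature vector is closest to the target vector."""
    best, best_dist = seed, float("inf")
    for _ in range(steps):
        # The model fills a hole at an arbitrary point, conditioning on
        # both left and right context (BenchPress's bidirectional setup).
        candidate = generate_infill(best)
        if not compiles(candidate):
            continue
        dist = np.linalg.norm(
            np.asarray(extract_features(candidate), dtype=float)
            - np.asarray(target, dtype=float))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best
```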
Related papers
- GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git [0.8397730500554048]
GitGoodBench is a novel benchmark for evaluating AI agent performance on Version Control System (VCS) tasks. Our benchmark covers three core Git scenarios extracted from open-source Python, Java, and Kotlin repositories. We establish baseline performance on the prototyping version of our benchmark using GPT-4o equipped with custom tools, achieving a 21.11% solve rate overall.
arXiv Detail & Related papers (2025-05-28T16:56:11Z)
- Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression [19.447797559761135]
Post-training compression reduces the computational and memory costs of large language models (LLMs). Existing compression benchmarks focus only on language modeling and natural language understanding tasks. We introduce ACBench, the first comprehensive benchmark for evaluating how compression impacts LLMs' agentic abilities.
arXiv Detail & Related papers (2025-05-26T02:49:07Z)
- NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models [63.271278137295006]
Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks.
LLMs suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments.
We propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape-preserving compression algorithms.
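For context on what "shape preserving" means here, the sketch below shows generic per-row absmax quantization, which compresses a weight matrix without changing its shape. This is an illustration of the problem setting, not NoWag's actual algorithm:

```python
import torch

def quantize_rowwise(w: torch.Tensor, bits: int = 4):
    # Per-row absmax normalization followed by uniform quantization.
    # The compressed tensor keeps w's original shape.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruct an approximation of the original weights.
    return q.float() * scale
```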
arXiv Detail & Related papers (2025-04-20T11:00:29Z)
- SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents [49.73885480071402]
We introduce SWE-PolyBench, a new benchmark for repository-level, execution-based evaluation of coding agents.
SWE-PolyBench contains 2110 instances from 21 repositories and includes tasks in Java (165), JavaScript (1017), TypeScript (729) and Python (199), covering bug fixes, feature additions, and code refactoring.
Our experiments show that current agents exhibit uneven performance across languages and struggle with complex problems while performing better on simpler tasks.
arXiv Detail & Related papers (2025-04-11T17:08:02Z)
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators [59.625889531331815]
Triton is a high-level Python-like language designed for building efficient GPU kernels.
Despite advances in large language models (LLMs) for conventional code generation, these models struggle to generate accurate, performance-optimized Triton code.
In this work, we introduce TritonBench, the first comprehensive benchmark for Triton operator generation.
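For readers unfamiliar with Triton, the following is the standard tutorial-style vector-add kernel, representative of the kind of operator TritonBench asks models to generate (not a kernel taken from the benchmark itself):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```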
arXiv Detail & Related papers (2025-02-20T17:21:27Z)
- CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs [6.936336826531964]
Retrieval-Augmented Generation (RAG) enhances coding tasks by incorporating retrieved code examples into prompts.
Existing prompt compression techniques focus on natural language, lacking tailored solutions for code.
We propose CodePromptZip, a framework that compresses code examples before integrating them into RAG prompts.
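As a naive illustration of the general idea (not CodePromptZip's learned, code-specific compressor), one could shrink retrieved examples before assembling the prompt:

```python
def compress_example(code: str, max_lines: int = 20) -> str:
    # Naive token saving: drop comments and blank lines, cap the length.
    kept = [ln for ln in code.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    return "\n".join(kept[:max_lines])

def build_prompt(task: str, retrieved: list[str]) -> str:
    # Concatenate compressed retrieved examples ahead of the task.
    shots = "\n\n".join(compress_example(ex) for ex in retrieved)
    return f"{shots}\n\n# Task: {task}\n"
```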
arXiv Detail & Related papers (2025-02-19T23:15:23Z)
- HAC++: Towards 100X Compression of 3D Gaussian Splatting [55.6351304553003]
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.
However, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression.
We propose HAC++, which leverages the relationships between unorganized anchors and a structured hash grid, utilizing their mutual information for context modeling.
arXiv Detail & Related papers (2025-01-21T16:23:05Z)
- gsplat: An Open-Source Library for Gaussian Splatting [28.65527747971257]
gsplat is an open-source library designed for training and developing Gaussian Splatting methods.
It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized kernels.
arXiv Detail & Related papers (2024-09-10T17:57:38Z)
- Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection [6.4348035950413]
We present the first generation-based model using CLIP for zero-shot HOI detection, coined HOIGen.
We develop a CLIP-injected feature generator in accordance with the generation of human, object and union features.
To enrich the HOI scores, we construct a generative prototype bank in a pairwise HOI recognition branch, and a multi-knowledge prototype bank in an image-wise HOI recognition branch.
arXiv Detail & Related papers (2024-08-12T08:02:37Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- PruningBench: A Comprehensive Benchmark of Structural Pruning [50.23493036025595]
We present the first comprehensive benchmark, termed PruningBench, for structural pruning.
PruningBench employs a unified and consistent framework for evaluating the effectiveness of diverse structural pruning techniques.
It provides easily implementable interfaces to facilitate the implementation of future pruning methods, and enables subsequent researchers to incorporate their work into our leaderboards.
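As an example of the kind of method such a benchmark evaluates, PyTorch's built-in utilities can apply L2-norm structural pruning to a layer's output channels:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Structurally prune 50% of the first layer's output channels (weight rows),
# ranked by L2 norm -- one of the technique families PruningBench covers.
prune.ln_structured(model[0], name="weight", amount=0.5, n=2, dim=0)

# Fold the binary mask into the weight tensor permanently.
prune.remove(model[0], "weight")
```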
arXiv Detail & Related papers (2024-06-18T06:37:26Z)
- ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model [77.71796503321632]
Our work pioneers a context model at the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS.
arXiv Detail & Related papers (2024-05-31T09:23:39Z)
- GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting [25.78134656333095]
We propose a novel framework for real-time generation of pose-controllable talking heads.
GaussianTalker builds a canonical 3DGS representation of the head and deforms it in sync with the audio.
It exploits the spatial-aware features and enforces interactions between neighboring points.
arXiv Detail & Related papers (2024-04-24T17:45:24Z)
- Exploring Continual Learning for Code Generation Models [80.78036093054855]
Continual Learning (CL) is an important aspect that remains underexplored in the code domain.
We introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement.
We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism.
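For context, a minimal L2P-style prompt pool looks roughly like the sketch below; the key-based top-k selection is the mechanism whose unstable training the paper identifies. Shapes and names here are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

class PromptPool(torch.nn.Module):
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=3):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = torch.nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query, token_embeds):
        # query: (B, dim) summary of the input from a frozen encoder;
        # token_embeds: (B, T, dim). Select the top-k prompts by key
        # similarity and prepend them to the token embeddings.
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        idx = sim.topk(self.top_k, dim=1).indices        # (B, k)
        chosen = self.prompts[idx].flatten(1, 2)         # (B, k*prompt_len, dim)
        return torch.cat([chosen, token_embeds], dim=1)
```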
arXiv Detail & Related papers (2023-07-05T16:58:39Z)
- HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code.
HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
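For background, HDC classification typically binds and bundles high-dimensional bipolar vectors and classifies by similarity to class prototypes. The following minimal NumPy sketch illustrates the idea (it is not HDCC's generated C code):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

# Random bipolar hypervectors for feature positions and quantized values.
pos = rng.choice([-1, 1], size=(4, D))
val = rng.choice([-1, 1], size=(256, D))

def encode(sample):
    # Bind (elementwise product) each feature's position and value vectors,
    # then bundle (sum) across features into one hypervector.
    return np.sum([pos[i] * val[v] for i, v in enumerate(sample)], axis=0)

# Class prototypes: bundle the encodings of each class's training samples.
train = {"low": [(10, 20, 30, 40)], "high": [(200, 210, 220, 230)]}
protos = {c: np.sign(np.sum([encode(s) for s in xs], axis=0))
          for c, xs in train.items()}

def classify(sample):
    # Predict the class whose prototype is most similar (dot product).
    h = encode(sample)
    return max(protos, key=lambda c: h @ protos[c])
```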
arXiv Detail & Related papers (2023-04-24T19:16:03Z)
- BenchDirect: A Directed Language Model for Compiler Benchmarks [7.194212461947882]
We develop BenchPress, the first ML compiler benchmark generator that can be directed within source code feature representations.
We use active learning to introduce new benchmarks with unseen features into the dataset of Grewe et al.'s CPU vs GPU heuristic, improving its acquired performance by 50%.
In 3 feature spaces, we outperform human-written code from GitHub, CLgen, CLSmith and the SRCIROR mutator in targeting the features of Rodinia benchmarks.
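A simple stand-in for such an active-learning query step is a nearest-neighbor novelty criterion over feature vectors; the function below is illustrative and not the paper's actual acquisition strategy:

```python
import numpy as np

def query_unseen(candidate_feats, train_feats, k=10):
    # Rank candidate benchmarks by distance to their nearest neighbor in
    # the existing training set and keep the k most novel candidates.
    train = np.asarray(train_feats, dtype=float)
    novelty = [np.min(np.linalg.norm(train - np.asarray(c, dtype=float), axis=1))
               for c in candidate_feats]
    top = np.argsort(novelty)[::-1][:k]
    return [candidate_feats[i] for i in top]
```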
arXiv Detail & Related papers (2023-03-02T20:17:24Z)
- SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs [147.73127662757335]
We present scalable Multi-hOp REasoning (SMORE), the first general framework for both single-hop and multi-hop reasoning in Knowledge Graphs (KGs).
Using a single machine, SMORE can perform multi-hop reasoning in the Freebase KG (86M entities, 338M edges), which is 1,500x larger than previously considered KGs.
SMORE increases throughput (i.e., training speed) over prior multi-hop KG frameworks by 2.2x with minimal GPU memory requirements.
arXiv Detail & Related papers (2021-10-28T05:02:33Z)
- Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking for Everyone [45.673814384050004]
We introduce Codabench, an open-sourced, community-driven platform for benchmarking algorithms or software agents versus datasets or tasks.
A public instance of Codabench is open to everyone, free of charge, and allows benchmark organizers to fairly compare submissions.
arXiv Detail & Related papers (2021-10-12T07:54:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.