Related papers: AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs

Related papers

SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs)<n>Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
CLEVER: A Curated Benchmark for Formally Verified Code Generation [57.476483009565044]
$rm Csmall LEVER$ is a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean.<n>Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification.
arXiv Detail & Related papers (2025-05-20T05:15:47Z)
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs [1.8911962184174564]
We propose a cheaper alternative bilinear operator to matrix-multiplication in deep neural networks (DNNs) We show that replacing emphall linear layers with STL and training from scratch, results in factor x2.7 reduction in FLOPs with a 0.5 emphaccuracy improvement. Finetuning TinyLlama citetinyllama24 with STL layers on the Slim Pajama dataset, achieves similar accuracy to 2:4, with x2.2 FLOP speedup compared to x1.7 of the latter.
arXiv Detail & Related papers (2025-03-15T17:31:36Z)
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding [62.68430939686566]
We present ParallelSpec, an alternative to auto-regressive drafting strategies in state-of-the-art speculative decoding approaches. In contrast to auto-regressive drafting in the speculative stage, we train a parallel drafter to serve as an efficient speculative model.
arXiv Detail & Related papers (2024-10-08T01:05:08Z)
OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation [4.266086505323998]
This paper introduces OMPar, an AI-driven tool designed to automate the parallelization of C/C++ code using OpenMP pragmas. OMPar integrates Large Language Models (LLMs) through two key components: OMPify, which assesses loop parallelization potential, and MonoCoder-OMP, a new fine-tuned model which generates precise OpenMP pragmas.
arXiv Detail & Related papers (2024-09-23T07:39:01Z)
Generating Unseen Code Tests In Infinitum [1.0674604700001968]
We present a method for creating benchmark variations that generalize across coding tasks and programming languages. We implement one benchmark, called textitauto-regression, for the task of text-to-code generation in Python.
arXiv Detail & Related papers (2024-07-29T08:11:20Z)
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices [18.81344021150902]
SpecExec is a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families. We demonstrate inference of 50B+ parameter LLMs on consumer GPUs with RAM offloading at 4-6 tokens per second with 4-bit quantization or 2-3 tokens per second with 16-bit weights.
arXiv Detail & Related papers (2024-06-04T17:53:36Z)
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks. In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [19.167604927651073]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. We propose a novel parallel prompt decoding that requires only $0.0002$% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours. Our approach demonstrates up to 2.49$times$ speedup and maintains a minimal memory overhead of just $0.0004$%.
arXiv Detail & Related papers (2024-05-28T22:19:30Z)
MPIrigen: MPI Code Generation through Domain-Specific Language Models [3.5352856644774806]
This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs. We introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI. The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation.
arXiv Detail & Related papers (2024-02-14T12:24:21Z)
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks. To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding. We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z)
Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending context window of large language models (LLMs) is getting popular recently. Retrieval-augmentation versus long context window, which one is better for downstream tasks? Can both methods be combined to get the best of both worlds? Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
Advising OpenMP Parallelization via a Graph-Based Approach with Transformers [2.393682571484038]
We propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code. OMPify is based on a Transformer-based model that leverages a graph-based representation of source code. Our results demonstrate that OMPify outperforms existing approaches, the general-purposed and popular ChatGPT and targeted PragFormer models.
arXiv Detail & Related papers (2023-05-16T16:56:10Z)
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation [7.750212995537728]
We propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code. We create an OMP_Serial dataset with 18598 parallelizable and 13972 non-parallelizable loops to train the machine learning models. Our results show that our proposed approach achieves the accuracy of parallelizable code region detection with 85% accuracy and outperforms the state-of-the-art token-based machine learning approach.
arXiv Detail & Related papers (2023-05-09T21:57:15Z)
Inference with Reference: Lossless Acceleration of Large Language Models [97.04200102556551]
LLMA is an accelerator to speed up Large Language Model (LLM) inference with references. It is motivated by the observation that there are abundant identical text spans between the decoding result by an LLM and the reference that is available in many real world scenarios.
arXiv Detail & Related papers (2023-04-10T09:55:14Z)
A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter Dataset [53.73316523766183]
We introduce a robust semantic frame parsing pipeline that can handle both emphOOD patterns and emphOOV tokens. We also build an E2E application to demo the feasibility of our algorithm and show why it is useful in real application.
arXiv Detail & Related papers (2022-12-18T01:59:49Z)
Simplifying and Understanding State Space Models with Diagonal Linear RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs. We empirically show that, despite being conceptually much simpler, $mathrmDLR$ is as performant as previously-proposed SSMs. We also characterize the expressivity of SSMs and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z)
QParallel: Explicit Parallelism for Programming Quantum Computers [62.10004571940546]
We present a language extension for parallel quantum programming. QParallel removes ambiguities concerning parallelism in current quantum programming languages. We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv Detail & Related papers (2022-10-07T16:35:16Z)
MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.