Automatic Semantic Augmentation of Language Model Prompts (for Code
Summarization)
- URL: http://arxiv.org/abs/2304.06815v3
- Date: Thu, 11 Jan 2024 22:47:35 GMT
- Title: Automatic Semantic Augmentation of Language Model Prompts (for Code
Summarization)
- Authors: Toufique Ahmed, Kunal Suresh Pai, Premkumar Devanbu, Earl T. Barr
- Abstract summary: Developers tend to consciously and unconsciously have a collection of semantic facts in mind when working on coding tasks.
One might assume that the powerful multi-layer architecture of transformer-style LLMs makes them inherently capable of doing this simple level of "code analysis"
We evaluate whether automatically and explicitly augmenting an LLM's prompt with semantic facts actually helps.
- Score: 7.699967852459232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLM) are a new class of computation engines,
"programmed" via prompt engineering. We are still learning how to best
"program" these LLMs to help developers. We start with the intuition that
developers tend to consciously and unconsciously have a collection of semantic
facts in mind when working on coding tasks. Mostly these are shallow, simple
facts arising from a quick read. For a function, examples of facts might
include parameter and local variable names, return expressions, simple pre- and
post-conditions, and basic control and data flow, etc.
One might assume that the powerful multi-layer architecture of
transformer-style LLMs makes them inherently capable of doing this simple level
of "code analysis" and extracting such information, implicitly, while
processing code: but are they, really? If they aren't, could explicitly adding
this information help? Our goal here is to investigate this question, using the
code summarization task, and to evaluate whether automatically and explicitly
augmenting an LLM's prompt with semantic facts actually helps.
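As a concrete illustration (not the paper's own tooling), a minimal sketch of extracting such shallow facts from a Python function with the standard `ast` module might look as follows; the fact categories mirror the ones listed above, while the function name `extract_semantic_facts` and the output format are illustrative assumptions.

```python
# Minimal sketch: collect shallow semantic facts (parameters, local variables,
# return expressions) from the first function in a source snippet.
# Requires Python 3.9+ for ast.unparse. Illustrative only.
import ast

def extract_semantic_facts(source: str) -> dict:
    """Collect shallow facts about the first function defined in `source`."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))

    params = [a.arg for a in func.args.args]
    local_vars = sorted({
        target.id
        for node in ast.walk(func)
        if isinstance(node, ast.Assign)
        for target in node.targets
        if isinstance(target, ast.Name)
    })
    return_exprs = [
        ast.unparse(node.value)
        for node in ast.walk(func)
        if isinstance(node, ast.Return) and node.value is not None
    ]
    return {
        "name": func.name,
        "parameters": params,
        "local_variables": local_vars,
        "return_expressions": return_exprs,
    }

example = '''
def clamp(value, lo, hi):
    result = max(lo, min(value, hi))
    return result
'''
print(extract_semantic_facts(example))
# {'name': 'clamp', 'parameters': ['value', 'lo', 'hi'],
#  'local_variables': ['result'], 'return_expressions': ['result']}
```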
Prior work shows that LLM performance on code summarization benefits from
few-shot samples drawn either from the same-project or from examples found via
information retrieval methods (such as BM25). While summarization performance
has steadily increased since the early days, there is still room for
improvement: LLM performance on code summarization still lags its performance
on natural-language tasks like translation and text summarization.
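As an illustration of the BM25-based retrieval setting mentioned above, the following minimal sketch selects few-shot exemplars with the third-party `rank_bm25` package; the exemplar pool, whitespace tokenization, and top-2 cutoff are assumptions rather than the paper's configuration.

```python
from rank_bm25 import BM25Okapi  # third-party: pip install rank-bm25

# Hypothetical pool of (code, summary) exemplars to draw few-shot samples from.
pool = [
    ("def add(a, b):\n    return a + b", "Add two numbers."),
    ("def read_text(path):\n    return open(path).read()", "Read a file into a string."),
    ("def mean(xs):\n    return sum(xs) / len(xs)", "Compute the average of a list."),
]

# Whitespace tokenization is a simplification; real setups use code-aware tokenizers.
bm25 = BM25Okapi([code.split() for code, _ in pool])

query_code = "def average(values):\n    return sum(values) / len(values)"
scores = bm25.get_scores(query_code.split())

# Keep the top-2 scoring exemplars as the few-shot prefix of the prompt.
top = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:2]
few_shot = [pool[i] for i in top]
print(few_shot)
```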
We find that adding semantic facts actually does help! This approach improves
performance in several different settings suggested by prior work, including
for two different Large Language Models. In most cases, improvement nears or
exceeds 2 BLEU; for the PHP language in the challenging CodeSearchNet dataset,
this augmentation actually yields performance surpassing 30 BLEU.
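Combining the two sketches above, one plausible (purely illustrative) way to assemble an augmented prompt from retrieved exemplars and extracted facts is shown below; the template wording is an assumption, not the paper's exact prompt format.

```python
# Illustrative prompt assembly: few-shot (code, summary) exemplars, then the
# extracted semantic facts, then the target code awaiting a summary.
def build_augmented_prompt(few_shot, facts, target_code):
    parts = []
    for code, summary in few_shot:
        parts.append(f"Code:\n{code}\nSummary: {summary}\n")
    parts.append(
        "Semantic facts about the target function:\n"
        f"  parameters: {', '.join(facts['parameters'])}\n"
        f"  local variables: {', '.join(facts['local_variables'])}\n"
        f"  return expressions: {', '.join(facts['return_expressions'])}\n"
    )
    parts.append(f"Code:\n{target_code}\nSummary:")
    return "\n".join(parts)

# Tiny demo with hand-written inputs (hypothetical values).
demo_facts = {"parameters": ["value", "lo", "hi"],
              "local_variables": ["result"],
              "return_expressions": ["result"]}
demo_shot = [("def add(a, b):\n    return a + b", "Add two numbers.")]
print(build_augmented_prompt(
    demo_shot, demo_facts,
    "def clamp(value, lo, hi):\n"
    "    result = max(lo, min(value, hi))\n"
    "    return result"))
```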
Related papers
- Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints [20.844061807562436]
We propose SENSE, a novel prompting approach that embeds semantic hints within the prompt.
Experiments show that SENSE consistently improves LLMs' performance across various tasks.
arXiv Detail & Related papers (2024-09-22T14:35:09Z)
- Source Code Summarization in the Era of Large Language Models [23.715005053430957]
Large language models (LLMs) have led to a great boost in the performance of code-related tasks.
In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs.
arXiv Detail & Related papers (2024-07-09T05:48:42Z)
- Exploring the Capabilities of LLMs for Code Change Related Tasks [14.261870410238643]
Large language models (LLMs) have shown their effectiveness in code-related tasks.
LLMs focus on general code syntax and semantics rather than the differences between two code versions.
We conduct an empirical study using >1B-parameter LLMs on three code-change-related tasks.
arXiv Detail & Related papers (2024-07-03T05:49:18Z)
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed.
We conducted a case study focused on Large Language Models (LLMs) for code generation using an additional tool we built to help with the analysis of code models called codetokenizer.
We found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
- Optimizing LLM Queries in Relational Workloads [58.254894049950366]
We show how to optimize Large Language Models (LLMs) inference for analytical workloads that invoke LLMs within relational queries.
We implement these optimizations in Apache Spark, with vLLM as the model serving backend.
We achieve up to 4.4x improvement in end-to-end latency on a benchmark of diverse LLM-based queries on real datasets.
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)