Related papers: Automatic Smart Contract Comment Generation via Large Language Models and In-Context Learning

Automatic Smart Contract Comment Generation via Large Language Models and In-Context Learning

URL: http://arxiv.org/abs/2311.10388v2
Date: Tue, 16 Jan 2024 07:58:25 GMT
Title: Automatic Smart Contract Comment Generation via Large Language Models and In-Context Learning
Authors: Junjie Zhao and Xiang Chen and Guang Yang and Yiheng Shen
Abstract summary: In this study, we propose an approach SCCLLM based on large language models (LLMs) and in-context learning. Specifically, in the demonstration selection phase, SCCLLM retrieves the top-k code snippets from the historical corpus. In the in-context learning phase, SCCLLM utilizes the retrieved code snippets as demonstrations.
Score: 11.52122354673779
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The previous smart contract code comment (SCC) generation approaches can be divided into two categories: fine-tuning paradigm-based approaches and information retrieval-based approaches. However, for the fine-tuning paradigm-based approaches, the performance may be limited by the quality of the gathered dataset for the downstream task and they may have knowledge-forgetting issues. While for the information retrieval-based approaches, it is difficult for them to generate high-quality comments if similar code does not exist in the historical repository. Therefore we want to utilize the domain knowledge related to SCC generation in large language models (LLMs) to alleviate the disadvantages of these two types of approaches. In this study, we propose an approach SCCLLM based on LLMs and in-context learning. Specifically, in the demonstration selection phase, SCCLLM retrieves the top-k code snippets from the historical corpus by considering syntax, semantics, and lexical information. In the in-context learning phase, SCCLLM utilizes the retrieved code snippets as demonstrations, which can help to utilize the related knowledge for this task. We select a large corpus from a smart contract community Etherscan.io as our experimental subject. Extensive experimental results show the effectiveness of SCCLLM when compared with baselines in automatic evaluation and human evaluation.

Related papers

CAAD: Context-Aware Adaptive Decoding for Truthful Text Generation [31.469511576774252]
We propose a context-aware adaptive decoding method for large language models.<n>Our approach achieves a 2.8 percent average improvement on TruthfulQA.<n>Our model-agnostic, scalable, and efficient method requires only a single generation pass.
arXiv Detail & Related papers (2025-08-04T08:28:25Z)
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection [81.78173888579941]
Large Language Models (LLMs) are considered a well-suited method to increase the quality of the question-answering functionality.<n>LLMs are trained on web data, where researchers have no control over whether the benchmark or the knowledge graph was already included in the training data.<n>This paper introduces a novel method that evaluates the quality of LLMs by generating a SPARQL query from a natural-language question.
arXiv Detail & Related papers (2025-07-18T12:28:08Z)
Post-Incorporating Code Structural Knowledge into LLMs via In-Context Learning for Code Translation [10.77747590700758]
Large language models (LLMs) have achieved significant advancements in software mining. handling the syntactic structure of source code remains a challenge. This paper employs incontext learning (ICL) to integrate code structural knowledge into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-28T10:59:42Z)
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond [32.467437657603604]
Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. We propose AllianceCoder, a novel context-integrated method that employs chain-of-thought prompting to decompose user queries into implementation steps and retrieves APIs via semantic description matching. Through extensive experiments on CoderEval and RepoExec, AllianceCoder achieves state-of-the-art performance, improving Pass@1 by up to 20% over existing approaches.
arXiv Detail & Related papers (2025-03-26T14:41:38Z)
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference [78.08901120841833]
We propose a method to detect the knowledge boundary of Visual Large Language Models (VLLMs) We show that our method successfully depicts a VLLM's knowledge boundary based on which we are able to reduce indiscriminate retrieval while maintaining or improving the performance.
arXiv Detail & Related papers (2025-02-25T09:32:08Z)
Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL) We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping. ICCD emphasizes input-label mapping by contrasting the output distributions between positive and negative in-context examples.
arXiv Detail & Related papers (2025-02-19T14:04:46Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
ClassEval-T: Evaluating Large Language Models in Class-Level Code Translation [19.69195067838796]
We construct a class-level code translation benchmark, ClassEval-T, and make the first attempt to extensively assess recent LLMs' performance on class-level code translation. It cost us 360 person-hours to accomplish the manual migration to Java and C++ with complete code samples and associated test suites. Experimental results demonstrate a remarkable performance drop compared with the most widely studied method-level code translation benchmark.
arXiv Detail & Related papers (2024-11-09T11:13:14Z)
ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents [4.029675201787349]
Cross-lingual summarization is a sophisticated branch in Natural Language Processing. There is no feasible solution for CLS when there is no available high-quality CLS data. We propose a novel data-efficient approach, ConVerSum, for CLS leveraging the power of contrastive learning.
arXiv Detail & Related papers (2024-08-17T19:03:53Z)
TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation [16.93374578679005]
TokenRec is a novel framework for tokenizing and retrieving large-scale language models (LLMs) based Recommender Systems (RecSys) Our strategy, Masked Vector-Quantized (MQ) Tokenizer, quantizes the masked user/item representations learned from collaborative filtering into discrete tokens. Our generative retrieval paradigm is designed to efficiently recommend top-$K$ items for users to eliminate the need for auto-regressive decoding and beam search processes.
arXiv Detail & Related papers (2024-06-15T00:07:44Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or incorporate tasks by few samples as demonstrations within the prompts. However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations. We propose a novel approach "vocabulary-defined semantics"
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of generative large language models (LLM) generated context information. We propose an approach to distill the generated information during fine-tuning of self-supervised speech models. We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task. We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus. Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning [34.006227676170504]
This study investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents. Experiments on two large-scale datasets demonstrate the rationale of our insights.
arXiv Detail & Related papers (2023-04-22T12:26:24Z)
TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks. Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes. We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z)
Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability. We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples. We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability. CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means. We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.