Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer
- URL: http://arxiv.org/abs/2208.11523v1
- Date: Wed, 24 Aug 2022 13:10:48 GMT
- Title: Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer
- Authors: Fengji Zhang, Jin Liu, Yao Wan, Xiao Yu, Xiao Liu, Jacky Keung
- Abstract summary: We propose M$_3$NSCT5, a novel approach to automatically generate multiple post titles from the given code snippets.
M$_3$NSCT5 employs the CodeT5 backbone, a pre-trained Transformer model with excellent language understanding.
We build a large-scale dataset with 890,000 question posts covering eight programming languages to validate the effectiveness of M$_3$NSCT5.
- Score: 11.03785369838242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stack Overflow is one of the most popular programming communities, where developers can seek help for the problems they encounter. Nevertheless, if inexperienced developers fail to describe their problems clearly, it is hard for them to attract sufficient attention and get the anticipated answers. We propose M$_3$NSCT5, a novel approach to automatically generate multiple post titles from a given code snippet. Developers may use the generated titles to find closely related posts and complete their problem descriptions. M$_3$NSCT5 employs the CodeT5 backbone, a pre-trained Transformer model with excellent language understanding and generation ability. To alleviate the ambiguity issue, whereby the same code snippet may be aligned with different titles under varying contexts, we propose the maximal marginal multiple nucleus sampling strategy, which generates multiple high-quality and diverse title candidates at a time for developers to choose from. We build a large-scale dataset with 890,000 question posts covering eight programming languages to validate the effectiveness of M$_3$NSCT5. Automatic evaluation results on the BLEU and ROUGE metrics demonstrate the superiority of M$_3$NSCT5 over six state-of-the-art baseline models. Moreover, a human evaluation with trustworthy results further demonstrates the great potential of our approach for real-world application.
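The abstract names the two ingredients of the sampling strategy: nucleus (top-p) sampling to over-generate candidate titles, and a maximal-marginal-relevance-style selection to keep a high-quality yet diverse subset. The sketch below shows one plausible reading of that combination on top of the public Salesforce/codet5-base checkpoint; the hyperparameters, the Jaccard similarity, and the relevance proxy are illustrative assumptions, not the paper's exact choices.

```python
# Hedged sketch: nucleus-sample many candidate titles with CodeT5, then keep a
# diverse subset via greedy maximal-marginal-relevance (MMR) selection.
# top_p, candidate counts, and lam are illustrative, not the paper's values.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

def sample_titles(code_snippet: str, num_candidates: int = 20) -> list[str]:
    """Over-generate candidate titles with nucleus (top-p) sampling."""
    inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True)
    outputs = model.generate(
        **inputs,
        do_sample=True,               # nucleus sampling instead of beam search
        top_p=0.95,
        max_new_tokens=32,
        num_return_sequences=num_candidates,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

def similarity(a: str, b: str) -> float:
    """Cheap Jaccard token overlap; a stand-in for whatever similarity
    measure the selection step actually uses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def mmr_select(candidates: list[str], k: int = 5, lam: float = 0.7) -> list[str]:
    """Greedy MMR: each step keeps the candidate maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected titles.
    Relevance is approximated by sampling order here; the model's sequence
    log-probability would be a more faithful choice."""
    pool = list(dict.fromkeys(candidates))            # drop exact duplicates
    relevance = {c: 1.0 - i / len(pool) for i, c in enumerate(pool)}
    selected = [pool[0]]
    while len(selected) < min(k, len(pool)):
        best = max(
            (c for c in pool if c not in selected),
            key=lambda c: lam * relevance[c]
            - (1 - lam) * max(similarity(c, s) for s in selected),
        )
        selected.append(best)
    return selected

titles = mmr_select(sample_titles("def flatten(xs): return [x for l in xs for x in l]"))
```

Greedy selection keeps the diversity step at O(k·n) over n candidates, which is negligible next to the cost of sampling itself.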
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Uncovering Weaknesses in Neural Code Generation [21.552898575210534]
We assess the quality of generated code using match-based and execution-based metrics, then conduct thematic analysis to develop a taxonomy of nine types of weaknesses.
In the CoNaLa dataset, inaccurate prompts are a notable problem, causing all large models to fail in 26.84% of cases.
Missing pivotal semantics is a pervasive issue across benchmarks, with one or more large models omitting key semantics in 65.78% of CoNaLa tasks.
All models struggle with proper API usage, a challenge amplified by vague or complex prompts.
arXiv Detail & Related papers (2024-07-13T07:31:43Z)
- Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking [5.874782446136913]
Stack Overflow is a prominent Q&A forum that supports developers in seeking suitable resources on programming-related matters.
Having high-quality question titles is an effective means to attract developers' attention.
Prior research has predominantly leveraged pre-trained models to generate titles from code snippets and problem descriptions.
We present FILLER as a solution to generating Stack Overflow post titles using a fine-tuned language model with self-improvement and post ranking.
arXiv Detail & Related papers (2024-06-21T20:18:34Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
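The retrieval-augmented setup this benchmark evaluates follows a standard pattern: retrieve context from one or more sources, prepend it to the task, and generate. A minimal sketch of that pattern, with a BM25 retriever from the rank_bm25 package and a toy corpus standing in for the benchmark's actual retrievers and sources:

```python
# Hedged sketch of retrieval-augmented code generation: retrieve the top-k
# documents for a task, prepend them to the prompt, and hand the prompt to a
# code LLM. Corpus, retriever, and template are illustrative stand-ins, not
# CodeRAG-Bench's actual configuration.
from rank_bm25 import BM25Okapi

corpus = [
    "itertools.chain.from_iterable(iterables): flatten one level of nesting.",
    "functools.reduce(function, iterable): apply function cumulatively.",
    "collections.Counter(iterable): count hashable items.",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

def build_rag_prompt(task: str, k: int = 2) -> str:
    """Retrieve top-k documents and prepend them to the task description."""
    docs = bm25.get_top_n(task.split(), corpus, n=k)
    context = "\n".join(f"# Reference: {d}" for d in docs)
    return f"{context}\n# Task: {task}\n# Solution:\n"

prompt = build_rag_prompt("flatten a nested list of lists")
# `prompt` would then be passed to any code LLM's completion endpoint.
```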
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle.
This paper proposes a novel solution called metamorphic prompt testing to address these challenges.
Our evaluation on HumanEval shows that metamorphic prompt testing detects 75% of the erroneous programs generated by GPT-4, with a false positive rate of 8.6%.
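The summary names the technique but not its mechanics. One common reading of metamorphic testing applied to prompts, sketched under that assumption below, is to generate programs from semantically equivalent prompt variants and flag the result as suspect when the programs disagree; the paraphrase and generate_program callables are hypothetical placeholders for LLM calls.

```python
# Hedged sketch of metamorphic prompt testing as commonly understood: generate
# programs from equivalent prompt variants and flag inconsistency between them.
# `paraphrase` and `generate_program` are hypothetical stand-ins for LLM calls.
from typing import Callable

def is_suspect(
    prompt: str,
    paraphrase: Callable[[str], list[str]],
    generate_program: Callable[[str], Callable],
    test_inputs: list,
) -> bool:
    """Return True if programs generated from equivalent prompts disagree."""
    programs = [generate_program(p) for p in [prompt, *paraphrase(prompt)]]
    for x in test_inputs:
        outputs = set()
        for program in programs:
            try:
                outputs.add(repr(program(x)))
            except Exception:
                outputs.add("<error>")
        if len(outputs) > 1:       # disagreement => likely erroneous program
            return True
    return False
```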
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
- Automatic Bi-modal Question Title Generation for Stack Overflow with Prompt Learning [10.76882347665857]
An initial study aimed to generate titles automatically by analyzing only the code snippets in the question body.
We propose SOTitle+, an approach that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body.
Our corpus includes 179,119 high-quality question posts for six popular programming languages.
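The entry's central idea is feeding both modalities of a post's body to the title generator. A minimal sketch of how bi-modal input might be serialized into a single prompted sequence for a T5-style model; the prompt wording, modality tags, and the untuned t5-base checkpoint are assumptions, not the exact SOTitle+ setup.

```python
# Hedged sketch: serialize the two modalities of a question body (problem
# description + code snippet) into one prompted input for a T5-style model.
# The template is an illustrative assumption, and t5-base is untuned here;
# the paper fine-tunes on Stack Overflow data.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def generate_title(description: str, code: str) -> str:
    source = f"generate a question title: description: {description} code: {code}"
    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
    output = model.generate(**inputs, max_new_tokens=32, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_title(
    "My list comprehension raises a NameError in Python 3.",
    "result = [x for x in data if x > threshold]",
))
```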
arXiv Detail & Related papers (2024-03-06T12:58:25Z)
- PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models [10.491051578439722]
We propose the idea of programming problem merging (PPM) and provide two implementations of this idea; we apply our tool to two widely used datasets.
The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems.
arXiv Detail & Related papers (2024-01-28T02:27:38Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models [58.41943058963672]
We propose a new inference framework called Recursion of Thought (RoT).
RoT introduces several special tokens that the models can output to trigger context-related operations.
Experiments with multiple architectures, including GPT-3, show that RoT dramatically improves LMs' problem-solving capability.
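The summary gives the core mechanism: special output tokens trigger operations on the context so a problem can be split into subproblems solved in fresh contexts. A skeletal sketch under assumed delegation tokens [SUB]...[/SUB] and a hypothetical model_call function; RoT's actual token set and training procedure differ in detail.

```python
# Hedged sketch of divide-and-conquer inference with special control tokens.
# The [SUB]...[/SUB] token names and `model_call` are illustrative assumptions.
import re
from typing import Callable

def solve(problem: str, model_call: Callable[[str], str], depth: int = 0) -> str:
    """Recursively solve: whenever the model delegates a subproblem by
    emitting [SUB]...[/SUB], solve it in a fresh context and substitute the
    sub-answer back into the surrounding output."""
    if depth > 10:                      # guard against runaway recursion
        raise RecursionError("maximum delegation depth exceeded")
    answer = model_call(problem)
    while (m := re.search(r"\[SUB\](.*?)\[/SUB\]", answer)) is not None:
        sub_answer = solve(m.group(1).strip(), model_call, depth + 1)
        answer = answer[:m.start()] + sub_answer + answer[m.end():]
    return answer
```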
arXiv Detail & Related papers (2023-06-12T06:34:16Z)
- Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, generalizes to different programming languages, and works well with off-the-shelf hyperparameters.
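The summary describes the method concretely enough to sketch: sample many programs from a "Coder" model, then rerank by combining the Coder likelihood p(program | prompt) with a "Reviewer" likelihood p(prompt | program). The logp function below is a hypothetical wrapper over whatever LM API exposes token log-probabilities.

```python
# Hedged sketch of Coder-Reviewer reranking: score each sampled program by
# log p(program | prompt) + log p(prompt | program). `logp(context, target)`
# is a hypothetical wrapper returning the total log-probability of `target`
# as a continuation of `context` under a language model.
from typing import Callable

def coder_reviewer_rerank(
    prompt: str,
    programs: list[str],
    logp: Callable[[str, str], float],
) -> list[str]:
    """Sort sampled programs, best first, by the combined likelihood."""
    def score(program: str) -> float:
        coder = logp(prompt, program)      # p(program | prompt)
        reviewer = logp(program, prompt)   # p(prompt | program)
        return coder + reviewer
    return sorted(set(programs), key=score, reverse=True)
```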
arXiv Detail & Related papers (2022-11-29T18:56:33Z)
- $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its success in improving generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experimental results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.