Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking
- URL: http://arxiv.org/abs/2406.15633v1
- Date: Fri, 21 Jun 2024 20:18:34 GMT
- Title: Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking
- Authors: Duc Anh Le, Anh M. T. Bui, Phuong T. Nguyen, Davide Di Ruscio
- Abstract summary: Stack Overflow is a prominent Q&A forum, supporting developers in seeking suitable resources on programming-related matters.
Having high-quality question titles is an effective means to attract developers' attention.
Research has been conducted, predominantly leveraging pre-trained models to generate titles from code snippets and problem descriptions.
We present FILLER as a solution to generating Stack Overflow post titles using a fine-tuned language model with self-improvement and post ranking.
- Score: 5.874782446136913
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Stack Overflow is a prominent Q&A forum, supporting developers in seeking suitable resources on programming-related matters. Having high-quality question titles is an effective means to attract developers' attention. Unfortunately, this is often underestimated, leaving room for improvement. Research has been conducted, predominantly leveraging pre-trained models to generate titles from code snippets and problem descriptions. Yet, getting high-quality titles is still a challenging task, attributed to both the quality of the input data (e.g., containing noise and ambiguity) and inherent constraints in sequence generation models. In this paper, we present FILLER as a solution to generating Stack Overflow post titles using a fine-tuned language model with self-improvement and post ranking. Our study focuses on enhancing pre-trained language models for generating titles for Stack Overflow posts, employing a training and subsequent fine-tuning paradigm for these models. To this end, we integrate the model's predictions into the training process, enabling it to learn from its errors, thereby lessening the effects of exposure bias. Moreover, we apply a post-ranking method to produce a variety of sample candidates, subsequently selecting the most suitable one. To evaluate FILLER, we perform experiments using benchmark datasets, and the empirical findings indicate that our model provides high-quality recommendations. Moreover, it significantly outperforms all the baselines, including Code2Que, SOTitle, CCBERT, M3NSCT5, and GPT3.5-turbo. A user study also shows that FILLER provides more relevant titles compared to SOTitle and GPT3.5-turbo.
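The abstract's post-ranking step (sampling several candidate titles and then selecting the most suitable one) can be illustrated with a minimal sketch. Here `generate_candidates` and `score_title` are hypothetical stand-ins, not the paper's actual model or ranker: a real system would sample titles from the fine-tuned language model and score them with a learned ranking model.

```python
# Minimal sketch of candidate generation + post-ranking, assuming
# hypothetical stand-ins for the fine-tuned model and the ranker.

def generate_candidates(post_body, n=4):
    # Placeholder: a real system would sample n diverse titles from a
    # fine-tuned language model (e.g. via nucleus sampling).
    return [f"candidate title {i} for: {post_body[:30]}" for i in range(n)]

def score_title(title, post_body):
    # Placeholder ranking score: word overlap between title and body.
    # The paper's post-ranking method is not reproduced here.
    title_words = set(title.lower().split())
    body_words = set(post_body.lower().split())
    return len(title_words & body_words) / max(len(title_words), 1)

def pick_best_title(post_body, n=4):
    # Generate several candidates, then keep the highest-scoring one.
    candidates = generate_candidates(post_body, n)
    return max(candidates, key=lambda c: score_title(c, post_body))
```

The design point the abstract makes is that selection over a diverse candidate pool can outperform taking the model's single top decoding.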
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation [97.3797716862478]
Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in Computer-aided Translation.
Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label.
This work proposes an energy-based model for WLAC, which enables the context hidden vector to capture crucial information from the source sentence.
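The contrast drawn above, a classifier that maps the context vector to a label versus an energy-based model that scores (context, candidate) pairs jointly, can be sketched as follows. The dot-product energy and the toy embeddings are illustrative assumptions, not the paper's architecture.

```python
# Toy sketch of energy-based candidate selection, assuming a
# hypothetical dot-product energy over toy word embeddings.

def energy(context_vec, candidate_vec):
    # Lower energy means a better (context, candidate) match;
    # negative dot product serves as a toy energy function.
    return -sum(c * w for c, w in zip(context_vec, candidate_vec))

def autocomplete(context_vec, candidates):
    # Score every candidate jointly with the context and return the
    # word whose embedding minimizes the energy.
    return min(candidates, key=lambda item: energy(context_vec, item[1]))[0]

# Hypothetical vocabulary of (word, embedding) pairs.
vocab = [("cat", [1.0, 0.0]), ("dog", [0.0, 1.0])]
```

Unlike a plain classifier head, the energy function sees both the context and the candidate, which is what lets it incorporate information from the source sentence.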
arXiv Detail & Related papers (2024-07-29T15:07:19Z)
- Automatic Bi-modal Question Title Generation for Stack Overflow with Prompt Learning [10.76882347665857]
An initial study aimed to automatically generate the titles by only analyzing the code snippets in the question body.
We propose an approach SOTitle+ by considering bi-modal information (i.e., the code snippets and the problem descriptions) in the question body.
Our corpus includes 179,119 high-quality question posts for six popular programming languages.
arXiv Detail & Related papers (2024-03-06T12:58:25Z)
- Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation [25.317788211120362]
We investigate the role of generative AI models in providing human tutor-style programming hints.
Recent works have benchmarked state-of-the-art models for various feedback generation scenarios.
We develop a novel technique, GPT4Hints-GPT3.5Val, to push the limits of generative AI models.
arXiv Detail & Related papers (2023-10-05T17:02:59Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Representation Learning for Stack Overflow Posts: How Far are We? [14.520780251680586]
State-of-the-art Stack Overflow post representation models are Post2Vec and BERTOverflow.
Despite their promising results, these representation methods have not been evaluated in the same experimental setting.
We propose SOBERT, which employs a simple-yet-effective strategy to improve the best-performing model.
arXiv Detail & Related papers (2023-03-13T04:49:06Z)
- Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer [11.03785369838242]
We propose M$_3$NSCT5, a novel approach to automatically generate multiple post titles from the given code snippets.
M$_3$NSCT5 employs the CodeT5 backbone, a pre-trained Transformer model with strong language understanding ability.
We build a large-scale dataset with 890,000 question posts covering eight programming languages to validate the effectiveness of M$_3$NSCT5.
arXiv Detail & Related papers (2022-08-24T13:10:48Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, few-shot to evaluate its effectiveness.
Under zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.