Related papers: On Unified Prompt Tuning for Request Quality Assurance in Public Code Review

URL: http://arxiv.org/abs/2404.07942v2
Date: Wed, 17 Apr 2024 14:04:50 GMT
Title: On Unified Prompt Tuning for Request Quality Assurance in Public Code Review
Authors: Xinyu Chen, Lin Li, Rui Zhang, Peng Liang
Abstract summary: We propose a unified framework called UniPCR to complete developer-based request quality assurance (i.e., the request necessity prediction and tag recommendation subtasks) under a Masked Language Model (MLM). Experimental results on the Public Code Review dataset for the time span 2011-2022 demonstrate that our UniPCR framework adapts to the two subtasks and outperforms state-of-the-art methods for request quality assurance on accuracy-based metrics.
Score: 19.427661961488404
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Public Code Review (PCR) can be implemented through a Software Question Answering (SQA) community, which facilitates high knowledge dissemination. Current methods mainly focus on the reviewer's perspective, including finding a capable reviewer, predicting comment quality, and recommending/generating review comments. Our intuition is that satisfying review necessity requests can increase their visibility, which in turn is a prerequisite for better review responses. To this end, we propose a unified framework called UniPCR to complete developer-based request quality assurance (i.e., the request necessity prediction and tag recommendation subtasks) under a Masked Language Model (MLM). Specifically, we reformulate both subtasks via 1) text prompt tuning, which converts the two subtasks into MLM problems by constructing prompt templates using hard prompts; 2) code prefix tuning, which optimizes a small segment of generated continuous vectors as the prefix of the code representation using soft prompts. Experimental results on the Public Code Review dataset for the time span 2011-2022 demonstrate that our UniPCR framework adapts to the two subtasks and outperforms state-of-the-art methods for request quality assurance on accuracy-based metrics. These conclusions highlight the effectiveness of our unified framework from the developer's perspective in public code review.
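
Illustrative sketch (not from the paper): the abstract describes recasting both subtasks as masked-token prediction by means of hard prompt templates. The Python below shows one way such a reformulation could look. The template wording, the label words, and the function names are all hypothetical, and `mask_scores` merely stands in for the scores an MLM would assign to candidate words at the mask position. Code prefix tuning (the soft-prompt part) is not shown.

```python
from typing import Dict, List

MASK = "[MASK]"

# Hypothetical hard-prompt templates, one per subtask (wording is an assumption).
TEMPLATES: Dict[str, str] = {
    "necessity": "Request: {title}. {body} Is this review request necessary? {mask}.",
    "tags": "Request: {title}. {body} This request is about {mask}.",
}

# Hypothetical verbalizer: maps words predicted at the mask to necessity labels.
NECESSITY_VERBALIZER: Dict[str, str] = {"yes": "necessary", "no": "unnecessary"}


def build_prompt(task: str, title: str, body: str) -> str:
    """Fill the template for `task` so that it contains exactly one mask token."""
    return TEMPLATES[task].format(title=title, body=body, mask=MASK)


def predict_necessity(mask_scores: Dict[str, float]) -> str:
    """Pick the label whose label word scores highest at the mask position.

    Words outside the verbalizer are ignored; missing label words score -inf.
    """
    best_word = max(
        NECESSITY_VERBALIZER,
        key=lambda word: mask_scores.get(word, float("-inf")),
    )
    return NECESSITY_VERBALIZER[best_word]


def recommend_tags(mask_scores: Dict[str, float], candidates: List[str], k: int) -> List[str]:
    """Return the k candidate tags with the highest scores at the mask position."""
    ranked = sorted(
        candidates,
        key=lambda tag: mask_scores.get(tag, float("-inf")),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(build_prompt("necessity", "Review my parser", "It tokenizes CSV input."))
    # Made-up scores standing in for an MLM's output at the mask position.
    print(predict_necessity({"yes": 0.7, "no": 0.2, "the": 0.9}))
    print(recommend_tags({"python": 0.5, "java": 0.1, "performance": 0.3},
                         ["python", "java", "performance"], k=2))
```

Because both subtasks reduce to scoring words at a single mask position, one model can serve necessity prediction and tag recommendation; only the template and the set of candidate words change.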

Related papers

Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing [35.686125031177234]
Multi-Document Summarization (MDS) is a challenging task that focuses on extracting and synthesizing useful information from multiple lengthy documents. We propose a novel framework that leverages inference-time scaling for this task. We also introduce two new evaluation metrics: Consistency-Aware Preference (CAP) score and LLM Atom-Content-Unit (ACU) score.
arXiv Detail & Related papers (2025-02-27T23:34:47Z)
Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review [15.019556560416403]
Public Code Review (PCR) is an assistant to the internal code review of the development team. We propose Knowledge-guided Prompt learning for Public Code Review to achieve developer-based code review request quality assurance.
arXiv Detail & Related papers (2024-10-29T02:48:41Z)
Trust but Verify: Programmatic VLM Evaluation in the Wild [62.14071929143684]
Programmatic VLM Evaluation (PROVE) is a new benchmarking paradigm for evaluating VLM responses to open-ended queries. We benchmark the helpfulness-truthfulness trade-offs of a range of VLMs on PROVE, finding that very few are in fact able to achieve a good balance between the two.
arXiv Detail & Related papers (2024-10-17T01:19:18Z)
Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy. Results show a significant performance improvement for our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
MORCoRA: Multi-Objective Refactoring Recommendation Considering Review Availability [6.439206681270567]
It is essential to ensure that the searched sequence of refactorings can be reviewed promptly. We propose MORCoRA, a multi-objective search-based technique that searches for refactoring sequences with improved code quality, preserved semantics, and high review availability.
arXiv Detail & Related papers (2024-08-13T02:08:16Z)
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance [62.15866177242207]
We show that through constructing a subject-agnostic condition, one could obtain outputs consistent with both the given subject and input text prompts. Our approach is conceptually simple and requires only minimal code modifications, but leads to substantial quality improvements.
arXiv Detail & Related papers (2024-05-02T15:03:41Z)
PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition [4.125007507808684]
This study proposes an effective AIGC quality assessment (QA) framework. First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder. Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features.
arXiv Detail & Related papers (2024-04-20T07:05:45Z)
Code Reviewer Recommendation Based on a Hypergraph with Multiplex Relationships [30.74556500021384]
We present MIRRec, a novel code reviewer recommendation method that leverages a hypergraph with multiplex relationships. MIRRec encodes high-order correlations that go beyond traditional pairwise connections using degree-free hyperedges among pull requests and developers. To validate the effectiveness of MIRRec, we conducted experiments using a dataset comprising 48,374 pull requests from ten popular open-source software projects hosted on GitHub.
arXiv Detail & Related papers (2024-01-19T15:25:14Z)
Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges. Our model is trained on user queries and LLM-generated responses under massive real-world scenarios. Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs). Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation. Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three). We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.