Curiosity by Design: An LLM-based Coding Assistant Asking Clarification Questions
- URL: http://arxiv.org/abs/2507.21285v1
- Date: Mon, 28 Jul 2025 19:10:57 GMT
- Title: Curiosity by Design: An LLM-based Coding Assistant Asking Clarification Questions
- Authors: Harsh Darji, Thibaud Lutellier,
- Abstract summary: This work aims to build an LLM-based coding assistant that mimics the human code review process.<n>Our end-to-end system includes (1) a query trained to detect unclear programming-related queries and (2) a fine-tuned LLM that generates clarification questions.
- Score: 4.2689345057164205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly used as coding assistants. However, the ambiguity of the developer's prompt often leads to incorrect code generation, as current models struggle to infer user intent without extensive prompt engineering or external context. This work aims to build an LLM-based coding assistant that mimics the human code review process by asking clarification questions when faced with ambiguous or under-specified queries. Our end-to-end system includes (1) a query classifier trained to detect unclear programming-related queries and (2) a fine-tuned LLM that generates clarification questions. Our evaluation shows that the fine-tuned LLM outperforms standard zero-shot prompting in generating useful clarification questions. Furthermore, our user study indicates that users find the clarification questions generated by our model to outperform the baseline, demonstrating that our coding assistant produces more accurate and helpful code responses compared to baseline coding assistants.
Related papers
- Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.<n>This paper investigates how programmers use Large Language Models to complement their own skills.<n>The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z) - ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving [3.683434365857386]
We present ClarifyCoder, a novel framework with synthetic data generation and instruction-tuning.<n>We argue that the fundamental ability to recognize and query ambiguous requirements should be intrinsic to the models themselves.<n>Our approach consists of two main components: (1) a data synthesis technique that augments existing programming datasets with scenarios requiring clarification to generate clarification-aware training data, and (2) a fine-tuning strategy that teaches models to prioritize seeking clarification over immediate code generation when faced with incomplete or ambiguous requirements.
arXiv Detail & Related papers (2025-04-23T00:34:39Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it.<n>We propose a guided hallucination-based approach ELOQ to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - HumanEvalComm: Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent [2.8391355909797644]
Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation.<n>There is still a gap between LLMs being capable coders and being top-tier software engineers.
arXiv Detail & Related papers (2024-05-31T22:06:18Z) - InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z) - Why and When LLM-Based Assistants Can Go Wrong: Investigating the
Effectiveness of Prompt-Based Interactions for Software Help-Seeking [5.755004576310333]
Large Language Model (LLM) assistants have emerged as potential alternatives to search methods for helping users navigate software.
LLM assistants use vast training data from domain-specific texts, software manuals, and code repositories to mimic human-like interactions.
arXiv Detail & Related papers (2024-02-12T19:49:58Z) - Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z) - FreshLLMs: Refreshing Large Language Models with Search Engine
Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z) - Fixing Large Language Models' Specification Misunderstanding for Better Code Generation [13.494822086550604]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs)<n>It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.<n>muFiX further fixes the specification understanding towards the direction reducing the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z) - Large Language Models Should Ask Clarifying Questions to Increase
Confidence in Generated Code [0.7252027234425334]
Large language models (LLMs) have significantly improved the ability to perform tasks in the field of code generation.
There is still a gap between LLMs being capable coders and being top-tier software engineers.
I propose a communication-centered process that uses an LLM-generated communicator to identify issues with high ambiguity or low confidence in problem descriptions and generated code.
arXiv Detail & Related papers (2023-08-25T17:33:05Z) - Using an LLM to Help With Code Understanding [13.53616539787915]
Large language models (LLMs) are revolutionizing the process of writing code.
Our plugin queries OpenAI's GPT-3.5-turbo model with four high-level requests without the user having to write explicit prompts.
We evaluate this system in a user study with 32 participants, which confirms that using our plugin can aid task completion more than web search.
arXiv Detail & Related papers (2023-07-17T00:49:06Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.