Related papers: Interactive Agents to Overcome Ambiguity in Software Engineering

Interactive Agents to Overcome Ambiguity in Software Engineering

URL: http://arxiv.org/abs/2502.13069v1
Date: Tue, 18 Feb 2025 17:12:26 GMT
Title: Interactive Agents to Overcome Ambiguity in Software Engineering
Authors: Sanidhya Vijayvargiya, Xuhui Zhou, Akhila Yerukola, Maarten Sap, Graham Neubig,
Abstract summary: AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.<n>Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.<n>We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
Score: 61.40183840499932
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes, safety risks due to tool misuse, and wasted computational resources. In this work, we study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance across three key steps: (a) leveraging interactivity to improve performance in ambiguous scenarios, (b) detecting ambiguity, and (c) asking targeted questions. Our findings reveal that models struggle to distinguish between well-specified and underspecified instructions. However, when models interact for underspecified inputs, they effectively obtain vital information from the user, leading to significant improvements in performance and underscoring the value of effective interaction. Our study highlights critical gaps in how current state-of-the-art models handle ambiguity in complex software engineering tasks and structures the evaluation into distinct steps to enable targeted improvements.

Related papers

When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions [23.5858385520752]
Large Language Models (LLMs) have demonstrated impressive performance in code generation tasks under idealized conditions.<n>In practice, task descriptions frequently exhibit ambiguity, incompleteness, or internal contradictions.<n>We present the first empirical study examining the robustness of state-of-the-art code generation models when faced with such unclear task descriptions.
arXiv Detail & Related papers (2025-07-27T23:16:14Z)
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories [18.129031749321058]
Large Language Model (LLM)-based agents are increasingly employed to automate complex software engineering tasks.<n>Despite their widespread adoption, the internal decision-making processes of these agents remain largely unexplored.<n>We present a large-scale empirical study of the thought-action-result trajectories of three state-of-the-art LLM-based agents.
arXiv Detail & Related papers (2025-06-23T16:34:52Z)
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification [25.27444694706659]
We present AskToAct, which exploits structural mapping between queries and their tool invocation solutions. Our key insight is that tool parameters naturally represent explicit user intents. By systematically removing key parameters from queries while retaining them as ground truth, we enable automated construction of high-quality training data.
arXiv Detail & Related papers (2025-03-03T12:55:49Z)
Towards more Contextual Agents: An extractor-Generator Optimization Framework [0.0]
Large Language Model (LLM)-based agents have demonstrated remarkable success in solving complex tasks across a wide range of general-purpose applications.<n>However, their performance often degrades in context-specific scenarios, such as specialized industries or research domains.<n>To address this challenge, our work introduces a systematic approach to enhance the contextual adaptability of LLM-based agents.
arXiv Detail & Related papers (2025-02-18T15:07:06Z)
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search [89.97082652805904]
We propose QLASS (Q-guided Language Agent Stepwise Search), to automatically generate annotations by estimating Q-values. With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value. We empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis.
arXiv Detail & Related papers (2025-02-04T18:58:31Z)
Eliciting Causal Abilities in Large Language Models for Reasoning Tasks [14.512834333917414]
We introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data.<n>In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language.<n>Our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts.
arXiv Detail & Related papers (2024-12-19T17:03:02Z)
Explainable Procedural Mistake Detection [27.40806437649092]
Procedural mistake detection is a challenging sub-problem of classifying whether a human user has successfully executed the task at hand.<n>We recast PMD to an explanatory self-dialog of questions and answers, which serve as evidence for a decision.<n>Our results show that while open-source VLMs struggle with this task off-the-shelf, their accuracy, coherence, and dialog efficiency can be vastly improved.
arXiv Detail & Related papers (2024-12-16T16:13:55Z)
Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.<n>We investigate whether approaches such as self- throughput and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models. The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.