Related papers: Program Synthesis Dialog Agents for Interactive Decision-Making

URL: http://arxiv.org/abs/2502.19610v2
Date: Mon, 17 Mar 2025 18:13:03 GMT
Title: Program Synthesis Dialog Agents for Interactive Decision-Making
Authors: Matthew Toles, Nikhil Balwani, Rattandeep Singh, Valentina Giulia Sartori Rodriguez, Zhou Yu
Abstract summary: We propose BeNYfits, a new benchmark for determining user eligibility for social benefits opportunities through interactive decision-making. Our experiments show that GPT-4o scores only 35.7 F1 using a ReAct-style chain-of-thought. Our agent, ProADA, improves the F1 score to 55.6 while maintaining nearly the same number of dialog turns.
Score: 15.76727860626721
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many real-world eligibility problems, ranging from medical diagnosis to tax planning, can be mapped to decision problems expressed in natural language, wherein a model must make a binary choice based on user features. Large-scale domains such as legal codes or frequently updated funding opportunities render human annotation (e.g., web forms or decision trees) impractical, highlighting the need for agents that can automatically assist in decision-making. Since relevant information is often only known to the user, it is crucial that these agents ask the right questions. As agents determine when to terminate a conversation, they face a trade-off between accuracy and the number of questions asked, a key metric for both user experience and cost. To evaluate this task, we propose BeNYfits, a new benchmark for determining user eligibility for multiple overlapping social benefits opportunities through interactive decision-making. Our experiments show that current language models struggle with frequent hallucinations, with GPT-4o scoring only 35.7 F1 using a ReAct-style chain-of-thought. To address this, we introduce ProADA, a novel approach that leverages program synthesis to assist in decision-making by mapping dialog planning to a code generation problem and using gaps in structured data to determine the best next action. Our agent, ProADA, improves the F1 score to 55.6 while maintaining nearly the same number of dialog turns.
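As a rough illustration of the abstract's idea of using gaps in structured data to pick the next action, the Python sketch below treats an eligibility rule as ordinary code that reads a user-feature dictionary; a missing key signals which question to ask next. This is not the authors' ProADA implementation: check_eligibility, run_dialog, the feature names, and the thresholds are all hypothetical, and in the setting described in the abstract the checker would be synthesized from natural-language requirements rather than written by hand.

```python
from typing import Any, Callable, Dict, List, Tuple


def check_eligibility(user: Dict[str, Any]) -> bool:
    """Hypothetical eligibility rule standing in for synthesized code.

    Reading a feature that is not yet known raises KeyError, which the
    dialog loop below interprets as a gap in the structured data.
    """
    if user["age"] < 18:
        return False
    if user["household_income"] > 50_000:
        return False
    return bool(user["is_resident"])


def run_dialog(
    checker: Callable[[Dict[str, Any]], bool],
    ask: Callable[[str], Any],
    max_questions: int = 10,
) -> Tuple[bool, List[str]]:
    """Ask only for the features the checker needs, then return its decision.

    `ask` is a hypothetical callback that poses a question about one
    feature and returns the user's answer as a structured value.
    """
    known: Dict[str, Any] = {}
    asked: List[str] = []
    while True:
        try:
            # The checker finishes as soon as it has enough information.
            return checker(known), asked
        except KeyError as gap:
            if len(asked) >= max_questions:
                raise RuntimeError("question limit reached before a decision")
            feature = gap.args[0]  # name of the missing feature
            asked.append(feature)
            known[feature] = ask(feature)


if __name__ == "__main__":
    # Simulated users: answers are looked up instead of typed in.
    adult = {"age": 30, "household_income": 20_000, "is_resident": True}
    minor = {"age": 16, "household_income": 20_000, "is_resident": True}
    print(run_dialog(check_eligibility, adult.__getitem__))
    print(run_dialog(check_eligibility, minor.__getitem__))
```

Because the rule returns as soon as a disqualifying feature is known, the loop asks only the questions the decision depends on: the simulated minor is asked about age alone, while the simulated adult is asked all three questions. This is one simple way to limit the number of dialog turns without guessing at unknown features.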

Related papers

A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions [51.96890647837277]
Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. This survey paper presents a desideratum for next-generation Conversational Agents - what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence.
arXiv Detail & Related papers (2025-04-07T21:01:25Z)
From Guessing to Asking: An Approach to Resolving the Persona Knowledge Gap in LLMs during Multi-Turn Conversations [11.958380211411386]
This study introduces the persona knowledge gap, the discrepancy between a model's internal understanding and the knowledge required for coherent, personalized conversations. We propose Conversation Preference Elicitation and Recommendation (CPER), a novel framework that dynamically detects and resolves persona knowledge gaps. CPER consists of three key modules: a Contextual Understanding Module for preference extraction, a Dynamic Feedback Module for measuring uncertainty and refining persona alignment, and a Persona-Driven Response Generation module for adapting responses based on accumulated user context.
arXiv Detail & Related papers (2025-03-16T15:55:29Z)
ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents [11.118991548784459]
Large language model (LLM)-based agents have been increasingly used to interact with external environments. Current frameworks do not enable these agents to work with users and interact with them to align on the details of their tasks. This work introduces ReSpAct, a novel framework that combines the essential skills for building task-oriented "conversational" agents.
arXiv Detail & Related papers (2024-11-01T15:57:45Z)
Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models [19.131389732699365]
Service chatbots play an important role in enhancing customer support by delivering timely responses to diverse queries. To effectively handle varied customer inquiries, augmenting the knowledge base with similar questions that maintain semantic consistency and linguistic variability is crucial. This paper presents methodologies for a novel approach that utilizes Large Language Models for generating similar questions and selecting an optimal subset of questions for knowledge base augmentation.
arXiv Detail & Related papers (2024-10-16T10:48:14Z)
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent). It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
Hallucination-minimized Data-to-answer Framework for Financial Decision-makers [1.3781777926017094]
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. We present a novel Langchain-based framework that transforms data tables into hierarchical textual data chunks to enable a wide variety of actionable question answering.
arXiv Detail & Related papers (2023-11-09T22:53:52Z)
PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities. We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework. We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
Decision-Oriented Dialogue for Human-AI Collaboration [62.367222979251444]
We describe a class of tasks called decision-oriented dialogues, in which AI assistants such as large language models (LMs) must collaborate with one or more humans via natural language to help them make complex decisions. We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends. For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach.
arXiv Detail & Related papers (2023-05-31T17:50:02Z)
Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models [116.25562358482962]
State-of-the-art neural language models can be used to solve ad-hoc language tasks without the need for supervised training. PromptIDE allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts.
arXiv Detail & Related papers (2022-08-16T17:17:53Z)
INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions [47.90088587508672]
InSCIt is a dataset for Information-Seeking Conversations with mixed-initiative Interactions. It contains 4.7K user-agent turns from 805 human-human conversations. We report results of two systems based on state-of-the-art models of conversational knowledge identification and open-domain question answering.
arXiv Detail & Related papers (2022-07-02T06:18:12Z)
Partner Matters! An Empirical Study on Fusing Personas for Personalized Response Selection in Retrieval-Based Chatbots [51.091235903442715]
This paper makes an attempt to explore the impact of utilizing personas that describe either self or partner speakers on the task of response selection. Four persona fusion strategies are designed, which assume personas interact with contexts or responses in different ways. Empirical studies on the Persona-Chat dataset show that the partner personas can improve the accuracy of response selection.
arXiv Detail & Related papers (2021-05-19T10:32:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.