Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
- URL: http://arxiv.org/abs/2502.01390v1
- Date: Mon, 03 Feb 2025 14:23:22 GMT
- Title: Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant
- Authors: Gaole He, Gianluca Demartini, Ujwal Gadiraju
- Abstract summary: Large language models (LLMs) have continued to impact our everyday lives.
Recent work has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks.
We analyzed how user involvement at each stage affects their trust and collaborative team performance.
- Score: 15.736792988697664
- License:
- Abstract: Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools that are designed for a specific purpose (e.g., for flight booking or an alarm clock), LLM agents exercise an increasing capability to assist humans in their daily work. Although LLM agents have shown a promising blueprint as daily assistants, there is a limited understanding of how they can provide daily assistance based on planning and sequential decision making capabilities. We draw inspiration from recent work that has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of risk typically associated with them (e.g., flight ticket booking and credit card payments). To ensure user agency and control over the LLM agent, we adopted LLM agents in a plan-then-execute manner, wherein the agents conducted step-wise planning and step-by-step execution in a simulation environment. We analyzed how user involvement at each stage affects their trust and collaborative team performance. Our findings demonstrate that LLM agents can be a double-edged sword -- (1) they can work well when a high-quality plan and necessary user involvement in execution are available, and (2) users can easily mistrust the LLM agents with plans that seem plausible. We synthesized key insights for using LLM agents as daily assistants to calibrate user trust and achieve better overall task outcomes. Our work has important implications for the future design of daily assistants and human-AI collaboration with LLM agents.
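The plan-then-execute setup described in the abstract amounts to a two-stage interaction loop: the agent first drafts a complete step-wise plan for the user to review, and then executes the plan one step at a time, pausing for user confirmation on higher-risk actions (e.g., credit card payments). The sketch below is a minimal illustration of that pattern, not the authors' implementation; all names (propose_plan, execute_step, confirm, Step.risky) are hypothetical.

```python
# Minimal sketch of a plan-then-execute loop with a human in the loop.
# All function and class names are hypothetical illustrations of the
# interaction pattern, not the implementation used in the study.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str   # e.g., "Search flights AMS -> SFO on a given date"
    risky: bool        # e.g., a payment or an irreversible booking


def plan_then_execute(
    propose_plan: Callable[[str], List[Step]],   # LLM planner (hypothetical)
    execute_step: Callable[[Step], str],         # tool-using executor (hypothetical)
    confirm: Callable[[str], bool],              # asks the user, returns True/False
    task: str,
) -> List[str]:
    # Stage 1: the agent drafts a full step-wise plan and the user reviews it.
    plan = propose_plan(task)
    plan_text = "\n".join(f"{i + 1}. {s.description}" for i, s in enumerate(plan))
    if not confirm(f"Proposed plan for '{task}':\n{plan_text}\nApprove?"):
        return []  # user rejects the plan; nothing is executed

    # Stage 2: step-by-step execution, pausing for the user on risky steps
    # to preserve user agency and control over the agent.
    results: List[str] = []
    for step in plan:
        if step.risky and not confirm(f"About to execute risky step: {step.description}. Proceed?"):
            break  # user withholds approval; stop execution
        results.append(execute_step(step))
    return results
```

Varying where the user is involved (plan review, step-by-step confirmation, or both) corresponds to the stages of involvement whose effect on trust and team performance the study analyzes.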
Related papers
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z) - AGILE: A Novel Reinforcement Learning Framework of LLM Agents [7.982249117182315]
We introduce a novel reinforcement learning framework of LLM agents designed to perform complex conversational tasks with users.
The agent possesses capabilities beyond conversation, including reflection, tool usage, and expert consultation.
Our experiments show that AGILE agents based on 7B and 13B LLMs trained with PPO can outperform GPT-4 agents.
arXiv Detail & Related papers (2024-05-23T16:17:44Z) - Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability; a minimal sketch of this decomposition appears after this list.
arXiv Detail & Related papers (2024-01-14T16:17:07Z) - Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z) - LLM Augmented Hierarchical Agents [4.574041097539858]
Solving long-horizon, temporally-extended tasks using Reinforcement Learning (RL) is challenging, compounded by the common practice of learning without prior knowledge (or tabula rasa learning).
In this paper we exploit the planning capabilities of LLMs while using RL to provide learning from the environment, resulting in a hierarchical agent that uses LLMs to solve long-horizon tasks.
This approach is evaluated in simulation environments such as MiniGrid, SkillHack, and Crafter, and on a real robot arm in block manipulation tasks.
arXiv Detail & Related papers (2023-11-09T18:54:28Z) - AgentTuning: Enabling Generalized Agent Abilities for LLMs [35.74502545364593]
We present AgentTuning, a simple and general method to enhance the agent abilities of open large language models.
We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains.
Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities.
arXiv Detail & Related papers (2023-10-19T15:19:53Z) - Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
arXiv Detail & Related papers (2023-07-05T17:59:27Z) - LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models [27.318186938382233]
This study focuses on using large language models (LLMs) as a planner for embodied agents.
We propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning.
arXiv Detail & Related papers (2022-12-08T05:46:32Z)
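For the planner, caller, and summarizer decomposition mentioned in the "Small LLMs Are Weak Tool Learners" entry above, a modular multi-LLM pipeline might look roughly as follows. This is an assumption-based sketch, not the paper's implementation: the LLM callables, the "<tool>: <arguments>" response format, and all function names are hypothetical.

```python
# Hypothetical sketch of a planner / caller / summarizer decomposition in
# which each role could be served by a separate (possibly smaller) LLM.
# Names and the tool-call response format are assumptions for illustration.

from typing import Callable, Dict, List

LLM = Callable[[str], str]  # a text-in, text-out language model


def run_modular_agent(planner: LLM, caller: LLM, summarizer: LLM,
                      tools: Dict[str, Callable[[str], str]],
                      task: str, max_steps: int = 8) -> str:
    transcript: List[str] = [f"Task: {task}"]

    for _ in range(max_steps):
        # Planner: decide the next sub-goal (or declare the task done).
        subgoal = planner("\n".join(transcript) + "\nNext sub-goal (or DONE):").strip()
        if subgoal.upper().startswith("DONE"):
            break

        # Caller: turn the sub-goal into a concrete tool invocation; the
        # "<tool>: <arguments>" response format is an assumption.
        call = caller(f"Sub-goal: {subgoal}\nRespond as '<tool>: <arguments>'.")
        tool_name, _, arguments = call.partition(":")
        tool = tools.get(tool_name.strip())
        observation = tool(arguments.strip()) if tool else f"Unknown tool '{tool_name.strip()}'"
        transcript.append(f"Sub-goal: {subgoal}\nCall: {call}\nObservation: {observation}")

    # Summarizer: compose the final answer from the accumulated transcript.
    return summarizer("\n".join(transcript) + "\nFinal answer:")
```

Because each role is a separate callable, any of the three could be backed by a different (and smaller) model and updated or swapped independently, which is the modularity benefit the entry describes.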
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.