AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment
- URL: http://arxiv.org/abs/2409.17655v1
- Date: Thu, 26 Sep 2024 09:06:56 GMT
- Title: AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment
- Authors: Nan Sun, Bo Mao, Yongchang Li, Lumeng Ma, Di Guo, Huaping Liu
- Abstract summary: AssistantX is a proactive assistant designed to operate autonomously in a physical office environment.
Unlike conventional service robots, AssistantX leverages a novel multi-agent architecture, PPDR4X, which provides advanced inference capabilities.
Our evaluation highlights the architecture's effectiveness, showing that AssistantX can respond to clear instructions, actively retrieve supplementary information from memory, and proactively seek collaboration from team members to ensure successful task completion.
- Score: 15.475084260674384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing demand for intelligent assistants in human-populated environments has motivated significant research in autonomous robotic systems. Traditional service robots and virtual assistants, however, struggle with real-world task execution due to their limited capacity for dynamic reasoning and interaction, particularly when human collaboration is required. Recent developments in Large Language Models have opened new avenues for improving these systems, enabling more sophisticated reasoning and natural interaction capabilities. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed to operate autonomously in a physical office environment. Unlike conventional service robots, AssistantX leverages a novel multi-agent architecture, PPDR4X, which provides advanced inference capabilities and comprehensive collaboration awareness. By effectively bridging the gap between virtual operations and physical interactions, AssistantX demonstrates robust performance in managing complex real-world scenarios. Our evaluation highlights the architecture's effectiveness, showing that AssistantX can respond to clear instructions, actively retrieve supplementary information from memory, and proactively seek collaboration from team members to ensure successful task completion. More details and videos can be found at https://assistantx-agent.github.io/AssistantX/.
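As a rough illustration of the three behaviors the abstract highlights (executing clear instructions, recalling supplementary context from memory, and escalating to human collaborators), here is a minimal Python sketch; the class, the clarity heuristic, and the memory layout are hypothetical placeholders, not the actual PPDR4X architecture.

```python
from dataclasses import dataclass, field

@dataclass
class ProactiveAssistant:
    # Hypothetical sketch; PPDR4X's real internals are not reproduced here.
    memory: dict = field(default_factory=dict)

    def handle(self, instruction: str) -> str:
        if self.is_clear(instruction):
            # 1. Clear instructions are executed directly.
            return f"executing: {instruction}"
        recalled = self.memory.get(instruction)
        if recalled is not None:
            # 2. Ambiguous requests are supplemented from memory.
            return f"executing with recalled context: {recalled}"
        # 3. Otherwise the assistant proactively asks a team member.
        return f"asking a colleague for help with: {instruction}"

    def is_clear(self, instruction: str) -> bool:
        # Placeholder heuristic; the real system relies on LLM inference.
        return "?" not in instruction

assistant = ProactiveAssistant(memory={"where is the projector?": "room 302"})
print(assistant.handle("deliver this parcel to the front desk"))
print(assistant.handle("where is the projector?"))
```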
Related papers
- Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions [110.43343503158306]
This paper embeds the manual-assisted task into a vision-language-action framework, where the assistant provides services to the instructor following egocentric vision and commands.
Under this setting, we present InterVLA, the first large-scale human-object-human interaction dataset with 11.4 hours and 1.2M frames of multimodal data.
We establish novel benchmarks on egocentric human motion estimation, interaction synthesis, and interaction prediction with comprehensive analysis.
arXiv Detail & Related papers (2025-08-06T17:46:23Z)
- Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models [50.19518681574399]
A central challenge in real-world assistive teleoperation is for the robot to infer a wide range of human intentions from user control inputs.
We introduce Casper, an assistive teleoperation system that leverages commonsense knowledge embedded in pre-trained visual language models.
We show that Casper improves task performance, reduces human cognitive load, and achieves higher user satisfaction than direct teleoperation and assistive teleoperation baselines.
arXiv Detail & Related papers (2025-06-17T17:06:43Z)
- Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning [42.09560737219404]
Large Language Models show promise in human-like communication, but their standalone use is hindered by memory constraints and contextual incoherence.
This work presents a multimodal, cognitively inspired framework that enhances LLM-based autonomous decision-making in social and task-oriented Human-Robot Interaction.
To further enhance autonomy and personalization, we introduce a memory system for selecting, storing and retrieving experiences.
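To make the select/store/retrieve cycle concrete, here is a minimal sketch of such an experience memory; the keyword-overlap scoring and class name are illustrative assumptions, since the paper's actual retrieval mechanism (likely embedding-based) is not reproduced here.

```python
from collections import Counter

class ExperienceMemory:
    # Hypothetical select/store/retrieve memory; not the paper's actual API.
    def __init__(self):
        self.experiences = []  # (text, token_counts) pairs

    def store(self, text: str) -> None:
        self.experiences.append((text, Counter(text.lower().split())))

    def retrieve(self, query: str, k: int = 2) -> list:
        q = Counter(query.lower().split())
        # Select by token overlap; a real system would use embeddings.
        ranked = sorted(self.experiences,
                        key=lambda e: sum((q & e[1]).values()),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

memory = ExperienceMemory()
memory.store("the user prefers short spoken answers")
memory.store("yesterday's fractions tutoring session went well")
print(memory.retrieve("how should I phrase answers for the user", k=1))
```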
arXiv Detail & Related papers (2025-04-02T10:45:41Z)
- Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant [15.736792988697664]
Large language models (LLMs) have continued to impact our everyday lives.
Recent work has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks.
We analyzed how user involvement at each stage affects their trust and collaborative team performance.
arXiv Detail & Related papers (2025-02-03T14:23:22Z)
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.
We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.
Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We build a self-contained environment with data that mimics a small software company environment.
We find that with the most competitive agent, 24% of the tasks can be completed autonomously.
This paints a nuanced picture of task automation with LM agents.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
- Effect of Adaptive Communication Support on LLM-powered Human-Robot Collaboration [2.4552201513604093]
We propose a Human-Robot Teaming Framework with Multi-Modal Language Feedback (HRT-ML).
The HRT-ML framework includes two core modules: a Coordinator for high-level, low-frequency strategic guidance, and a Manager for subtask-specific, high-frequency instructions.
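A minimal sketch of the two-rate loop this implies, assuming the Coordinator updates strategy once every few ticks while the Manager issues an instruction on every tick; the function names and the period are hypothetical, not the HRT-ML implementation.

```python
def coordinator(step: int) -> str:
    # Low-frequency, high-level strategic guidance (hypothetical).
    return f"strategy@{step}: finish assembly before fetching parts"

def manager(step: int, strategy: str) -> str:
    # High-frequency, subtask-specific instruction (hypothetical).
    return f"step {step}: tighten bolt ({strategy})"

COORDINATOR_PERIOD = 5  # one strategic update per five manager ticks
strategy = coordinator(0)
for step in range(1, 11):
    if step % COORDINATOR_PERIOD == 0:
        strategy = coordinator(step)
    print(manager(step, strategy))
```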
arXiv Detail & Related papers (2024-11-26T00:06:47Z)
- Robotic warehousing operations: a learn-then-optimize approach to large-scale neighborhood search [84.39855372157616]
This paper supports robotic parts-to-picker operations in warehousing by optimizing order-workstation assignments, item-pod assignments and the schedule of order fulfillment at workstations.
We solve it via large-scale neighborhood search, with a novel learn-then-optimize approach to subproblem generation.
In collaboration with Amazon Robotics, we show that our model and algorithm generate much stronger solutions for practical problems than state-of-the-art approaches.
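A toy sketch of the destroy-and-repair loop underlying large-scale neighborhood search, using a made-up order-to-workstation assignment instance; a learn-then-optimize variant would replace the uniform subproblem sampling with a learned selector.

```python
import random
random.seed(0)

# Toy instance: cost[i][j] = cost of assigning order i to workstation j.
cost = [[random.randint(1, 9) for _ in range(3)] for _ in range(6)]
assign = [random.randrange(3) for _ in range(6)]  # initial assignment

def total(a):
    return sum(cost[i][j] for i, j in enumerate(a))

for _ in range(200):
    # "Destroy": pick a small subproblem (two orders). A learn-then-optimize
    # method would use a trained model here instead of uniform sampling.
    i1, i2 = random.sample(range(6), 2)
    best = list(assign)
    # "Repair": exhaustively re-optimize just the selected orders.
    for j1 in range(3):
        for j2 in range(3):
            cand = list(assign)
            cand[i1], cand[i2] = j1, j2
            if total(cand) < total(best):
                best = cand
    assign = best

print("assignment:", assign, "cost:", total(assign))
```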
arXiv Detail & Related papers (2024-08-29T20:22:22Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS).
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
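A minimal sketch of what such a chat-to-prompt bridge could look like, with `ros_context` standing in for state read from ROS topics; the function and its fields are hypothetical assumptions, not the framework's actual interfaces.

```python
def build_prompt(user_request: str, ros_context: dict) -> str:
    """Compose an LLM prompt from a chat message plus ROS-derived context.

    Hypothetical sketch: `ros_context` stands in for state that a real
    system would read from ROS topics; ROS-LLM's interfaces differ.
    """
    context_lines = "\n".join(f"- {k}: {v}" for k, v in ros_context.items())
    return (
        "You are a robot task planner.\n"
        f"Current robot state:\n{context_lines}\n"
        f"User request: {user_request}\n"
        "Respond with a structured sequence of robot actions."
    )

prompt = build_prompt(
    "Please bring the red cup to the table",
    {"gripper": "open", "pose": "(1.2, 0.4, 0.0)", "battery": "87%"},
)
print(prompt)
```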
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- Adaptive In-conversation Team Building for Language Model Agents [33.03550687362213]
Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks.
Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent.
A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods.
arXiv Detail & Related papers (2024-05-29T18:08:37Z)
- Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality [28.27036270001756]
This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training.
We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment.
arXiv Detail & Related papers (2024-05-16T14:20:30Z)
- AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation [0.0]
AUTONODE stands for Autonomous User-interface Transformation through Online Neuro-graphic Operations and Deep Exploration.
Our engine empowers agents to comprehend and implement complex workflows, adapting to dynamic web environments with unparalleled efficiency.
The versatility and efficacy of AUTONODE are demonstrated through a series of experiments, highlighting its proficiency in managing a diverse array of web-based tasks.
arXiv Detail & Related papers (2024-03-15T10:27:17Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
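A minimal sketch of decentralized, serverless collaboration on a graph that changes every round, using simple consensus averaging of scalar parameters; DeLAMA's actual update rule is richer, so treat this only as an illustration of the setting.

```python
import random
random.seed(1)

NUM_AGENTS = 4
params = [random.uniform(0.0, 10.0) for _ in range(NUM_AGENTS)]

for step in range(5):
    # The collaboration graph is re-sampled each round (dynamic graph).
    edges = {(i, j) for i in range(NUM_AGENTS)
             for j in range(i + 1, NUM_AGENTS) if random.random() < 0.5}
    new_params = []
    for i in range(NUM_AGENTS):
        nbrs = [j for j in range(NUM_AGENTS) if j != i
                and (min(i, j), max(i, j)) in edges]
        group = [params[i]] + [params[j] for j in nbrs]
        new_params.append(sum(group) / len(group))  # consensus-style averaging
    params = new_params
    print(step, [round(p, 2) for p in params])
```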
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World [48.90399899928823]
This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world.
We introduce HoloAssist, a large-scale egocentric human interaction dataset.
We present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment.
arXiv Detail & Related papers (2023-09-29T07:17:43Z)
- Building Cooperative Embodied Agents Modularly with Large Language Models [104.57849816689559]
We address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments.
We harness the commonsense knowledge, reasoning ability, language comprehension, and text generation prowess of LLMs and seamlessly incorporate them into a cognitive-inspired modular framework.
Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
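A minimal sketch of an LLM-centred modular agent loop in this spirit, with perception, memory, and communication modules feeding a planning call; the module names and the `llm` stub are illustrative assumptions, not CoELA's actual interfaces.

```python
from typing import List, Optional

def llm(prompt: str) -> str:
    # Stub standing in for a GPT-4 call.
    return "plan: move_to(kitchen); pick(apple)"

class ModularAgent:
    def __init__(self) -> None:
        self.memory: List[str] = []

    def perceive(self, observation: str) -> None:
        self.memory.append(observation)

    def communicate(self, partner_message: Optional[str]) -> None:
        # Communication is costly, so messages are stored, not broadcast.
        if partner_message:
            self.memory.append(f"partner said: {partner_message}")

    def plan(self) -> str:
        return llm("recent context: " + "; ".join(self.memory[-5:]))

agent = ModularAgent()
agent.perceive("apple visible on the kitchen counter")
agent.communicate("I will cover the living room")
print(agent.plan())
```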
arXiv Detail & Related papers (2023-07-05T17:59:27Z)
- RObotic MAnipulation Network (ROMAN) – Hybrid Hierarchical Learning for Solving Complex Sequential Tasks [70.69063219750952]
We present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN).
ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning.
Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks.
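A minimal sketch of the hierarchical gating idea: a central gate selects which specialised expert to activate at each step of a long task. In ROMAN the gating is learned; here it is a hard-coded lookup, and the experts and states are invented for illustration.

```python
# Hypothetical experts; ROMAN's actual experts are learned policies.
EXPERTS = {
    "push": lambda: "pushed drawer open",
    "pick": lambda: "picked up vial",
    "place": lambda: "placed vial in rack",
}

def gate(state: str) -> str:
    # Stands in for ROMAN's learned gating network.
    return {"start": "push", "drawer_open": "pick", "holding": "place"}[state]

for state in ["start", "drawer_open", "holding"]:
    expert = gate(state)
    print(f"state={state!r} -> expert={expert!r}: {EXPERTS[expert]()}")
```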
arXiv Detail & Related papers (2023-06-30T20:35:22Z)
- Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis [4.055489363682198]
Large language models (LLMs) offer significant promise as a knowledge source for task learning.
Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but alone it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied agent learning novel tasks.
We describe a cognitive-agent approach, STARS, that extends and complements prompt engineering, mitigating its limitations and thus enabling an agent to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences.
arXiv Detail & Related papers (2023-06-11T20:50:14Z)
- Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback [42.19685958922537]
We argue that human-AI collaboration should be interactive, with humans monitoring the work of AI agents and providing feedback that the agent can understand and utilize.
In this work, we explore these directions using the challenging task defined by the IGLU competition, an interactive grounded language understanding task in a Minecraft-like world.
arXiv Detail & Related papers (2023-04-21T05:37:59Z)
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society [58.04479313658851]
This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents.
We propose a novel communicative agent framework named role-playing.
Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems.
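A minimal sketch of the role-playing loop this suggests: an AI "user" agent issues instructions and an AI "assistant" agent solves them, with no human in between. The `llm` stub and turn structure are assumptions, not CAMEL's actual API.

```python
from typing import List

def llm(role: str, history: List[str]) -> str:
    # Stub standing in for a chat-model call; replies are canned here.
    return f"{role}: next step, given '{history[-1]}'"

def role_play(task: str, turns: int = 3) -> None:
    history = [f"task: {task}"]
    for _ in range(turns):
        history.append(llm("user", history))       # instruction-giving agent
        history.append(llm("assistant", history))  # task-solving agent
    print("\n".join(history))

role_play("write a market analysis report")
```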
arXiv Detail & Related papers (2023-03-31T01:09:00Z)
- A Unified Architecture for Dynamic Role Allocation and Collaborative Task Planning in Mixed Human-Robot Teams [0.0]
We present a novel architecture for dynamic role allocation and collaborative task planning in a mixed human-robot team of arbitrary size.
The architecture capitalizes on a centralized, reactive, and modular task-agnostic planning method based on Behavior Trees (BTs).
Different metrics used as the mixed-integer linear program (MILP) cost allow the architecture to favor various aspects of the collaboration.
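A toy sketch of how swapping the allocation cost changes the assignment; the brute-force search stands in for the MILP solver, and all agents, tasks, and cost values are invented for illustration.

```python
from itertools import permutations

agents = ["human", "robot_1", "robot_2"]
tasks = ["drill", "inspect", "carry"]

# Two illustrative cost metrics (values are made up): execution time and
# human physical strain. Changing the metric changes which aspect of the
# collaboration the allocation favors, as the summary notes.
time_cost   = {"human": [3, 2, 6], "robot_1": [4, 5, 3], "robot_2": [4, 6, 3]}
strain_cost = {"human": [8, 1, 9], "robot_1": [0, 0, 0], "robot_2": [0, 0, 0]}

def best_allocation(cost):
    # One agent per task; enumerate all assignments and pick the cheapest.
    return min(permutations(agents),
               key=lambda p: sum(cost[a][t] for t, a in enumerate(p)))

print("minimize time:  ", best_allocation(time_cost))
print("minimize strain:", best_allocation(strain_cost))
```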
arXiv Detail & Related papers (2023-01-19T12:30:56Z)
- Towards a Multi-purpose Robotic Nursing Assistant [0.0]
The Multi-purpose Intelligent Nurse Aid (MINA) robotic system can provide walking assistance to patients and perform teleoperation tasks through an easy-to-use, intuitive graphical user interface (GUI).
This paper presents preliminary results from the walking-assistance task, which improves upon current state-of-the-art methods, and shows the GUI developed for teleoperation.
arXiv Detail & Related papers (2021-06-07T15:00:12Z)
- Towards an AI assistant for human grid operators [59.535699822923]
Power systems are becoming more complex to operate in the digital age.
Real-time decision-making is getting more challenging as the human operator has to deal with more information.
There is a great need for rethinking the human-machine interface under more unified and interactive frameworks.
arXiv Detail & Related papers (2020-12-03T16:12:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.