RecBayes: Recurrent Bayesian Ad Hoc Teamwork in Large Partially Observable Domains
- URL: http://arxiv.org/abs/2506.15756v1
- Date: Wed, 18 Jun 2025 11:30:52 GMT
- Title: RecBayes: Recurrent Bayesian Ad Hoc Teamwork in Large Partially Observable Domains
- Authors: João G. Ribeiro, Yaniv Oren, Alberto Sardinha, Matthijs Spaan, Francisco S. Melo
- Abstract summary: RecBayes is a novel approach for ad hoc teamwork under partial observability. We show RecBayes is effective at identifying known teams and tasks being performed from partial observations alone.
- Score: 3.308833414816073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes RecBayes, a novel approach for ad hoc teamwork under partial observability, a setting where agents are deployed on-the-fly to environments where pre-existing teams operate, that never requires, at any stage, access to the states of the environment or the actions of its teammates. We show that, by relying on a recurrent Bayesian classifier trained using past experiences, an ad hoc agent is able to effectively identify known teams and tasks being performed from observations alone. Unlike recent approaches such as PO-GPL (Gu et al., 2021) and FEAT (Rahman et al., 2023), which require at some stage fully observable states of the environment, actions of teammates, or both, or approaches such as ATPO (Ribeiro et al., 2023), which requires environments small enough to be modelled tabularly (up to 4.8K states and 1.7K observations in that work), RecBayes handles arbitrarily large spaces while never relying on either states or teammates' actions. Our results in benchmark domains from the multi-agent systems literature, adapted for partial observability and scaled up to 1M states and 2^125 observations, show that RecBayes is effective at identifying known teams and tasks being performed from partial observations alone and, as a result, is able to assist the teams in solving their tasks effectively.
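The abstract describes RecBayes as a recurrent Bayesian classifier, trained on past experiences, that maps the ad hoc agent's observation history to a posterior over known (team, task) pairs, using neither environment states nor teammates' actions. Below is a minimal sketch of that idea, assuming a GRU-based recurrent classifier with hypothetical names and dimensions (`RecurrentTeamTaskClassifier`, `obs_dim`, etc.); it illustrates the general technique and is not the authors' implementation.

```python
# Hypothetical sketch: a recurrent classifier that turns an observation
# history into a (log-)posterior over known (team, task) pairs.
# Architecture, sizes, and training details are assumptions.
import torch
import torch.nn as nn

class RecurrentTeamTaskClassifier(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int, n_team_task_pairs: int):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_team_task_pairs)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) -- only the ad hoc agent's own observations
        _, h_n = self.gru(obs_seq)                 # final hidden state summarises the history
        logits = self.head(h_n.squeeze(0))         # (batch, n_team_task_pairs)
        return torch.log_softmax(logits, dim=-1)   # log-posterior over known (team, task) pairs

# Training on past experiences: each logged episode is labelled with the
# (team, task) pair that generated it; random stand-in data shown here.
model = RecurrentTeamTaskClassifier(obs_dim=32, hidden_dim=128, n_team_task_pairs=12)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
obs_batch = torch.randn(8, 50, 32)                 # 8 episodes, 50 observation steps each
labels = torch.randint(0, 12, (8,))                # index of the generating (team, task) pair
optimizer.zero_grad()
loss = nn.functional.nll_loss(model(obs_batch), labels)
loss.backward()
optimizer.step()
```

The property mirrored here is the one the abstract emphasises: the classifier consumes only the ad hoc agent's own observation history, never states of the environment or actions of teammates.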
Related papers
- Improving Zero-Shot ObjectNav with Generative Communication [60.84730028539513]
We propose a new method for improving zero-shot ObjectNav.
Our approach takes into account that the ground agent may have a limited and sometimes obstructed view.
arXiv Detail & Related papers (2024-08-03T22:55:26Z)
- Towards Open-World Mobile Manipulation in Homes: Lessons from the NeurIPS 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge [93.4434417387526]
We propose Open Vocabulary Mobile Manipulation as a key benchmark task for robotics.
We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task.
We detail the results and methodologies used, both in simulation and real-world settings.
arXiv Detail & Related papers (2024-07-09T15:15:01Z)
- Predicting the Intention to Interact with a Service Robot: the Role of Gaze Cues [51.58558750517068]
Service robots need to perceive as early as possible that an approaching person intends to interact.
We solve this perception task with a sequence-to-sequence classifier of a potential user intention to interact.
Our main contribution is a study of the benefit of features representing the person's gaze in this context.
arXiv Detail & Related papers (2024-04-02T14:22:54Z)
- Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability [11.786470737937638]
This paper introduces a formal definition of the setting of ad hoc teamwork under partial observability.
Our results in 70 POMDPs from 11 domains show that our approach is not only effective in assisting unknown teammates in solving unknown tasks but is also robust in scaling to more challenging problems.
arXiv Detail & Related papers (2023-09-30T16:40:50Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning [11.998708550268978]
We develop a class of solutions for open ad hoc teamwork under full and partial observability.
We show that our solution can learn efficient policies in open ad hoc teamwork in fully and partially observable cases.
arXiv Detail & Related papers (2022-10-11T13:44:44Z)
- Assisting Unknown Teammates in Unknown Tasks: Ad Hoc Teamwork under Partial Observability [15.995282665634097]
We present a novel online prediction algorithm for the problem setting of ad hoc teamwork under partial observability (ATPO).
ATPO accommodates partial observability, using the agent's observations to identify which task is being performed by the teammates.
Our results show that ATPO is effective and robust in identifying the teammate's task from a large library of possible tasks, efficient at solving it in near-optimal time, and scalable in adapting to increasingly larger problem sizes.
arXiv Detail & Related papers (2022-01-10T18:53:34Z)
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate semantic tracklets on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
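The Never Give Up summary above mentions an episodic intrinsic reward built from k-nearest-neighbour distances between embeddings of the agent's recent experience. A minimal, hypothetical sketch of such a bonus follows; the kernel form, constants, and the `episodic_bonus` name are assumptions rather than that paper's exact formulation.

```python
# Hypothetical sketch of an episodic k-NN novelty bonus: the bonus is large
# when the current embedding is far from its nearest neighbours in the
# episode's memory of visited embeddings. Constants are illustrative.
import numpy as np

def episodic_bonus(embedding: np.ndarray, memory: list, k: int = 10,
                   eps: float = 1e-3) -> float:
    if not memory:
        return 1.0
    sq_dists = np.sort([float(np.sum((embedding - m) ** 2)) for m in memory])[:k]
    sq_dists = sq_dists / (np.mean(sq_dists) + eps)    # normalise by the mean k-NN distance
    kernel = eps / (sq_dists + eps)                    # similarity kernel: near neighbours dominate
    return float(1.0 / np.sqrt(np.sum(kernel) + eps))  # more familiar embedding -> smaller bonus
```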