Goal Recognition using Actor-Critic Optimization
- URL: http://arxiv.org/abs/2501.01463v1
- Date: Tue, 31 Dec 2024 16:44:20 GMT
- Title: Goal Recognition using Actor-Critic Optimization
- Authors: Ben Nageris, Felipe Meneguzzi, Reuth Mirsky
- Abstract summary: Deep Recognition using Actor-Critic Optimization (DRACO) is a novel approach based on deep reinforcement learning.
DRACO is the first goal recognition algorithm that learns a set of policy networks from unstructured data and uses them for inference.
It achieves state-of-the-art performance for goal recognition in discrete settings while not using the structured inputs used by existing approaches.
- Score: 12.842382984993632
- Abstract: Goal Recognition aims to infer an agent's goal from a sequence of observations. Existing approaches often rely on manually engineered domains and discrete representations. Deep Recognition using Actor-Critic Optimization (DRACO) is a novel approach based on deep reinforcement learning that overcomes these limitations by providing two key contributions. First, it is the first goal recognition algorithm that learns a set of policy networks from unstructured data and uses them for inference. Second, DRACO introduces new metrics for assessing goal hypotheses through continuous policy representations. DRACO achieves state-of-the-art performance for goal recognition in discrete settings while not using the structured inputs used by existing approaches. Moreover, it outperforms these approaches in more challenging, continuous settings at substantially reduced costs in both computing and memory. Together, these results showcase the robustness of the new algorithm, bridging traditional goal recognition and deep reinforcement learning.
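As a rough illustration of the policy-based inference idea, the sketch below scores each candidate goal by the log-likelihood its learned policy assigns to the observed actions. This is a minimal, assumption-laden reading, not DRACO's exact metric; `recognize_goal`, `policies`, and `observations` are hypothetical names.

```python
import numpy as np

# Minimal sketch of policy-based goal recognition. Hypothetical API;
# the scoring rule (action log-likelihood under each goal's policy)
# is an assumption, not necessarily the metric DRACO introduces.

def recognize_goal(policies, observations):
    """policies: dict mapping goal -> callable(state) -> action-probability vector
    observations: list of (state, action) pairs from the observed agent."""
    scores = {}
    for goal, policy in policies.items():
        log_likelihood = 0.0
        for state, action in observations:
            probs = policy(state)                            # distribution over actions
            log_likelihood += np.log(probs[action] + 1e-12)  # guard against log(0)
        scores[goal] = log_likelihood
    # The goal whose policy best explains the observations wins.
    return max(scores, key=scores.get), scores
```

With continuous policy representations, a loop like this needs no hand-engineered domain model: only the trained networks and raw observations are required.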
Related papers
- Goal Recognition via Linear Programming [14.129476759815251]
Research on Goal Recognition as Planning encompasses reasoning about the model of a planning task, the observations, and the goals using planning techniques.
In this article, we design novel recognition approaches that rely on the Operator-Counting framework.
We show how the new IP/LP constraints can improve the recognition of goals under both partial and noisy observability.
arXiv Detail & Related papers (2024-04-11T17:34:35Z)
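A toy rendering of the Operator-Counting idea above, using PuLP; the two constraint families shown (observation counts and landmark achievers) are illustrative assumptions, not the paper's exact IP/LP encoding.

```python
import pulp

# Toy operator-counting lower bound for one candidate goal. Operators
# are given as short string names; constraints are simplified stand-ins
# for the paper's IP/LP constraint sets.

def operator_count_lower_bound(operators, observed, landmarks):
    """observed: operators seen in the observation sequence
    landmarks: list of achiever sets, one per landmark of the goal."""
    prob = pulp.LpProblem("operator_counting", pulp.LpMinimize)
    count = {o: pulp.LpVariable(f"count_{o}", lowBound=0, cat=pulp.LpInteger)
             for o in operators}
    prob += pulp.lpSum(count.values())          # objective: plan-length proxy
    for o in observed:                          # observation constraints
        prob += count[o] >= 1
    for achievers in landmarks:                 # landmark constraints
        prob += pulp.lpSum(count[o] for o in achievers) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(prob.objective)
```

Ranking candidate goals by how little the observations inflate each goal's bound is one common way such lower bounds feed goal recognition.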
- Progressive Conservative Adaptation for Evolving Target Domains [76.9274842289221]
Conventional domain adaptation typically transfers knowledge from a source domain to a stationary target domain.
Restoring and adapting to such target data results in escalating computational and resource consumption over time.
We propose a simple yet effective approach, termed progressive conservative adaptation (PCAda).
arXiv Detail & Related papers (2024-02-07T04:11:25Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z)
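A hedged sketch of one way a slot cycle-consistency objective can be written (the loss form, temperature, and names are assumptions, not the paper's verbatim objective): a pixel that attends to a slot should be attended back to by that slot, which discourages two objects from sharing a slot.

```python
import torch
import torch.nn.functional as F

# Cycle-consistency sketch: pixel -> slot -> pixel round trips should
# return to the starting pixel. Shapes and loss form are assumptions.

def cycle_consistency_loss(features, slots, temperature=0.1):
    """features: (num_pixels, dim), slots: (num_slots, dim)"""
    f2s = F.softmax(features @ slots.t() / temperature, dim=-1)  # pixel -> slot
    s2f = F.softmax(slots @ features.t() / temperature, dim=-1)  # slot -> pixel
    cycle = f2s @ s2f        # (num_pixels, num_pixels), rows are distributions
    target = torch.arange(features.size(0))
    # Each pixel should land back on itself after the round trip.
    return F.nll_loss(torch.log(cycle + 1e-12), target)
```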
- Leveraging Planning Landmarks for Hybrid Online Goal Recognition [7.690707525070737]
We propose a hybrid method for online goal recognition that combines a symbolic planning-landmark-based approach with a data-driven goal recognition approach.
The proposed method is not only significantly more efficient in terms of computation time than the state of the art, but also improves goal recognition performance.
arXiv Detail & Related papers (2023-01-25T13:21:30Z)
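A small sketch of the hybrid idea (the blending rule, `alpha`, and all names are assumptions): a symbolic score from the fraction of each goal's landmarks already achieved, mixed with a learned recognizer's probability.

```python
# Hybrid goal scoring sketch. landmarks_of and learned_prob are
# hypothetical stand-ins for the symbolic and data-driven components.

def hybrid_goal_scores(goals, achieved_facts, landmarks_of, learned_prob, alpha=0.5):
    """landmarks_of: goal -> set of landmark facts for that goal
    learned_prob: goal -> probability from a data-driven recognizer"""
    scores = {}
    for g in goals:
        lms = landmarks_of(g)
        # Landmark completion ratio: achieved landmarks / all landmarks.
        symbolic = len(lms & achieved_facts) / max(len(lms), 1)
        scores[g] = alpha * symbolic + (1 - alpha) * learned_prob(g)
    return scores
```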
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
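One common way to realize a discretizing bottleneck is vector quantization; the sketch below is an assumption-level illustration, not the paper's factorial architecture.

```python
import torch

# Vector-quantization bottleneck sketch: snap each continuous goal
# embedding to its nearest entry in a learned codebook.

class GoalDiscretizer(torch.nn.Module):
    def __init__(self, num_codes=64, dim=32):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(num_codes, dim))

    def forward(self, goal_embedding):
        """goal_embedding: (batch, dim) continuous goal encodings."""
        dists = torch.cdist(goal_embedding, self.codebook)  # (batch, num_codes)
        codes = dists.argmin(dim=-1)                        # discrete goal ids
        quantized = self.codebook[codes]
        # Straight-through estimator so gradients still reach the encoder.
        quantized = goal_embedding + (quantized - goal_embedding).detach()
        return quantized, codes
```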
- Goal Recognition as Reinforcement Learning [20.651718821998106]
We develop a framework that combines model-free reinforcement learning and goal recognition.
This framework consists of two main stages: Offline learning of policies or utility functions for each potential goal, and online inference.
The resulting instantiation achieves state-of-the-art performance compared with existing goal recognizers on standard evaluation domains, and superior performance in noisy environments.
arXiv Detail & Related papers (2022-02-13T16:16:43Z)
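A compact sketch of the two-stage recipe above (tabular Q-functions and the softmax ranking are assumptions standing in for the learned artifacts): offline, one Q-function per candidate goal; online, goals are ranked by how strongly their Q-function prefers the observed actions.

```python
import numpy as np

# Online inference sketch: score each goal by the softmax preference
# its offline-learned Q-function assigns to the observed actions.

def rank_goals(q_tables, trajectory):
    """q_tables: goal -> Q array of shape (num_states, num_actions)
    trajectory: observed (state, action) pairs"""
    scores = {}
    for goal, q in q_tables.items():
        score = 0.0
        for s, a in trajectory:
            prefs = np.exp(q[s] - q[s].max())      # numerically stable softmax
            score += np.log(prefs[a] / prefs.sum())
        scores[goal] = score
    return sorted(scores, key=scores.get, reverse=True)  # best-explaining goal first
```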
- Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network whose partial layers are iteratively exploited to refine its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency on widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
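A minimal sketch of a weight-sharing refinement loop with a learned exit gate (module sizes, the 0.5 threshold, and names are assumptions); with a batch of one, the gate decision is per-sample.

```python
import torch

# Iterative refinement with a learned early-exit gate: the same block
# is applied repeatedly, and a small head decides when to stop.

class IterativeRefiner(torch.nn.Module):
    def __init__(self, dim=64, max_iters=4):
        super().__init__()
        self.block = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim))
        self.gate = torch.nn.Linear(dim, 1)  # exit-probability head
        self.max_iters = max_iters

    def forward(self, x):
        for _ in range(self.max_iters):
            x = x + self.block(x)            # weight-sharing refinement step
            if torch.sigmoid(self.gate(x)).mean() > 0.5:
                break                        # learned criterion: "good enough"
        return x
```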
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states. The learning problem is cast as expectation maximization: the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
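A schematic of the E-step/M-step loop above (the graph over visited states and `train_policy_to_reach` are hypothetical stand-ins for the paper's components):

```python
import networkx as nx

# One E/M iteration: plan waypoints by graph search over previously
# visited states, then train the goal-conditioned policy on each hop.

def c_planning_step(graph, start, goal, policy, train_policy_to_reach):
    # E-step: shortest waypoint sequence through the state graph.
    waypoints = nx.shortest_path(graph, source=start, target=goal)
    # M-step: supervise the goal-conditioned policy on each subgoal in turn.
    for subgoal in waypoints[1:]:
        train_policy_to_reach(policy, subgoal)  # e.g., one RL update per waypoint
    return waypoints
```

Nearby subgoals keep each learning problem easy, which is what makes the search-generated intermediate states act as a curriculum.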
- Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning [5.406386303264086]
In either case, effective solutions require the agent to reliably reach a specified state.
This work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state.
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is both efficient and does not suffer from hindsight bias in stochastic domains.
As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.
arXiv Detail & Related papers (2020-02-15T23:46:29Z)
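A hedged sketch of the density-estimation idea (a kernel density estimator and this reward shaping are assumptions, not the paper's exact model): fit a density over states from which the goal was reached, then use the current state's log-density as a dense reward.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Density-based goal reaching sketch: states observed near the goal
# define a density whose log-probability serves as a shaped reward.

def fit_goal_density(states_near_goal, bandwidth=0.2):
    """states_near_goal: array of shape (n_samples, state_dim)."""
    return KernelDensity(bandwidth=bandwidth).fit(states_near_goal)

def shaped_reward(kde, state):
    # Higher log-density = closer (in distribution) to goal-reaching states.
    return kde.score_samples(np.atleast_2d(state))[0]
```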
This list is automatically generated from the titles and abstracts of the papers in this site.