Signal Temporal Logic-Guided Apprenticeship Learning
- URL: http://arxiv.org/abs/2311.05084v1
- Date: Thu, 9 Nov 2023 00:59:28 GMT
- Title: Signal Temporal Logic-Guided Apprenticeship Learning
- Authors: Aniruddh G. Puranic, Jyotirmoy V. Deshmukh and Stefanos Nikolaidis
- Abstract summary: We show how temporal logic specifications that describe high level task objectives, are encoded in a graph to define a temporal-based metric.
We show how our framework overcomes the drawbacks of prior literature by drastically improving the number of demonstrations required to learn a control policy.
- Score: 6.8500997328311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Apprenticeship learning crucially depends on effectively learning rewards,
and hence control policies from user demonstrations. Of particular difficulty
is the setting where the desired task consists of a number of sub-goals with
temporal dependencies. The quality of inferred rewards and hence policies are
typically limited by the quality of demonstrations, and poor inference of these
can lead to undesirable outcomes. In this letter, we show how temporal logic
specifications that describe high level task objectives, are encoded in a graph
to define a temporal-based metric that reasons about behaviors of demonstrators
and the learner agent to improve the quality of inferred rewards and policies.
Through experiments on a diverse set of robot manipulator simulations, we show
how our framework overcomes the drawbacks of prior literature by drastically
improving the number of demonstrations required to learn a control policy.
Related papers
- When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering [10.01278648231868]
Policy steering is an emerging way to adapt robot behaviors at deployment-time.<n> Vision-Language Models (VLMs) are promising general-purpose verifiers due to their reasoning capabilities.<n>We propose uncertainty-aware policy steering (UPS), a framework that jointly reasons about semantic task uncertainty and low-level action feasibility.
arXiv Detail & Related papers (2026-02-25T23:23:22Z) - Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training [76.12556589212666]
We show that curriculum post-training avoids the exponential complexity bottleneck.<n>Under outcome-only reward signals, reinforcement learning finetuning achieves high accuracy with sample complexity.<n>We establish guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to order.
arXiv Detail & Related papers (2025-11-10T18:29:54Z) - Learning Affordances at Inference-Time for Vision-Language-Action Models [50.93181349331096]
In robotics, Vision-Language-Action models (VLAs) offer a promising path towards solving complex control tasks.<n>We introduce Learning from Inference-Time Execution (LITEN), which connects a VLA low-level policy to a high-level VLM that conditions on past experiences.<n>Our approach iterates between a reasoning phase that generates and executes plans for the low-level VLA, and an assessment phase that reflects on the resulting execution.
arXiv Detail & Related papers (2025-10-22T16:43:29Z) - Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations [8.436020932157684]
Smooth Policy Regularisation from Demonstrations (SPReD) is a framework that addresses the fundamental question: when should an agent imitate a demonstration versus follow its own policy?<n>SPReD uses ensemble methods to explicitly model Q-value distributions for both demonstration and policy actions, quantifying uncertainty for comparisons.<n>It achieves remarkable gains in experiments across eight robotics tasks, outperforming existing approaches by up to a factor of 14 in complex tasks while maintaining robustness to demonstration quality and quantity.
arXiv Detail & Related papers (2025-09-19T13:47:20Z) - Reinforcement Learning via Implicit Imitation Guidance [49.88208134736617]
A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy.<n>We propose to use prior data solely for guiding exploration via noise added to the policy, sidestepping the need for explicit behavior cloning constraints.<n>Our approach achieves up to 2-3x improvement over prior reinforcement learning from offline methods across seven simulated continuous control tasks.
arXiv Detail & Related papers (2025-06-09T07:32:52Z) - Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z) - Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Results observe significant performance improvement by our method, compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z) - Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose method by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from
Demonstrations [15.762916270583698]
Learning from demonstration (LfD) methods have shown promise for solving multi-step tasks.
In this work, we identify the roots of such a challenge as the failure of the learned continuous policy to satisfy the discrete plan implicit in the demonstration.
We prove our learned continuous policy can simulate any discrete plan specified by a Linear Temporal Logic (LTL) formula.
arXiv Detail & Related papers (2022-06-09T17:25:22Z) - Accelerated Reinforcement Learning for Temporal Logic Control Objectives [10.216293366496688]
This paper addresses the problem of learning control policies for mobile robots modeled as unknown Markov Decision Processes (MDPs)
We propose a novel accelerated model-based reinforcement learning (RL) algorithm for control objectives that is capable of learning control policies significantly faster than related approaches.
arXiv Detail & Related papers (2022-05-09T17:09:51Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Learning from Demonstrations using Signal Temporal Logic [1.2182193687133713]
Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies.
We use Signal Temporal Logic to evaluate and rank the quality of demonstrations.
We show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
arXiv Detail & Related papers (2021-02-15T18:28:36Z) - DDPG++: Striving for Simplicity in Continuous-control Off-Policy
Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from replay buffer and update policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.