Learning rewards for robotic ultrasound scanning using probabilistic
temporal ranking
- URL: http://arxiv.org/abs/2002.01240v3
- Date: Thu, 22 Jun 2023 19:19:02 GMT
- Title: Learning rewards for robotic ultrasound scanning using probabilistic
temporal ranking
- Authors: Michael Burke, Katie Lu, Daniel Angelov, Artūras Straižys, Craig
Innes, Kartic Subr, Subramanian Ramamoorthy
- Abstract summary: This work considers the inverse problem, where the goal of the task is unknown, and a reward function needs to be inferred from example demonstrations.
Many existing reward inference strategies are unsuited to this class of problems, due to the exploratory nature of the demonstrations.
We formalise this probabilistic temporal ranking approach and show that it improves upon existing approaches.
- Score: 17.494224125794187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Informative path-planning is a well established approach to visual-servoing
and active viewpoint selection in robotics, but typically assumes that a
suitable cost function or goal state is known. This work considers the inverse
problem, where the goal of the task is unknown, and a reward function needs to
be inferred from exploratory example demonstrations provided by a demonstrator,
for use in a downstream informative path-planning policy. Unfortunately, many
existing reward inference strategies are unsuited to this class of problems,
due to the exploratory nature of the demonstrations. In this paper, we propose
an alternative approach to cope with the class of problems where these
sub-optimal, exploratory demonstrations occur. We hypothesise that, in tasks
which require discovery, successive states of any demonstration are
progressively more likely to be associated with a higher reward, and use this
hypothesis to generate time-based binary comparison outcomes and infer reward
functions that support these ranks, under a probabilistic generative model. We
formalise this probabilistic temporal ranking approach and show that it
improves upon existing approaches to perform reward inference for autonomous
ultrasound scanning, a novel application of learning from demonstration in
medical imaging while also being of value across a broad range of goal-oriented
learning from demonstration tasks.
- Keywords: Visual servoing, reward inference, probabilistic temporal ranking
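The ranking recipe in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: it samples time-ordered observation pairs from an exploratory demonstration, assumes the later observation in each pair is the preferred one, and fits a reward function whose pairwise differences explain those binary outcomes under a Bernoulli (logistic) likelihood. The synthetic demonstration, feature dimension, and linear reward model are illustrative assumptions standing in for the paper's probabilistic generative model.

```python
# Minimal sketch of temporal-ranking reward inference (illustrative only;
# not the authors' implementation). Data and model choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical exploratory demonstration: T observations with d features each
# (e.g. embeddings of ultrasound frames).
T, d = 200, 8
demo = rng.normal(size=(T, d))

# 1. Time-based binary comparisons: for each sampled pair (i, j) with i < j,
#    record the outcome "observation j is preferred to observation i".
num_pairs = 2000
i_idx = rng.integers(0, T - 1, size=num_pairs)
j_idx = np.array([rng.integers(i + 1, T) for i in i_idx])
earlier, later = demo[i_idx], demo[j_idx]

# 2. Fit a linear reward r(x) = w @ x by maximising the Bernoulli likelihood
#    p(j preferred to i) = sigmoid(r(x_j) - r(x_i)) over the comparisons.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    diff = (later - earlier) @ w                        # reward differences
    p = 1.0 / (1.0 + np.exp(-diff))                     # predicted preference prob.
    grad = (later - earlier).T @ (1.0 - p) / num_pairs  # grad of log-likelihood
    w += lr * grad

# The learned reward can then score candidate viewpoints for a downstream
# informative path-planning policy, e.g. scores = demo @ w.
```

A fuller treatment would place a prior over the reward function and infer it in a Bayesian fashion, in line with the generative formulation described in the abstract, rather than the maximum-likelihood point estimate used here.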
Related papers
- Screw Geometry Meets Bandits: Incremental Acquisition of Demonstrations to Generate Manipulation Plans [9.600625243282618]
We study the problem of methodically obtaining a sufficient set of kinesthetic demonstrations, one at a time.
We present a novel approach to address these open problems, using a screw geometric representation to generate manipulation plans from demonstrations.
We present experimental results on two example manipulation tasks, namely, pouring and scooping, to illustrate our approach.
arXiv Detail & Related papers (2024-10-23T20:57:56Z) - Reward Collapse in Aligning Large Language Models [64.98482888193267]
We study the phenomenon of 'reward collapse', an empirical observation where the prevailing ranking-based approach results in an identical reward distribution.
Our experimental results suggest that our proposed prompt-aware utility functions significantly alleviate reward collapse during the training of reward models.
arXiv Detail & Related papers (2023-05-28T02:12:00Z) - Temporal Abstractions-Augmented Temporally Contrastive Learning: An
Alternative to the Laplacian in RL [140.12803111221206]
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
arXiv Detail & Related papers (2022-03-21T22:07:48Z) - Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Replacing Rewards with Examples: Example-Based Policy Search via
Recursive Classification [133.20816939521941]
In the standard Markov decision process formalism, users specify tasks by writing down a reward function.
In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved.
Motivated by this observation, we derive a control algorithm that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states.
arXiv Detail & Related papers (2021-03-23T16:19:55Z) - Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z) - Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations
using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
arXiv Detail & Related papers (2020-11-02T20:32:05Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z) - Active Preference-Based Gaussian Process Regression for Reward Learning [42.697198807877925]
One common approach is to learn reward functions from collected expert demonstrations.
We present a preference-based learning approach where, as an alternative, the human feedback is only in the form of comparisons between trajectories.
Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework.
arXiv Detail & Related papers (2020-05-06T03:29:27Z)