Transductive Reward Inference on Graph
- URL: http://arxiv.org/abs/2402.03661v1
- Date: Tue, 6 Feb 2024 03:31:28 GMT
- Title: Transductive Reward Inference on Graph
- Authors: Bohao Qu, Xiaofeng Cao, Qing Guo, Yi Chang, Ivor W. Tsang, Chengqi
Zhang
- Abstract summary: We develop a reward inference method based on the contextual properties of information propagation on graphs.
We leverage both the available data and limited reward annotations to construct a reward propagation graph.
We employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data.
- Score: 53.003245457089406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we present a transductive inference approach on a
reward information propagation graph, which enables the effective estimation
of rewards for unlabelled data in offline reinforcement learning. Reward
inference is key to learning effective policies in practical scenarios where
direct environmental interactions are either too costly or unethical and
reward functions are rarely accessible, such as in healthcare and robotics. Our
research focuses on developing a reward inference method based on the
contextual properties of information propagation on graphs that capitalizes on
a constrained number of human reward annotations to infer rewards for
unlabelled data. We leverage both the available data and limited reward
annotations to construct a reward propagation graph, wherein the edge weights
incorporate various influential factors pertaining to the rewards.
Subsequently, we employ the constructed graph for transductive reward
inference, thereby estimating rewards for unlabelled data. Furthermore, we
establish the existence of a fixed point of the iterative transductive
inference process and show that the iteration converges at least to a local
optimum. Empirical evaluations on locomotion and robotic manipulation tasks
validate the effectiveness of our approach: applying the inferred rewards
improves performance in offline reinforcement learning tasks.
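For a concrete picture of the mechanism the abstract describes, below is a minimal sketch of graph-based transductive reward propagation: an affinity graph is built over state-action features, and rewards are spread from the small set of annotated points to the unlabelled ones by iterating a linear update until it reaches a fixed point. The RBF edge weights, the symmetric normalisation, the update rule r <- alpha * S r + (1 - alpha) * r0, and all function names and parameters here are illustrative assumptions, not the paper's exact edge-weight factors or iteration.

```python
# Minimal sketch of transductive reward inference via label propagation
# on a graph. Assumption-laden illustration, not the authors' algorithm.
import numpy as np

def build_propagation_graph(features, sigma=1.0):
    """Dense affinity graph over state-action features using an RBF kernel
    (assumed), followed by symmetric degree normalisation."""
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-loops
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    return D_inv_sqrt @ W @ D_inv_sqrt            # normalised edge weights S

def propagate_rewards(S, rewards, labeled_mask, alpha=0.9, n_iters=100, tol=1e-6):
    """Iterate r <- alpha * S r + (1 - alpha) * r0 until a fixed point,
    anchoring the iteration to the human reward annotations r0."""
    r0 = np.where(labeled_mask, rewards, 0.0)     # zero-initialise unlabelled entries
    r = r0.copy()
    for _ in range(n_iters):
        r_next = alpha * (S @ r) + (1.0 - alpha) * r0
        if np.max(np.abs(r_next - r)) < tol:      # fixed point reached
            return r_next
        r = r_next
    return r

# Toy usage: 200 transitions, only 20 carry human reward annotations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))                 # hypothetical state-action features
annotated_r = feats[:, 0]                         # hypothetical reward signal
mask = np.zeros(200, dtype=bool)
mask[:20] = True
inferred = propagate_rewards(build_propagation_graph(feats), annotated_r, mask)
```

Under these assumptions the update is a contraction for alpha < 1, which gives the intuition for why a fixed point exists and plain iteration reaches it; the paper's own convergence analysis is stated in terms of its specific graph construction.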
Related papers
- Perturbation-based Graph Active Learning for Weakly-Supervised Belief Representation Learning [13.311498341765772]
The objective is to strategically identify valuable messages on social media graphs that are worth labeling within a constrained budget.
This paper proposes a graph data augmentation-inspired active learning strategy (PerbALGraph) that progressively selects messages for labeling.
arXiv Detail & Related papers (2024-10-24T22:11:06Z)
- Debiasing Graph Representation Learning based on Information Bottleneck [18.35405511009332]
We present the design and implementation of GRAFair, a new framework based on a variational graph auto-encoder.
The crux of GRAFair is the Conditional Fairness Bottleneck, where the objective is to capture the trade-off between the utility of representations and sensitive information of interest.
Experiments on various real-world datasets demonstrate the effectiveness of our proposed method in terms of fairness, utility, robustness, and stability.
arXiv Detail & Related papers (2024-09-02T16:45:23Z)
- Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations [25.536792010283566]
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations.
We introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework.
Our framework demonstrates significant performance improvements over previous SOTA methods.
arXiv Detail & Related papers (2023-10-13T02:38:35Z)
- Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement [42.45888600367566]
Directed generation aims to generate samples with desired properties as measured by a reward function.
We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels.
arXiv Detail & Related papers (2023-07-13T20:20:40Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- ALLSH: Active Learning Guided by Local Sensitivity and Hardness [98.61023158378407]
We propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function.
Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks.
arXiv Detail & Related papers (2022-05-10T15:39:11Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Invariance in Policy Optimisation and Partial Identifiability in Reward Learning [67.4640841144101]
We characterise the partial identifiability of the reward function given popular reward learning data sources.
We also analyse the impact of this partial identifiability for several downstream tasks, such as policy optimisation.
arXiv Detail & Related papers (2022-03-14T20:19:15Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.