What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience
- URL: http://arxiv.org/abs/2206.06485v1
- Date: Mon, 13 Jun 2022 21:31:06 GMT
- Title: What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience
- Authors: Alexandra Kearney, Anna Koop, Johannes Günther, Patrick M. Pilarski
- Abstract summary: A growing body of work in computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
- Score: 63.75363908696257
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In computational reinforcement learning, a growing body of work seeks to
construct an agent's perception of the world through predictions of future
sensations; predictions about environment observations are used as additional
input features to enable better goal-directed decision-making. An open
challenge in this line of work is determining from the infinitely many
predictions that the agent could possibly make which predictions might best
support decision-making. This challenge is especially apparent in continual
learning problems where a single stream of experience is available to a
singular agent. As a primary contribution, we introduce a meta-gradient descent
process by which an agent learns 1) what predictions to make, 2) the estimates
for its chosen predictions, and 3) how to use those estimates to generate
policies that maximize future reward -- all during a single ongoing process of
continual learning. In this manuscript we consider predictions expressed as
General Value Functions: temporally extended estimates of the accumulation of a
future signal. We demonstrate that through interaction with the environment an
agent can independently select predictions that resolve partial-observability,
resulting in performance similar to expertly specified GVFs. By learning,
rather than manually specifying these predictions, we enable the agent to
identify useful predictions in a self-supervised manner, taking a step towards
truly autonomous systems.
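For context on the predictions discussed above, the following is a standard formulation of a General Value Function (GVF) and the semi-gradient TD(0) update commonly used to learn one. This is a sketch drawn from the broader GVF literature, not quoted from this paper; the symbols (policy \pi, state-dependent continuation \gamma, cumulant C, linear weights w, features x) are conventional choices and may differ from the authors' exact notation.

  % A GVF: the expected, temporally extended accumulation of a cumulant C
  % under policy \pi with state-dependent continuation \gamma.
  \[
    v_{\pi,\gamma,C}(s) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{k=0}^{\infty}
      \Bigl(\prod_{j=1}^{k}\gamma(S_{t+j})\Bigr) C_{t+k+1}
      \;\Bigm|\; S_t = s \right]
  \]
  % With a linear estimate \hat{v}(s) = w^{\top} x(s), the weights of each GVF
  % can be learned online by semi-gradient TD(0):
  \[
    \delta_t = C_{t+1} + \gamma(S_{t+1})\, w_t^{\top} x(S_{t+1}) - w_t^{\top} x(S_t),
    \qquad
    w_{t+1} = w_t + \alpha\, \delta_t\, x(S_t)
  \]

In the paper's framing, updates of this kind learn the answers to the agent's predictive questions; the meta-gradient descent process described in the abstract operates in an outer loop that adjusts which questions (which cumulants and continuations) are worth asking, with the details of that outer-loop objective given in the paper itself.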
Related papers
- Performative Prediction on Games and Mechanism Design [69.7933059664256]
We study a collective risk dilemma where agents decide whether to trust predictions based on past accuracy.
As predictions shape collective outcomes, social welfare arises naturally as a metric of concern.
We show how to achieve better trade-offs and use them for mechanism design.
arXiv Detail & Related papers (2024-08-09T16:03:44Z)
- GVFs in the Real World: Making Predictions Online for Water Treatment [23.651798878534635]
We investigate the use of reinforcement-learning-based prediction approaches for a real drinking-water treatment plant.
We first describe this dataset and highlight challenges with seasonality, nonstationarity, and partial observability.
We show the importance of learning in deployment, by comparing a TD agent trained purely offline with no online updating to a TD agent that learns online.
arXiv Detail & Related papers (2023-12-04T04:49:10Z)
- Making Decisions under Outcome Performativity [9.962472413291803]
We introduce a new optimality concept -- performative omniprediction.
A performative omnipredictor is a single predictor that simultaneously encodes the optimal decision rule.
We show that efficient performative omnipredictors exist, under a natural restriction of performative prediction.
arXiv Detail & Related papers (2022-10-04T17:04:47Z)
- Predicting from Predictions [18.393971232725015]
We study how causal effects of predictions on outcomes can be identified from observational data.
We show that supervised learners that predict from predictions can find transferable functional relationships between features, predictions, and outcomes.
arXiv Detail & Related papers (2022-08-15T16:57:02Z)
- Finding Useful Predictions by Meta-gradient Descent to Improve Decision-making [1.384055225262046]
We focus on predictions expressed as General Value Functions: temporally extended estimates of the accumulation of a future signal.
One challenge is determining from the infinitely many predictions that the agent could possibly make which might support decision-making.
By learning, rather than manually specifying these predictions, we enable the agent to identify useful predictions in a self-supervised manner.
arXiv Detail & Related papers (2021-11-18T20:17:07Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty [54.88405167739227]
We present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents' class probabilities.
We additionally present PUP, a new challenging real-world autonomous driving dataset.
We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty.
arXiv Detail & Related papers (2021-04-26T10:28:34Z)
- When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making [68.19284302320146]
We carry out user studies to assess how people with differing levels of expertise respond to different types of predictive uncertainty.
We found that showing posterior predictive distributions led to smaller disagreements with the ML model's predictions.
This suggests that posterior predictive distributions can serve as useful decision aids, but they should be used with caution, taking into account the type of distribution and the expertise of the human.
arXiv Detail & Related papers (2020-11-12T02:23:53Z)
- Counterfactual Predictions under Runtime Confounding [74.90756694584839]
We study the counterfactual prediction task in the setting where all relevant factors are captured in the historical data.
We propose a doubly-robust procedure for learning counterfactual prediction models in this setting.
arXiv Detail & Related papers (2020-06-30T15:49:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.