Data-Driven Offline Decision-Making via Invariant Representation
Learning
- URL: http://arxiv.org/abs/2211.11349v1
- Date: Mon, 21 Nov 2022 11:01:37 GMT
- Title: Data-Driven Offline Decision-Making via Invariant Representation
Learning
- Authors: Han Qi, Yi Su, Aviral Kumar, Sergey Levine
- Abstract summary: Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction.
A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good.
In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
- Score: 97.49309949598505
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The goal in offline data-driven decision-making is to synthesize decisions that
optimize a black-box utility function, using a previously-collected static
dataset, with no active interaction. These problems appear in many forms:
offline reinforcement learning (RL), where we must produce actions that
optimize the long-term reward, bandits from logged data, where the goal is to
determine the correct arm, and offline model-based optimization (MBO) problems,
where we must find the optimal design provided access to only a static dataset.
A key challenge in all these settings is distributional shift: when we optimize
with respect to the input into a model trained from offline data, it is easy to
produce an out-of-distribution (OOD) input that appears erroneously good. In
contrast to prior approaches that utilize pessimism or conservatism to tackle
this problem, in this paper, we formulate offline data-driven decision-making
as domain adaptation, where the goal is to make accurate predictions for the
value of optimized decisions ("target domain"), when training only on the
dataset ("source domain"). This perspective leads to invariant objective models
(IOM), our approach for addressing distributional shift by enforcing invariance
between the learned representations of the training dataset and optimized
decisions. In IOM, if the optimized decisions are too different from the
training dataset, the representation will be forced to lose much of the
information that distinguishes good designs from bad ones, making all choices
seem mediocre. Critically, when the optimizer is aware of this representational
tradeoff, it should choose not to stray too far from the training distribution,
leading to a natural trade-off between distributional shift and learning
performance.
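The trade-off described in the abstract can be illustrated with a toy objective that adds an invariance penalty to an ordinary regression loss. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `represent`, `iom_style_loss`, and `lam` are hypothetical, the one-layer tanh representation is chosen for brevity, and the penalty (squared distance between mean representations, i.e. a linear-kernel mean discrepancy) is a simple stand-in for whatever discrepancy measure the paper actually uses.

```python
import numpy as np


def represent(x, W):
    """Hypothetical one-layer representation phi(x) = tanh(x W)."""
    return np.tanh(x @ W)


def iom_style_loss(W, v, x_data, y_data, x_opt, lam=1.0):
    """Regression loss on the dataset plus an invariance penalty.

    The penalty is the squared distance between the mean representation
    of the dataset (source) and of the optimized decisions (target).
    It is an assumed stand-in, not the paper's discrepancy measure.
    """
    phi_data = represent(x_data, W)
    phi_opt = represent(x_opt, W)
    pred = phi_data @ v                      # linear head on the representation
    fit = float(np.mean((pred - y_data) ** 2))
    gap = phi_data.mean(axis=0) - phi_opt.mean(axis=0)
    invariance = float(gap @ gap)
    return fit + lam * invariance, fit, invariance


# Synthetic data: designs x with a made-up utility y (illustrative only).
rng = np.random.default_rng(0)
x_data = rng.normal(size=(256, 4))
y_data = -np.sum(x_data ** 2, axis=1)
W = rng.normal(scale=0.5, size=(4, 8))
v = rng.normal(scale=0.5, size=8)

# Candidate "optimized decisions": one set close to the data, one far away.
x_near = x_data + 0.1
x_far = x_data + 3.0
_, _, inv_near = iom_style_loss(W, v, x_data, y_data, x_near)
_, _, inv_far = iom_style_loss(W, v, x_data, y_data, x_far)
print("invariance penalty, near:", inv_near)
print("invariance penalty, far: ", inv_far)
```

In this sketch, decisions that drift further from the dataset incur a larger penalty under a fixed representation. Driving that penalty down during training would require the representation to discard the features that separate the two sets, which is the loss of information the abstract refers to.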
Related papers
- Hindsight Preference Learning for Offline Preference-based Reinforcement Learning [22.870967604847458]
Offline preference-based reinforcement learning (RL) focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset.
We propose to model human preferences using rewards conditioned on future outcomes of the trajectory segments.
Our proposed method, Hindsight Preference Learning (HPL), can facilitate credit assignment by taking full advantage of vast trajectory data available in massive unlabeled datasets.
arXiv Detail & Related papers (2024-07-05T12:05:37Z)
- DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
- Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization [16.57676001669012]
In data-driven optimization, the sample performance of the obtained decision typically incurs an optimistic bias against the true performance.
Common techniques to correct this bias, such as cross-validation, require repeatedly solving additional optimization problems and are therefore expensive.
We develop a general bias correction approach that directly approximates the first-order bias and does not require solving any additional optimization problems.
arXiv Detail & Related papers (2023-06-16T07:07:58Z)
- Building Resilience to Out-of-Distribution Visual Data via Input Optimization and Model Finetuning [13.804184845195296]
We propose a preprocessing model that learns to optimise input data for a specific target vision model.
We investigate several out-of-distribution scenarios in the context of semantic segmentation for autonomous vehicles.
We demonstrate that our approach can enable performance on such data comparable to that of a finetuned model.
arXiv Detail & Related papers (2022-11-29T14:06:35Z)
- Careful! Training Relevance is Real [0.7742297876120561]
We propose constraints designed to enforce training relevance.
We show through a collection of experimental results that adding the suggested constraints significantly improves the quality of solutions.
arXiv Detail & Related papers (2022-01-12T11:54:31Z)
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
- Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z)
- Model Inversion Networks for Model-Based Optimization [110.24531801773392]
We propose model inversion networks (MINs), which learn an inverse mapping from scores to inputs.
MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems.
We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
arXiv Detail & Related papers (2019-12-31T18:06:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.