Related papers: Data-Driven Offline Decision-Making via Invariant Representation Learning

URL: http://arxiv.org/abs/2211.11349v1
Date: Mon, 21 Nov 2022 11:01:37 GMT
Title: Data-Driven Offline Decision-Making via Invariant Representation Learning
Authors: Han Qi, Yi Su, Aviral Kumar, Sergey Levine
Abstract summary: Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction. A key challenge is distributional shift: when we optimize with respect to the input of a model trained on offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good. In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
Score: 97.49309949598505
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: The goal in offline data-driven decision-making is to synthesize decisions that optimize a black-box utility function using a previously collected static dataset, with no active interaction. These problems appear in many forms: offline reinforcement learning (RL), where we must produce actions that optimize the long-term reward; bandits from logged data, where the goal is to determine the correct arm; and offline model-based optimization (MBO) problems, where we must find the optimal design given access to only a static dataset. A key challenge in all these settings is distributional shift: when we optimize with respect to the input of a model trained on offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good. In contrast to prior approaches that use pessimism or conservatism to tackle this problem, in this paper we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions (the "target domain") when training only on the dataset (the "source domain"). This perspective leads to invariant objective models (IOM), our approach to addressing distributional shift by enforcing invariance between the learned representations of the training dataset and of the optimized decisions. In IOM, if the optimized decisions are too different from the training dataset, the representation is forced to lose much of the information that distinguishes good designs from bad ones, making all choices seem mediocre. Critically, an optimizer that is aware of this representational trade-off should choose not to stray too far from the training distribution, which leads to a natural trade-off between distributional shift and learning performance.
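The abstract describes IOM at the level of its objective: fit the offline data while keeping the learned representations of the dataset and of the optimized decisions indistinguishable. The sketch below only evaluates an objective of that shape. It assumes a hypothetical one-layer encoder (`encoder`), a linear head `v`, and a squared kernel MMD (`rbf_mmd2`) as the discrepancy between the two sets of representations; the abstract does not say which discrepancy the paper uses, and nothing here trains the model or runs the design optimizer.

```python
import numpy as np


def encoder(x, W):
    # Hypothetical one-layer representation phi(x) = tanh(x W).
    return np.tanh(x @ W)


def rbf_mmd2(a, b, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between the rows of
    # a and b under an RBF kernel; it is exactly zero when a and b coincide.
    def gram(p, q):
        sq = (np.sum(p**2, axis=1)[:, None] + np.sum(q**2, axis=1)[None, :]
              - 2.0 * p @ q.T)
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))
    return gram(a, a).mean() + gram(b, b).mean() - 2.0 * gram(a, b).mean()


def iom_style_loss(W, v, x_train, y_train, x_opt, lam=1.0):
    # Regression loss on the offline dataset (the "source domain") ...
    z_train = encoder(x_train, W)
    mse = np.mean((z_train @ v - y_train) ** 2)
    # ... plus an invariance penalty tying the representations of the
    # optimized decisions (the "target domain") to those of the data.
    penalty = rbf_mmd2(z_train, encoder(x_opt, W))
    return mse + lam * penalty, mse, penalty


rng = np.random.default_rng(0)
d, h = 4, 8
W = rng.normal(size=(d, h)) / np.sqrt(d)
v = rng.normal(size=h)
x_train = rng.normal(size=(200, d))
y_train = np.sin(x_train).sum(axis=1)
x_near = rng.normal(size=(50, d))          # candidates that resemble the data
x_far = rng.normal(loc=4.0, size=(50, d))  # candidates far outside the data

total_near, mse_near, p_near = iom_style_loss(W, v, x_train, y_train, x_near)
total_far, mse_far, p_far = iom_style_loss(W, v, x_train, y_train, x_far)
print(f"penalty near: {p_near:.4f}, penalty far: {p_far:.4f}")
```

A candidate set far from the data incurs a much larger penalty than one drawn from the data distribution. If the encoder were trained to drive this penalty down, matching far-away candidates would force it to discard information about the data, which is the mechanism the abstract describes.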

Related papers

OPO: Making Decision-Focused Data Acquisition Decisions [0.0]
We propose a model for making data acquisition decisions for variables in contextual optimisation problems. We solve the data acquisition problem with well-defined constraints by learning a surrogate linear objective function. We ablate the problem with a number of training modalities and demonstrate that the differentiable optimisation approach outperforms random search strategies.
arXiv Detail & Related papers (2025-04-21T12:41:35Z)
Unifying and Optimizing Data Values for Selection via Sequential-Decision-Making [5.755427480127593]
We show that data values applied for selection can be reformulated as a sequential-decision-making problem. We propose an efficient approximation scheme using learned bipartite graphs as surrogate utility models.
arXiv Detail & Related papers (2025-02-06T23:03:10Z)
Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training. We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO. As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
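For a tiny model, the trajectory-specific leave-one-out quantity in the summary above can be computed by brute force: replay the same SGD run in the same data order with every step on one example dropped, then compare a test loss at the end. This is a sketch under assumed choices (a linear model, squared loss, and the hypothetical helpers `sgd_trajectory` and `trajectory_loo`); it is not the paper's data value embedding, which is meant to avoid exactly this retraining.

```python
import numpy as np


def sgd_trajectory(x, y, order, lr=0.05, skip=None):
    # Plain SGD on the squared loss of a linear model, visiting examples in
    # `order`. If `skip` is given, every step on that example is dropped and
    # the rest of the run (order, learning rate) stays identical.
    w = np.zeros(x.shape[1])
    for i in order:
        if i == skip:
            continue
        w = w - lr * 2.0 * (x[i] @ w - y[i]) * x[i]
    return w


def trajectory_loo(x, y, order, idx, x_test, y_test, lr=0.05):
    # Change in test loss caused by removing example `idx` from this run.
    # Positive means the run does worse without the example.
    def test_loss(w):
        return float((x_test @ w - y_test) ** 2)
    full = sgd_trajectory(x, y, order, lr)
    ablated = sgd_trajectory(x, y, order, lr, skip=idx)
    return test_loss(ablated) - test_loss(full)


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=(20, 3))
y = x @ w_true + 0.1 * rng.normal(size=20)
x_test = rng.normal(size=3)
y_test = float(x_test @ w_true)

early = list(range(20))          # example 0 is visited first
late = list(range(1, 20)) + [0]  # example 0 is visited last
inf_early = trajectory_loo(x, y, early, 0, x_test, y_test)
inf_late = trajectory_loo(x, y, late, 0, x_test, y_test)
print(inf_early, inf_late)
```

The same example receives a different influence depending on when it is visited, which is why the quantity is tied to a specific training trajectory rather than to the dataset alone.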
Hindsight Preference Learning for Offline Preference-based Reinforcement Learning [22.870967604847458]
Offline preference-based reinforcement learning (RL) focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset. We propose to model human preferences using rewards conditioned on future outcomes of the trajectory segments. Our proposed method, Hindsight Preference Learning (HPL), can facilitate credit assignment by taking full advantage of vast trajectory data available in massive unlabeled datasets.
arXiv Detail & Related papers (2024-07-05T12:05:37Z)
DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality. We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data. Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization. We also present a data-driven optimization algorithm that infers the functional graphical model (FGM) structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization [16.57676001669012]
In data-driven optimization, the sample performance of the obtained decision typically incurs an optimistic bias against the true performance. Common techniques to correct this bias, such as cross-validation, require repeatedly solving additional optimization problems and are therefore expensive. We develop a general bias correction approach that directly approximates the first-order bias and does not require solving any additional optimization problems.
arXiv Detail & Related papers (2023-06-16T07:07:58Z)
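The optimistic bias described in the summary above is easy to reproduce in simulation: pick the option with the best sample mean, then compare that sample mean with the option's true value. In the sketch below every option has true value 0, so the whole gap is bias. The function name and parameters are illustrative; the code only demonstrates the bias and does not implement the paper's correction, whose formula is not given here.

```python
import numpy as np


def optimistic_bias(n_options=10, n_samples=20, noise=1.0, trials=2000, seed=0):
    # Average gap between the in-sample value of the chosen option and its
    # true value, when the choice is the option with the best sample mean.
    rng = np.random.default_rng(seed)
    true_values = np.zeros(n_options)  # every option is equally good
    gaps = np.empty(trials)
    for t in range(trials):
        samples = true_values + noise * rng.standard_normal((n_samples, n_options))
        estimates = samples.mean(axis=0)
        best = int(np.argmax(estimates))  # the data-driven decision
        gaps[t] = estimates[best] - true_values[best]
    return float(gaps.mean())


print(optimistic_bias())             # many options: clearly positive gap
print(optimistic_bias(n_options=1))  # nothing to select: gap near zero
```

With 10 options and 20 samples each, the gap comes out near 0.34, in line with the expected maximum of ten standard normals (about 1.54) times the standard error 1/sqrt(20). With a single option there is no selection, and the gap averages out to roughly zero.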
Building Resilience to Out-of-Distribution Visual Data via Input Optimization and Model Finetuning [13.804184845195296]
We propose a preprocessing model that learns to optimise input data for a specific target vision model. We investigate several out-of-distribution scenarios in the context of semantic segmentation for autonomous vehicles. We demonstrate that our approach can enable performance on such data comparable to that of a finetuned model.
arXiv Detail & Related papers (2022-11-29T14:06:35Z)
Careful! Training Relevance is Real [0.7742297876120561]
We propose constraints designed to enforce training relevance. We show through a collection of experimental results that adding the suggested constraints significantly improves the quality of solutions.
arXiv Detail & Related papers (2022-01-12T11:54:31Z)
Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures. We propose conservative objective models (COMs), a method that learns a model of the objective function that lower-bounds the actual value of the ground-truth objective on out-of-distribution inputs. COMs are simple to implement and outperform a number of existing methods on a wide range of model-based optimization (MBO) problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
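The conservative idea in the COMs summary above can be sketched for a linear model. The sketch finds designs by gradient ascent on the current model starting from the data, then penalizes the model's mean prediction on those designs relative to its mean prediction on the data, alongside the regression loss. The helper names, hyperparameters, and the linear model are assumptions for illustration; the actual method trains neural networks and differs in its details.

```python
import numpy as np


def ascend(x, w, step=0.1, n_steps=20):
    # Gradient ascent on f(x) = x @ w + b starting from the data. For a linear
    # model grad_x f = w, so every design moves by n_steps * step * w.
    return x + n_steps * step * w


def fit_conservative_linear(x, y, alpha=1.0, lr=0.05, epochs=1000):
    # Minimizes mean((f(x) - y)^2) + alpha * (mean f(x_adv) - mean f(x)),
    # treating the ascended designs x_adv as constants at each update.
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        x_adv = ascend(x, w)
        resid = x @ w + b - y
        grad_w = (2.0 / n * x.T @ resid
                  + alpha * (x_adv.mean(axis=0) - x.mean(axis=0)))
        grad_b = 2.0 * resid.mean()
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def predicted_gain(x, w):
    # How much better the model thinks the ascended designs are than the data.
    return float(np.mean((ascend(x, w) - x) @ w))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

w_plain, _ = fit_conservative_linear(x, y, alpha=0.0)  # ordinary regression
w_cons, _ = fit_conservative_linear(x, y, alpha=1.0)   # conservative fit
print(predicted_gain(x, w_plain), predicted_gain(x, w_cons))
```

For a linear model the ascent displacement is proportional to `w`, so the penalty reduces to an L2-style shrinkage of `w` and the conservative fit predicts a smaller gain for the ascended designs. With a neural network the ascended designs, and hence the penalty, depend on the input in a genuinely nonlinear way.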
Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting. We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration. Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z)
Model Inversion Networks for Model-Based Optimization [110.24531801773392]
We propose model inversion networks (MINs), which learn an inverse mapping from scores to inputs. MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems. We evaluate MINs on tasks from the Bayesian optimization literature, high-dimensional model-based optimization problems over images and protein designs, and contextual bandit optimization from logged data.
arXiv Detail & Related papers (2019-12-31T18:06:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.