Learning and Decision-Making with Data: Optimal Formulations and Phase
Transitions
- URL: http://arxiv.org/abs/2109.06911v3
- Date: Mon, 11 Mar 2024 21:28:38 GMT
- Title: Learning and Decision-Making with Data: Optimal Formulations and Phase
Transitions
- Authors: Amine Bennouna and Bart P.G. Van Parys
- Abstract summary: We study the problem of designing optimal learning and decision-making formulations when only historical data is available.
We show the existence of three distinct out-of-sample performance regimes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of designing optimal learning and decision-making
formulations when only historical data is available. Prior work typically
commits to a particular class of data-driven formulation and subsequently tries
to establish out-of-sample performance guarantees. We take here the opposite
approach. We define first a sensible yardstick with which to measure the
quality of any data-driven formulation and subsequently seek to find an optimal
such formulation. Informally, any data-driven formulation can be seen to
balance a measure of proximity of the estimated cost to the actual cost while
guaranteeing a level of out-of-sample performance. Given an acceptable level of
out-of-sample performance, we construct explicitly a data-driven formulation
that is uniformly closer to the true cost than any other formulation enjoying
the same out-of-sample performance. We show the existence of three distinct
out-of-sample performance regimes (a superexponential regime, an exponential
regime and a subexponential regime) between which the nature of the optimal
data-driven formulation experiences a phase transition. The optimal data-driven
formulations can be interpreted as a classically robust formulation in the
superexponential regime, an entropic distributionally robust formulation in the
exponential regime and finally a variance penalized formulation in the
subexponential regime. This final observation unveils a surprising connection
between these three, at first glance seemingly unrelated, data-driven
formulations which until now remained hidden.
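The three optimal formulations named in the abstract have simple empirical counterparts. The sketch below is our own illustration, not the paper's exact construction: the function names and the radius/penalty parameters `r` and `c` are hypothetical, and the paper's formulations are parameterized by a target out-of-sample performance level rather than fixed constants. Each one scores the sampled losses of a candidate decision differently:

```python
import numpy as np

def robust_cost(losses):
    # Classically robust: worst-case loss over the observed samples.
    return np.max(losses)

def entropic_dro_cost(losses, r):
    # Entropic (KL-divergence) distributionally robust: the log-sum-exp
    # certainty equivalent at an assumed radius r > 0.
    return r * np.log(np.mean(np.exp(losses / r)))

def variance_penalized_cost(losses, c):
    # Empirical mean plus a variance penalty with assumed weight c >= 0.
    return np.mean(losses) + c * np.var(losses)

rng = np.random.default_rng(0)
losses = rng.normal(1.0, 0.5, size=1000)  # sampled losses of one decision
print(robust_cost(losses))
print(entropic_dro_cost(losses, r=0.5))
print(variance_penalized_cost(losses, c=0.1))
```

By Jensen's inequality the entropic certainty equivalent always lies between the empirical mean and the worst case, which is consistent with the exponential regime sitting between the subexponential and superexponential ones.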
Related papers
- Forecasting Outside the Box: Application-Driven Optimal Pointwise Forecasts for Stochastic Optimization [0.0]
We present an integrated learning and optimization procedure that yields the best approximation of an unknown situation.
Numerical results conducted with inventory problems from the literature as well as a bike-sharing problem with real data demonstrate that the proposed approach performs well.
arXiv Detail & Related papers (2024-11-05T21:54:50Z) - Experiment Planning with Function Approximation [49.50254688629728]
We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
arXiv Detail & Related papers (2024-01-10T14:40:23Z) - DF2: Distribution-Free Decision-Focused Learning [53.2476224456902]
Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems.
Existing end-to-end DFL methods are hindered by three significant bottlenecks: model error, sample average approximation error, and distribution-based parameterization of the expected objective.
We present DF2 -- the first distribution-free decision-focused learning method explicitly designed to address these three bottlenecks.
arXiv Detail & Related papers (2023-08-11T00:44:46Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss between the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
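For context, the NCE objective mentioned in this summary can be written down in a few lines. This is a minimal sketch under our own toy setup (Gaussian data and noise; the function names are hypothetical), not the paper's compositional optimization method:

```python
import numpy as np

def nce_loss(model_logp, noise_logp, x_data, x_noise):
    # Classify real samples against artificial noise with a logistic loss;
    # the classifier's logit is the log-density ratio of model to noise.
    logit_data = model_logp(x_data) - noise_logp(x_data)
    logit_noise = model_logp(x_noise) - noise_logp(x_noise)
    # -log sigmoid(z) == logaddexp(0, -z), computed stably
    return (np.mean(np.logaddexp(0.0, -logit_data))
            + np.mean(np.logaddexp(0.0, logit_noise)))

# Toy example: data from N(0, 1), noise from N(0, 2^2).
rng = np.random.default_rng(1)
x_data = rng.normal(0.0, 1.0, size=5000)
x_noise = rng.normal(0.0, 2.0, size=5000)
noise_logp = lambda x: -x**2 / 8.0 - np.log(2.0 * np.sqrt(2.0 * np.pi))
good_model = lambda x: -x**2 / 2.0 - np.log(np.sqrt(2.0 * np.pi))
bad_model = lambda x: -(x - 3.0)**2 / 2.0 - np.log(np.sqrt(2.0 * np.pi))
```

A model matching the data-generating density achieves a lower NCE loss than a misspecified one, which is what makes the logistic loss a usable surrogate for the (intractable) normalized likelihood.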
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Data-Driven Sample Average Approximation with Covariate Information [0.0]
We study optimization for data-driven decision-making when we have observations of the uncertain parameters within the optimization model together with concurrent observations of covariates.
We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) for approximating the solution to this problem.
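One common instance of such a framework pairs a nearest-neighbor predictor with an SAA of a newsvendor problem: condition the empirical distribution on the observed covariate, then optimize over the retained sub-sample. The sketch below is our own illustration with a hypothetical function name and synthetic data, not one of the three specific frameworks studied in the paper:

```python
import numpy as np

def knn_saa_newsvendor(X, d, x0, k, price, cost):
    # Conditional SAA: keep the k historical demands whose covariates are
    # closest to the new covariate x0, then solve the newsvendor problem
    # on that sub-sample. Its SAA solution is the critical quantile of
    # the retained demand samples.
    idx = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    service_level = (price - cost) / price  # newsvendor critical ratio
    return np.quantile(d[idx], service_level)

# Synthetic data: demand depends linearly on a one-dimensional covariate.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 10.0, size=(2000, 1))
d = 5.0 + 2.0 * X[:, 0] + rng.normal(0.0, 1.0, size=2000)
order = knn_saa_newsvendor(X, d, x0=np.array([5.0]), k=100, price=4.0, cost=1.0)
```

Because the covariate informs the conditional demand distribution, the conditioned SAA targets the quantile of demand near `x0` rather than of the unconditional sample.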
arXiv Detail & Related papers (2022-07-27T14:45:04Z) - Holistic Robust Data-Driven Decisions [0.0]
Practical overfitting typically cannot be attributed to a single cause but is instead caused by several factors at once.
We consider here three overfitting sources: (i) statistical error as a result of working with finite sample data, (ii) data noise which occurs when the data points are measured only with finite precision, and finally (iii) data misspecification in which a small fraction of all data may be wholly corrupted.
We argue that although existing data-driven formulations may be robust against one of these three sources in isolation they do not provide holistic protection against all overfitting sources simultaneously.
arXiv Detail & Related papers (2022-07-19T21:28:51Z) - Extension of Dynamic Mode Decomposition for dynamic systems with
incomplete information based on t-model of optimal prediction [69.81996031777717]
The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data.
The application of this approach becomes problematic if the available data is incomplete because some smaller-scale dimensions are either missing or unmeasured.
We consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with the gradient-based optimization method.
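As background, the baseline exact DMD that the paper extends can be sketched in a few lines; the Mori-Zwanzig t-model correction for the unresolved dimensions is not included here, and the function name is ours:

```python
import numpy as np

def dmd(X, r):
    # Exact DMD: fit a rank-r linear operator with X2 ~= A @ X1 from
    # consecutive snapshot pairs; return its eigenvalues and modes.
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ X2 @ Vh.conj().T / s  # reduced r x r operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (X2 @ Vh.conj().T / s) @ W           # exact DMD modes
    return eigvals, modes

# Toy linear system with known decay rates 0.9 and 0.5.
A = np.diag([0.9, 0.5])
snaps = [np.array([1.0, 1.0])]
for _ in range(9):
    snaps.append(A @ snaps[-1])
X = np.stack(snaps, axis=1)  # 2 x 10 snapshot matrix
eigvals, modes = dmd(X, r=2)
```

On fully observed linear data like this, DMD recovers the system's eigenvalues exactly; the incomplete-information setting of the paper is precisely where this baseline breaks down.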
arXiv Detail & Related papers (2022-02-23T11:23:59Z) - Learning Optimal Prescriptive Trees from Observational Data [7.215903549622416]
We propose a method for learning optimal prescriptive trees using mixed-integer optimization (MIO) technology.
Contrary to existing literature, our approach 1) does not require data to be randomized, 2) does not impose stringent assumptions on the learned trees, and 3) has the ability to model domain-specific constraints.
arXiv Detail & Related papers (2021-08-31T05:38:36Z) - Adaptive Sequential Design for a Single Time-Series [2.578242050187029]
We learn an optimal, unknown choice of the controlled components of a design in order to optimize the expected outcome.
We adapt the randomization mechanism for future time-point experiments based on the data collected on the individual over time.
arXiv Detail & Related papers (2021-01-29T22:51:45Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.