Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor
Problem
- URL: http://arxiv.org/abs/2209.05093v1
- Date: Mon, 12 Sep 2022 08:52:26 GMT
- Title: Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor
Problem
- Authors: Breno Serrano, Stefan Minner, Maximilian Schiffer, Thibaut Vidal
- Abstract summary: We study the feature-based news vendor problem, in which a decision-maker has access to historical data.
In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance.
We present a mixed integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers.
- Score: 8.281391209717105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the feature-based newsvendor problem, in which a decision-maker has
access to historical data consisting of demand observations and exogenous
features. In this setting, we investigate feature selection, aiming to derive
sparse, explainable models with improved out-of-sample performance. Up to now,
state-of-the-art methods utilize regularization, which penalizes the number of
selected features or the norm of the solution vector. As an alternative, we
introduce a novel bilevel programming formulation. The upper-level problem
selects a subset of features that minimizes an estimate of the out-of-sample
cost of ordering decisions based on a held-out validation set. The lower-level
problem learns the optimal coefficients of the decision function on a training
set, using only the features selected by the upper-level. We present a mixed
integer linear program reformulation for the bilevel program, which can be
solved to optimality with standard optimization solvers. Our computational
experiments show that the method accurately recovers ground-truth features
already for instances with a sample size of a few hundred observations. In
contrast, regularization-based techniques often fail at feature recovery or
require thousands of observations to obtain similar accuracy. Regarding
out-of-sample generalization, we achieve improved or comparable cost
performance.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Experiment Planning with Function Approximation [49.50254688629728]
We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
arXiv Detail & Related papers (2024-01-10T14:40:23Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual bilevel optimization (CSBO) -- a bilevel optimization framework with the lower-level problem minimizing an expectation on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robustly optimization with side information (WDRO-SI)
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Finding Optimal Diverse Feature Sets with Alternative Feature Selection [0.0]
We introduce alternative feature selection and formalize it as an optimization problem.
In particular, we define alternatives via constraints and enable users to control the number and dissimilarity of alternatives.
We show that a constant-factor approximation exists under certain conditions and propose corresponding search methods.
arXiv Detail & Related papers (2023-07-21T14:23:41Z) - Low-rank Dictionary Learning for Unsupervised Feature Selection [11.634317251468968]
We introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation.
A unified objective function for unsupervised feature selection is proposed in a sparse way by an $ell_2,1$-norm regularization.
Our experimental findings reveal that the proposed method outperforms the state-of-the-art algorithm.
arXiv Detail & Related papers (2021-06-21T13:39:10Z) - Low Budget Active Learning via Wasserstein Distance: An Integer
Programming Approach [81.19737119343438]
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label.
We propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool.
Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool.
arXiv Detail & Related papers (2021-06-05T21:25:03Z) - Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most characterizing features that minimizes the variance without jeopardizing the bias of our models is critical to successfully training a machine learning model.
arXiv Detail & Related papers (2021-01-23T09:24:37Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - Outlier Detection Ensemble with Embedded Feature Selection [42.8338013000469]
We propose an outlier detection ensemble framework with embedded feature selection (ODEFS)
For each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation.
We adopt the thresholded self-paced learning to simultaneously optimize feature selection and example selection.
arXiv Detail & Related papers (2020-01-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.