Learning Decisions Offline from Censored Observations with ε-insensitive Operational Costs
- URL: http://arxiv.org/abs/2408.07305v1
- Date: Wed, 14 Aug 2024 05:44:56 GMT
- Title: Learning Decisions Offline from Censored Observations with ε-insensitive Operational Costs
- Authors: Minxia Chen, Ke Fu, Teng Huang, Miao Bai
- Abstract summary: We design and leverage ε-insensitive operational costs to deal with the unobserved censoring in an offline data-driven fashion.
We train two representative ML models: linear regression (LR) models and neural networks (NNs).
The theoretical results reveal the stability and learnability of LR-εNVC, LR-εNVC-R and NN-εNVC.
- Score: 1.7249361224827533
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many important managerial decisions are made based on censored observations. Making decisions without adequately handling the censoring leads to inferior outcomes. We investigate the data-driven decision-making problem with an offline dataset containing the feature data and the censored historical data of the variable of interest, without the censoring indicators. Without assuming the underlying distribution, we design and leverage ε-insensitive operational costs to deal with the unobserved censoring in an offline data-driven fashion. We demonstrate the customization of the ε-insensitive operational costs for a newsvendor problem and use such costs to train two representative ML models: linear regression (LR) models and neural networks (NNs). We derive tight generalization bounds for the custom LR model without regularization (LR-εNVC) and with regularization (LR-εNVC-R), and a high-probability generalization bound for the custom NN (NN-εNVC) trained by stochastic gradient descent. The theoretical results reveal the stability and learnability of LR-εNVC, LR-εNVC-R and NN-εNVC. We conduct extensive numerical experiments to compare LR-εNVC-R and NN-εNVC with two existing approaches, estimate-as-solution (EAS) and integrated estimation and optimization (IEO). The results show that LR-εNVC-R and NN-εNVC outperform both EAS and IEO, with cost savings of up to 14.40% and 12.21% compared to the lowest cost generated by the two existing approaches. In addition, the order quantities of LR-εNVC-R and NN-εNVC are statistically significantly closer to the solutions that would be optimal if the underlying distribution were known.
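As a rough illustration of the idea named in the abstract, the sketch below contrasts the classical newsvendor cost with one possible ε-insensitive variant. The abstract does not give the paper's cost definition, so everything here is an assumption: the function names, the parameters `cu` (unit underage cost), `co` (unit overage cost) and `eps`, and the symmetric dead zone borrowed from the ε-insensitive loss used in support vector regression. It should not be read as the authors' formulation.

```python
import numpy as np


def newsvendor_cost(q, d, cu, co):
    """Classical newsvendor cost for order quantity q and demand d.

    cu is charged per unit of unmet demand, co per unit of leftover stock.
    """
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    return cu * np.maximum(d - q, 0.0) + co * np.maximum(q - d, 0.0)


def eps_insensitive_newsvendor_cost(q, d, cu, co, eps):
    """Hypothetical epsilon-insensitive variant (an assumption, not the paper's).

    Deviations between q and the observed value d that fall within eps
    incur no cost; larger deviations are charged only for the excess.
    """
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    short = np.maximum(d - q - eps, 0.0)   # shortage beyond the dead zone
    excess = np.maximum(q - d - eps, 0.0)  # leftover beyond the dead zone
    return cu * short + co * excess
```

With `eps = 0` the second function reduces to the first. The intuition for a dead zone under censoring is that an observed value may understate the true one, so small gaps between the decision and the observation are not penalized; how the paper actually shapes that zone is not stated in the abstract.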

Related papers
- Double Machine Learning for Conditional Moment Restrictions: IV Regression, Proximal Causal Learning and Beyond [16.842233444365764]
 Conditional moment restrictions (CMRs) are a key problem considered in statistics, causal inference, and econometrics.
Most CMR estimators use a two-stage approach, where the first-stage estimation is directly plugged into the second stage to estimate the function of interest.
We propose DML-CMR, a two-stage CMR estimator that provides an unbiased estimate with fast convergence rate guarantees.
 arXiv  Detail & Related papers  (2025-06-17T20:00:34Z)
- Learning Interpretable Differentiable Logic Networks for Tabular Regression [3.8064485653035987]
 Differentiable Logic Networks (DLNs) offer interpretable reasoning and substantially lower inference cost.
We extend the DLN framework to supervised regression. Specifically, we redesign the final output layer to support continuous targets and unify the original two-phase training procedure into a single differentiable stage.
Our results show that DLNs are a viable, cost-effective alternative for regression tasks, especially where model transparency and computational efficiency are important.
 arXiv  Detail & Related papers  (2025-05-29T16:24:18Z)
- Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
 We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.
We show that the widely used beam search method suffers from unacceptable over-optimism.
We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
 arXiv  Detail & Related papers  (2025-04-10T07:50:03Z)
- Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues [65.41946981594567]
 Linear Recurrent Neural Networks (LRNNs) have emerged as efficient alternatives to Transformers in large language modeling.
LRNNs struggle to perform state-tracking which may impair performance in tasks such as code evaluation or tracking a chess game.
Our work enhances the expressivity of modern LRNNs, broadening their applicability without changing the cost of training or inference.
 arXiv  Detail & Related papers  (2024-11-19T14:35:38Z)
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
 We motivate an end-to-end decision-making model based on multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
 arXiv  Detail & Related papers  (2024-08-25T16:43:47Z)
- Contextual Linear Optimization with Bandit Feedback [35.692428244561626]
 Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients.
We study a class of offline learning algorithms for CLO with bandit feedback.
We show a fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of the optimization estimate.
 arXiv  Detail & Related papers  (2024-05-26T13:27:27Z)
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning [17.756108291816908]
 We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
 arXiv  Detail & Related papers  (2023-12-19T05:17:27Z)
- Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards [33.88533898709351]
 VIPeR amalgamates the randomized value function idea with the pessimism principle.
It implicitly obtains pessimism by simply perturbing the offline data multiple times.
It is both provably and computationally efficient in general Markov decision processes (MDPs) with neural network function approximation.
 arXiv  Detail & Related papers  (2023-02-24T17:52:12Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
 We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
 Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
 arXiv  Detail & Related papers  (2022-10-25T14:45:15Z)
- Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation [4.479834103607382]
 Cost-sensitive classification is critical in applications where misclassification errors widely vary in cost.
This paper proposes a cost-sensitive adversarial data augmentation framework to make over-parameterized models cost-sensitive.
Our method can effectively minimize the overall cost and reduce critical errors, while achieving comparable performance in terms of overall accuracy.
 arXiv  Detail & Related papers  (2022-08-24T19:00:30Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
 We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
 arXiv  Detail & Related papers  (2022-02-28T15:39:36Z)
- Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning [59.02006924867438]
 Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions.
Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy this, but the proposal relies on inverse-propensity weighting.
We propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets.
 arXiv  Detail & Related papers  (2022-02-19T20:00:44Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
 We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaLASSO (adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
 arXiv  Detail & Related papers  (2021-10-07T02:36:14Z)
- Dimensionality reduction, regularization, and generalization in overparameterized regressions [8.615625517708324]
 We show that PCA-OLS, also known as principal component regression, can be avoided with a dimensionality reduction.
We show that dimensionality reduction improves robustness while OLS is arbitrarily susceptible to adversarial attacks.
We find that methods in which the projection depends on the training data can outperform methods where the projections are chosen independently of the training data.
 arXiv  Detail & Related papers  (2020-11-23T15:38:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.