Imitation Learning for Combinatorial Optimisation under Uncertainty
- URL: http://arxiv.org/abs/2601.05383v1
- Date: Thu, 08 Jan 2026 21:16:25 GMT
- Title: Imitation Learning for Combinatorial Optimisation under Uncertainty
- Authors: Prakash Gawas, Antoine Legrain, Louis-Martin Rousseau,
- Abstract summary: This paper introduces a systematic taxonomy of experts for IL optimisation under uncertainty.<n>Experts are classified along three dimensions: (i) their treatment of uncertainty, including myopic, deterministic, full-information, two-stage, and multi-stage formulations; (ii) their level of optimality, distinguishing task-optimal and approximate experts; and (iii) their interaction mode with the learner, ranging from one-shot supervision to iterative, interactive schemes.
- Score: 1.0781866671930855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning (IL) provides a data-driven framework for approximating policies for large-scale combinatorial optimisation problems formulated as sequential decision problems (SDPs), where exact solution methods are computationally intractable. A central but underexplored aspect of IL in this context is the role of the \emph{expert} that generates training demonstrations. Existing studies employ a wide range of expert constructions, yet lack a unifying framework to characterise their modelling assumptions, computational properties, and impact on learning performance. This paper introduces a systematic taxonomy of experts for IL in combinatorial optimisation under uncertainty. Experts are classified along three dimensions: (i) their treatment of uncertainty, including myopic, deterministic, full-information, two-stage stochastic, and multi-stage stochastic formulations; (ii) their level of optimality, distinguishing task-optimal and approximate experts; and (iii) their interaction mode with the learner, ranging from one-shot supervision to iterative, interactive schemes. Building on this taxonomy, we propose a generalised Dataset Aggregation (DAgger) algorithm that supports multiple expert queries, expert aggregation, and flexible interaction strategies. The proposed framework is evaluated on a dynamic physician-to-patient assignment problem with stochastic arrivals and capacity constraints. Computational experiments compare learning outcomes across expert types and interaction regimes. The results show that policies learned from stochastic experts consistently outperform those learned from deterministic or full-information experts, while interactive learning improves solution quality using fewer expert demonstrations. Aggregated deterministic experts provide an effective alternative when stochastic optimisation becomes computationally challenging.
Related papers
- Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks.<n>We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors.<n>Our results establish a principled foundation and offer practical guidance for loss selections in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z) - Conformal Set-based Human-AI Complementarity with Multiple Experts [1.1510009152620668]
This study focuses on the selection of instance-specific experts from a pool of multiple human experts.<n>We introduce a greedy algorithm that utilizes conformal sets to identify the subset of expert predictions that will be used in classifying an instance.
arXiv Detail & Related papers (2025-08-09T14:17:51Z) - Efficient Imitation under Misspecification [17.706710359787056]
We consider the problem of imitation learning under misspecification.<n>We propose inverse reinforcement learning algorithms that merely perform a computationally efficient local search procedure with strong guarantees in the realizable setting.<n>We prove that in the misspecified setting, it is beneficial to broaden the set of states on which local search is performed to include those reachable by good policies the learner can actually play.
arXiv Detail & Related papers (2025-03-17T13:35:55Z) - Convergence Rates for Softmax Gating Mixture of Experts [78.3687645289918]
Mixture of experts (MoE) has emerged as an effective framework to advance the efficiency and scalability of machine learning models.<n>Central to the success of MoE is an adaptive softmax gating mechanism which takes responsibility for determining the relevance of each expert to a given input and then dynamically assigning experts their respective weights.<n>We perform a convergence analysis of parameter estimation and expert estimation under the MoE equipped with the standard softmax gating or its variants, including a dense-to-sparse gating and a hierarchical softmax gating.
arXiv Detail & Related papers (2025-03-05T06:11:24Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 instruction unique-following prompts.<n>With our synthetic prompts, we use two preference dataset curation methods - rejection sampling (RS) and Monte Carlo Tree Search (MCTS)<n>Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.<n>High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that results in being minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to the one of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Learning Interpretable Deep Disentangled Neural Networks for
Hyperspectral Unmixing [16.02193274044797]
We propose a new interpretable deep learning method for hyperspectral unmixing that accounts for nonlinearity and endmember variability.
The model is learned end-to-end using backpropagation, and trained using a self-supervised strategy.
Experimental results on synthetic and real datasets illustrate the performance of the proposed method.
arXiv Detail & Related papers (2023-10-03T18:21:37Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Holistic Deep Learning [3.718942345103135]
This paper presents a novel holistic deep learning framework that addresses the challenges of vulnerability to input perturbations, overparametrization, and performance instability.
The proposed framework holistically improves accuracy, robustness, sparsity, and stability over standard deep learning models.
arXiv Detail & Related papers (2021-10-29T14:46:32Z) - USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization
Problems [9.015720257837575]
We consider the regression between spaces, aiming to infer high-quality optimization solutions from samples of input-solution pairs.
For learning foundations, we present learning-error analysis under the PAC-Bayesian framework.
We obtain highly encouraging experimental results for several classic problems on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-07-15T17:59:08Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.