Expert-Agnostic Learning to Defer
- URL: http://arxiv.org/abs/2502.10533v2
- Date: Sat, 24 May 2025 17:50:44 GMT
- Title: Expert-Agnostic Learning to Defer
- Authors: Joshua Strong, Pramit Saha, Yasin Ibrahim, Cheng Ouyang, Alison Noble,
- Abstract summary: We introduce Expert-Agnostic Learning to Defer (EA-L2D), a novel L2D framework that employs a Bayesian approach to model expert behaviour.<n>EA-L2D significantly outperforms prior methods on unseen experts, achieving up to a 28% relative improvement.
- Score: 4.171294900540735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to Defer (L2D) trains autonomous systems to handle straightforward cases while deferring uncertain ones to human experts. Recent advancements in this field have introduced methods that offer flexibility to unseen experts at test time. However, we find these approaches struggle to generalise to experts with behaviours not seen during training, require extensive human annotation, and lack mechanisms for incorporating prior knowledge of expert capabilities. To address these challenges, we introduce Expert-Agnostic Learning to Defer (EA-L2D), a novel L2D framework that employs a Bayesian approach to model expert behaviour in an \textit{expert-agnostic} fashion. Across benchmark medical imaging datasets (HAM10000, Blood Cells, Retinal OCT, and Liver Tumours), EA-L2D significantly outperforms prior methods on unseen experts, achieving up to a 28\% relative improvement, while also matching or exceeding state-of-the-art performance on seen experts.
Related papers
- Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration [77.36486933055907]
We propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE)<n>ILDE implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty to potentially improve the convergence to the expert policy, and (2) curiosity-driven exploration of the states that deviate from the demonstration trajectories to potentially yield beyond-expert performance.<n> Empirically, we demonstrate that ILDE outperforms the state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work.
arXiv Detail & Related papers (2025-06-25T10:39:32Z) - Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training [86.70255651945602]
We introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE)<n>RICE aims to improve reasoning performance without additional training or complexs.<n> Empirical evaluations with leading MoE-based LRMs demonstrate noticeable and consistent improvements in reasoning accuracy, cognitive efficiency, and cross-domain generalization.
arXiv Detail & Related papers (2025-05-20T17:59:16Z) - Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models [5.211806751260724]
We propose a hierarchical sparse dictionary learning (HSDL) method that uncovers the collaboration patterns among experts.
We also introduce the Contribution-Aware Expert Pruning (CAEP) algorithm, which effectively prunes low-contribution experts.
arXiv Detail & Related papers (2025-04-16T04:06:15Z) - Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations [48.890534958441016]
We investigate domain specialization and expert redundancy in large-scale MoE models.
We propose a simple yet effective pruning framework, EASY-EP, to identify and retain only the most relevant experts.
Our method can achieve comparable performances and $2.99times$ throughput under the same memory budget with full DeepSeek-R1 with only half the experts.
arXiv Detail & Related papers (2025-04-09T11:34:06Z) - Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations [86.90549830760513]
Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks.
We propose MoE Experts Compression Suite (MC-Suite) to provide a benchmark for estimating expert importance from diverse perspectives.
We present an experimentally validated conjecture that, during expert dropping, SMoEs' instruction-following capabilities are predominantly hurt.
arXiv Detail & Related papers (2025-04-08T00:49:08Z) - Learning to Defer for Causal Discovery with Imperfect Experts [59.071731337922664]
We propose L2D-CD, a method for gauging the correctness of expert recommendations and optimally combining them with data-driven causal discovery results.
We evaluate L2D-CD on the canonical T"ubingen pairs dataset and demonstrate its superior performance compared to both the causal discovery method and the expert used in isolation.
arXiv Detail & Related papers (2025-02-18T18:55:53Z) - Utilising Deep Learning to Elicit Expert Uncertainty [2.9686400658670578]
We show how analysts can adopt a deep learning approach to utilize the method proposed in [14 ] with the actual information experts use.<n>We provide an overview of deep learning models that can effectively model expert decision-making to elicit distributions that capture expert uncertainty.
arXiv Detail & Related papers (2025-01-21T01:36:12Z) - Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z) - Learning to Defer to a Population: A Meta-Learning Approach [4.64209268422757]
We formulate an L2D system that can cope with never-before-seen experts at test-time.
We employ an attention mechanism that is able to look for points in the context set that are similar to a given test point.
In the experiments, we validate our methods on image recognition, traffic sign detection, and skin lesion diagnosis benchmarks.
arXiv Detail & Related papers (2024-03-05T06:10:28Z) - Defining Expertise: Applications to Treatment Effect Estimation [58.7977683502207]
We argue that expertise - particularly the type of expertise the decision-makers of a domain are likely to have - can be informative in designing and selecting methods for treatment effect estimation.
We define two types of expertise, predictive and prognostic, and demonstrate empirically that: (i) the prominent type of expertise in a domain significantly influences the performance of different methods in treatment effect estimation, and (ii) it is possible to predict the type of expertise present in a dataset.
arXiv Detail & Related papers (2024-03-01T17:30:49Z) - Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that results in being minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to the one of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z) - Learning to Defer with Limited Expert Predictions [0.0]
We propose a three-step approach to reduce the number of expert predictions required to train learning to defer algorithms.
Our experiments show that the approach allows the training of various learning to defer algorithms with a minimal number of human expert predictions.
arXiv Detail & Related papers (2023-04-14T09:22:34Z) - Experts in the Loop: Conditional Variable Selection for Accelerating
Post-Silicon Analysis Based on Deep Learning [6.6357750579293935]
Post-silicon validation is one of the most critical processes in semiconductor manufacturing.
This work aims to design a novel conditional variable selection approach while keeping experts in the loop.
arXiv Detail & Related papers (2022-09-30T06:12:12Z) - Decision Rule Elicitation for Domain Adaptation [93.02675868486932]
Human-in-the-loop machine learning is widely used in artificial intelligence (AI) to elicit labels from experts.
In this work, we allow experts to additionally produce decision rules describing their decision-making.
We show that decision rule elicitation improves domain adaptation of the algorithm and helps to propagate expert's knowledge to the AI model.
arXiv Detail & Related papers (2021-02-23T08:07:22Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.