Related papers: Regression with Multi-Expert Deferral

URL: http://arxiv.org/abs/2403.19494v1
Date: Thu, 28 Mar 2024 15:26:38 GMT
Title: Regression with Multi-Expert Deferral
Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
Abstract summary: Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts. We present a novel framework of regression with deferral, which involves deferring the prediction to multiple experts. We introduce new surrogate loss functions for both scenarios and prove that they are supported by $H$-consistency bounds.
Score: 30.389055604165222
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts. While this problem has received significant attention in classification contexts, it presents unique challenges in regression due to the infinite and continuous nature of the label space. In this work, we introduce a novel framework of regression with deferral, which involves deferring the prediction to multiple experts. We present a comprehensive analysis for both the single-stage scenario, where there is simultaneous learning of predictor and deferral functions, and the two-stage scenario, which involves a pre-trained predictor with a learned deferral function. We introduce new surrogate loss functions for both scenarios and prove that they are supported by $H$-consistency bounds. These bounds provide consistency guarantees that are stronger than Bayes consistency, as they are non-asymptotic and hypothesis set-specific. Our framework is versatile, applying to multiple experts, accommodating any bounded regression losses, addressing both instance-dependent and label-dependent costs, and supporting both single-stage and two-stage methods. A by-product is that our single-stage formulation includes the recent regression with abstention framework (Cheng et al., 2023) as a special case, where only a single expert, the squared loss and a label-independent cost are considered. Minimizing our proposed loss functions directly leads to novel algorithms for regression with deferral. We report the results of extensive experiments showing the effectiveness of our proposed algorithms.
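
Illustration (not taken from the paper): the abstract describes a loss in which the learner either predicts itself or defers to one of several experts at a cost. The sketch below shows one way such a deferral loss could look for a single example. It assumes a squared loss clipped at `max_loss` (to keep it bounded) and a deferral cost equal to the chosen expert's clipped loss plus a fixed fee. The function names, the `fees` parameter, and this cost form are hypothetical choices made for the example, not the authors' definitions.

```python
from typing import Sequence


def bounded_squared_loss(pred: float, y: float, max_loss: float = 1.0) -> float:
    """Squared error clipped at max_loss so the loss stays bounded."""
    return min((pred - y) ** 2, max_loss)


def deferral_loss(
    pred: float,
    expert_preds: Sequence[float],
    fees: Sequence[float],
    route: int,
    y: float,
    max_loss: float = 1.0,
) -> float:
    """Loss for one example under a routing decision.

    route == 0 keeps the learner's own prediction.
    route == j (1-based) defers to expert j and pays that expert's
    clipped loss plus a fixed fee (an assumed, label-dependent cost).
    """
    if route == 0:
        return bounded_squared_loss(pred, y, max_loss)
    j = route - 1
    return bounded_squared_loss(expert_preds[j], y, max_loss) + fees[j]


def best_route(
    pred: float,
    expert_preds: Sequence[float],
    fees: Sequence[float],
    y: float,
    max_loss: float = 1.0,
) -> int:
    """Oracle routing that uses the true label; for illustration only.

    A learned deferral function would have to choose the route from the
    input alone, without access to y.
    """
    options = range(len(expert_preds) + 1)
    return min(
        options,
        key=lambda r: deferral_loss(pred, expert_preds, fees, r, y, max_loss),
    )


if __name__ == "__main__":
    # Two hypothetical experts with fixed fees 0.1 and 0.2.
    print(deferral_loss(0.5, [0.0, 2.0], [0.1, 0.2], route=1, y=0.0))
    print(best_route(0.5, [0.0, 2.0], [0.1, 0.2], y=0.0))
```

In this toy setup, deferring pays off only when an expert's error plus its fee is below the learner's own error. The surrogate losses in the paper are designed to make that trade-off learnable; the sketch only evaluates the target loss and does not implement any surrogate.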

Related papers

Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors. Our results establish a principled foundation and offer practical guidance for loss selections in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z)
Theory and Algorithms for Learning with Multi-Class Abstention and Multi-Expert Deferral [20.76255397215973]
Large language models (LLMs) have achieved remarkable performance but face critical challenges: hallucinations and high inference costs. Leveraging multiple experts offers a solution: deferring uncertain inputs to more capable experts improves reliability. This thesis presents a comprehensive study of this problem and the related problem of learning with abstention, supported by strong consistency guarantees.
arXiv Detail & Related papers (2025-12-28T11:33:39Z)
Rethinking Consistent Multi-Label Classification under Inexact Supervision [60.79309683889278]
In partial multi-label learning, each instance is annotated with a candidate label set, among which only some labels are relevant. In complementary multi-label learning, each instance is annotated with complementary labels indicating the classes to which the instance does not belong.
arXiv Detail & Related papers (2025-10-05T08:30:32Z)
Learning from Similarity-Confidence and Confidence-Difference [0.24578723416255752]
We propose a novel Weakly Supervised Learning (WSL) framework that leverages complementary weak supervision signals from multiple perspectives. Specifically, we introduce SconfConfDiff Classification, a method that integrates two distinct forms of weak labels. We prove that both estimators achieve optimal convergence rates with respect to estimation error bounds.
arXiv Detail & Related papers (2025-08-07T07:42:59Z)
Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer [30.389055604165222]
This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage and two-stage learning scenarios. We derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, multiple-expert scenario.
arXiv Detail & Related papers (2025-06-25T17:48:58Z)
Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis for continual learning of regression models. We establish the almost sure convergence results of continual learning under a general data condition for the first time.
arXiv Detail & Related papers (2025-03-24T10:06:07Z)
Probably Approximately Precision and Recall Learning [60.00180898830079]
A key challenge in machine learning is the prevalence of one-sided feedback. We introduce a Probably Approximately Correct (PAC) framework in which hypotheses are set functions that map each input to a set of labels. We develop new algorithms that learn from positive data alone, achieving optimal sample complexity in the realizable case.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
A Two-Stage Learning-to-Defer Approach for Multi-Task Learning [3.4289478404209826]
We introduce a novel Two-Stage Learning-to-Defer framework for multi-task learning that jointly addresses classification and regression tasks. We validate our framework on two challenging tasks: object detection, where classification and regression are tightly coupled, and electronic health record analysis.
arXiv Detail & Related papers (2024-10-21T07:44:57Z)
Spectral Representation for Causal Estimation with Hidden Confounders [33.148766692274215]
We address the problem of causal effect estimation where hidden confounders are present. Our approach uses a singular value decomposition of a conditional expectation operator, followed by a saddle-point optimization problem.
arXiv Detail & Related papers (2024-07-15T05:39:56Z)
Principled Approaches for Learning to Defer with Multiple Experts [30.389055604165222]
We introduce a new family of surrogate losses specifically tailored for the multiple-expert setting. We prove that these surrogate losses benefit from strong $H$-consistency bounds.
arXiv Detail & Related papers (2023-10-23T10:19:09Z)
Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms [30.389055604165222]
We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost. We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees.
arXiv Detail & Related papers (2023-10-23T10:16:27Z)
Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles [0.966840768820136]
We study the statistical properties of learning to defer (L2D) to multiple experts. We address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts.
arXiv Detail & Related papers (2022-10-30T21:27:29Z)
Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting [61.02295959343446]
This work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. We build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation. We apply the proposed framework to current SOTA multi-agent trajectory forecasting systems as a plugin module.
arXiv Detail & Related papers (2022-07-11T21:17:41Z)
Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation. We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity. We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z)
Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data. We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator. We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.