Stability of Weighted Majority Voting under Estimated Weights
- URL: http://arxiv.org/abs/2207.06118v2
- Date: Sun, 30 Jun 2024 14:12:02 GMT
- Title: Stability of Weighted Majority Voting under Estimated Weights
- Authors: Shaojie Bai, Dongxia Wang, Tim Muller, Peng Cheng, Jiming Chen
- Abstract summary: A (machine learning) algorithm that computes trust is called unbiased when it does not systematically overestimate or underestimate the trustworthiness.
We introduce and analyse two important properties of such unbiased trust values: stability of correctness and stability of optimality.
- Score: 16.804588631149393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weighted Majority Voting (WMV) is a well-known optimal decision rule for collective decision making, given the probability that each source provides accurate information (its trustworthiness). In reality, however, trustworthiness is not known to the decision maker; they have to rely on an estimate called trust. A (machine learning) algorithm that computes trust is called unbiased when it does not systematically overestimate or underestimate trustworthiness. To formally analyse the uncertainty in the decision process, we introduce and analyse two important properties of such unbiased trust values: stability of correctness and stability of optimality. Stability of correctness means that the decision accuracy the decision maker believes they achieved equals the actual accuracy; we prove that stability of correctness holds. Stability of optimality means that decisions made based on trust are as good as they would have been had they been based on trustworthiness; stability of optimality does not hold. We analyse the difference between the two, and bounds thereon. We also present an overview of how sensitive decision correctness is to changes in trust and trustworthiness.
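To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's code) of WMV where the decision maker weights sources by estimated trust rather than true trustworthiness. It uses the classical log-odds weighting w_i = log(p_i / (1 - p_i)); the number of sources, the trust noise model, and all numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def wmv_decide(reports, weights):
    """Weighted majority vote over +/-1 reports with the given weights."""
    return 1 if np.dot(weights, reports) >= 0 else -1

# Hypothetical setup: each source i reports the correct value with
# probability p_true[i] (its trustworthiness).
p_true = np.array([0.9, 0.7, 0.6, 0.55, 0.8])
# Unbiased trust estimates: noisy, but not systematically off.
trust = np.clip(p_true + rng.normal(0, 0.05, p_true.shape), 0.51, 0.99)

# Classical optimal WMV weights are the log-odds of trustworthiness;
# the decision maker can only compute them from trust.
w_true  = np.log(p_true / (1 - p_true))
w_trust = np.log(trust / (1 - trust))

correct_true, correct_trust = 0, 0
n_trials = 20_000
for _ in range(n_trials):
    truth = rng.choice([-1, 1])
    # Each source reports the truth with its own probability p_true[i].
    reports = np.where(rng.random(p_true.shape) < p_true, truth, -truth)
    correct_true  += wmv_decide(reports, w_true)  == truth
    correct_trust += wmv_decide(reports, w_trust) == truth

print("accuracy with trustworthiness-based weights:", correct_true / n_trials)
print("accuracy with trust-based weights:          ", correct_trust / n_trials)
```

In the paper's terminology, the gap between the two printed accuracies illustrates why stability of optimality can fail: decisions weighted by estimated trust are in general no better than those weighted by true trustworthiness, even when the trust estimates are unbiased.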
Related papers
- Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations [49.84786015324238]
Confidence estimation (CE) indicates how reliable the answers of large language models (LLMs) are, and can impact user trust and decision-making. We present a comprehensive evaluation framework for CE that measures confidence quality on three new aspects: robustness of confidence against prompt perturbations, stability across semantically equivalent answers, and sensitivity to semantically different answers.
arXiv Detail & Related papers (2026-01-12T23:16:50Z) - CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? [55.32645640455462]
This paper studies how natural language critiques can enhance verbalized confidence. We propose Self-Critique, enabling LLMs to critique and optimize their confidence beyond mere accuracy. Experiments show that CritiCal significantly outperforms Self-Critique and other competitive baselines.
arXiv Detail & Related papers (2025-10-28T15:16:06Z) - Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution [20.607071807794195]
Large Language Models (LLMs) are widely used as automated judges, where practical value depends on both accuracy and trustworthy, risk-aware judgments. Existing approaches predominantly focus on accuracy, overlooking the necessity of well-calibrated confidence. We advocate a shift from accuracy-centric evaluation to confidence-driven, risk-aware LLM-as-a-Judge systems.
arXiv Detail & Related papers (2025-08-08T11:11:22Z) - Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility in SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z) - Probabilistic Modeling of Disparity Uncertainty for Robust and Efficient Stereo Matching [61.73532883992135]
We propose a new uncertainty-aware stereo matching framework.
We adopt Bayes risk as the measurement of uncertainty and use it to separately estimate data and model uncertainty.
arXiv Detail & Related papers (2024-12-24T23:28:20Z) - Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z) - Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation.
We propose methods tailored to the unique properties of perception and decision-making.
We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z) - Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
arXiv Detail & Related papers (2024-09-26T21:00:45Z) - U-Trustworthy Models. Reliability, Competence, and Confidence in
Decision-Making [0.21756081703275998]
We present a precise mathematical definition of trustworthiness, termed $\mathcal{U}$-trustworthiness.
Within the context of $\mathcal{U}$-trustworthiness, we prove that properly-ranked models are inherently $\mathcal{U}$-trustworthy.
We advocate for the adoption of the AUC metric as the preferred measure of trustworthiness.
arXiv Detail & Related papers (2024-01-04T04:58:02Z) - Human-Aligned Calibration for AI-Assisted Decision Making [19.767213234234855]
We show that, if the confidence values satisfy a natural alignment property with respect to the decision maker's confidence on her own predictions, there always exists an optimal decision policy.
We show that multicalibration with respect to the decision maker's confidence on her own predictions is a sufficient condition for alignment.
arXiv Detail & Related papers (2023-05-31T18:00:14Z) - Did You Mean...? Confidence-based Trade-offs in Semantic Parsing [52.28988386710333]
We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
arXiv Detail & Related papers (2023-03-29T17:07:26Z) - Confidence-Calibrated Face and Kinship Verification [8.570969129199467]
We introduce an effective confidence measure that allows verification models to convert a similarity score into a confidence score for any given face pair.
We also propose a confidence-calibrated approach, termed Angular Scaling (ASC), which is easy to implement and can be readily applied to existing verification models.
To the best of our knowledge, our work presents the first comprehensive confidence-calibrated solution for modern face and kinship verification tasks.
arXiv Detail & Related papers (2022-10-25T10:43:46Z) - Reliability-Aware Prediction via Uncertainty Learning for Person Image
Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z) - Stating Comparison Score Uncertainty and Verification Decision
Confidence Towards Transparent Face Recognition [13.555831336280407]
We propose an approach to estimate the uncertainty of face comparison scores.
Second, we introduce a confidence measure of the system's decision to provide insights into the verification decision.
The suitability of the comparison score uncertainties and the verification decision confidences has been experimentally demonstrated on three face recognition models.
arXiv Detail & Related papers (2022-10-19T07:43:48Z) - MACEst: The reliable and trustworthy Model Agnostic Confidence Estimator [0.17188280334580192]
We argue that any confidence estimates based upon standard machine learning point prediction algorithms are fundamentally flawed.
We present MACEst, a Model Agnostic Confidence Estimator, which provides reliable and trustworthy confidence estimates.
arXiv Detail & Related papers (2021-09-02T14:34:06Z) - Learning Strategies in Decentralized Matching Markets under Uncertain
Preferences [91.3755431537592]
We study the problem of decision-making when shared resources are scarce and the preferences of agents are unknown a priori.
Our approach is based on the representation of preferences in a reproducing kernel Hilbert space.
We derive optimal strategies that maximize agents' expected payoffs.
arXiv Detail & Related papers (2020-10-29T03:08:22Z) - How Much Can We Really Trust You? Towards Simple, Interpretable Trust
Quantification Metrics for Deep Neural Networks [94.65749466106664]
We conduct a thought experiment and explore two key questions about trust in relation to confidence.
We introduce a suite of metrics for assessing the overall trustworthiness of deep neural networks based on their behaviour when answering a set of questions.
The proposed metrics are by no means perfect, but the hope is to push the conversation towards better metrics.
arXiv Detail & Related papers (2020-09-12T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.