A Formalism and Approach for Improving Robustness of Large Language
Models Using Risk-Adjusted Confidence Scores
- URL: http://arxiv.org/abs/2310.03283v1
- Date: Thu, 5 Oct 2023 03:20:41 GMT
- Title: A Formalism and Approach for Improving Robustness of Large Language
Models Using Risk-Adjusted Confidence Scores
- Authors: Ke Shen and Mayank Kejriwal
- Abstract summary: Large Language Models (LLMs) have achieved impressive milestones in natural language processing (NLP).
Despite their impressive performance, the models are known to pose important risks.
We define and formalize two distinct types of risk: decision risk and composite risk.
- Score: 4.043005183192123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), such as ChatGPT, have achieved impressive
milestones in natural language processing (NLP). Despite their impressive
performance, the models are known to pose important risks. As these models are
deployed in real-world applications, a systematic understanding of different
risks posed by these models on tasks such as natural language inference (NLI)
is much needed. In this paper, we define and formalize two distinct types of
risk: decision risk and composite risk. We also propose a risk-centric
evaluation framework, and four novel metrics, for assessing LLMs on these risks
in both in-domain and out-of-domain settings. Finally, we propose a
risk-adjusted calibration method called DwD for helping LLMs minimize these
risks in an overall NLI architecture. Detailed experiments, using four NLI
benchmarks, three baselines and two LLMs, including ChatGPT, show both the
practical utility of the evaluation framework, and the efficacy of DwD in
reducing decision and composite risk. For instance, when using DwD, an
underlying LLM is able to address an extra 20.1% of low-risk inference tasks
(which it would erroneously deem high-risk without risk adjustment) and
skip a further 19.8% of high-risk tasks, which would have been answered
incorrectly.
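To make the selective-answering idea concrete, below is a minimal sketch of risk-adjusted selective prediction: a calibrator rescales the LLM's raw confidence into a risk score, and the system answers only when that score falls under a threshold. The calibrator, threshold, and names are illustrative stand-ins, not the paper's actual DwD method.

```python
# A minimal sketch of risk-adjusted selective prediction, assuming a
# hypothetical calibrator; the paper's actual DwD method may differ.
from dataclasses import dataclass

@dataclass
class InferenceResult:
    answer: str
    confidence: float  # raw LLM confidence in [0, 1]

def risk_adjusted_decide(result, risk_calibrator, threshold=0.5):
    """Answer a low-risk NLI instance, skip a high-risk one."""
    adjusted_risk = risk_calibrator(result.confidence)
    if adjusted_risk <= threshold:
        return result.answer  # low risk: commit to the inference
    return None               # high risk: abstain

# Toy calibrator: risk shrinks quadratically as confidence grows.
toy_calibrator = lambda c: 1.0 - c ** 2
print(risk_adjusted_decide(InferenceResult("entailment", 0.9), toy_calibrator))
# -> entailment
```

In the abstract's terms, a good calibrator reduces both failure modes at once: fewer low-risk instances wrongly skipped, and fewer high-risk instances confidently answered wrong.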
Related papers
- Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play [0.43512163406552007]
As Large Language Models (LLMs) become more prevalent, concerns about their safety, ethics, and potential biases have risen.
This study innovatively applies the Domain-Specific Risk-Taking (DOSPERT) scale from cognitive science to LLMs.
We propose a novel Ethical Decision-Making Risk Attitude Scale (EDRAS) to assess LLMs' ethical risk attitudes in depth.
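As a rough illustration of how a risk-attitude scale can be administered to a model, the sketch below averages Likert ratings per risk domain; the items shown are invented placeholders, not the actual DOSPERT or EDRAS items.

```python
# Sketch of scoring a Likert-style risk-attitude scale for an LLM.
# The items below are invented placeholders, not the scales' real items.
from statistics import mean

ITEMS = {
    "ethical":   ["Reveal a user's private data to help a third party. (1=never ... 5=definitely)"],
    "financial": ["Put all savings into a single volatile asset. (1=never ... 5=definitely)"],
}

def score_risk_attitude(ask_llm):
    """ask_llm: callable that elicits an integer rating 1-5 for one item."""
    return {domain: mean(ask_llm(item) for item in items)
            for domain, items in ITEMS.items()}

# Stub model that always answers 2 ("unlikely"), for demonstration.
print(score_risk_attitude(lambda item: 2))  # {'ethical': 2, 'financial': 2}
```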
arXiv Detail & Related papers (2024-10-26T15:55:21Z)
- SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose SafeBench, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
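Such an evaluation reduces to looping a harmful-query set through the model and an automated judge. The sketch below computes an unsafe-response rate with toy stand-ins for the dataset, model, and judge; SafeBench's actual protocol is richer.

```python
# Illustrative safety-evaluation loop: run harmful queries through a model
# and an automated judge, and report the unsafe-response rate. All pieces
# here are toy stand-ins for the dataset, model, and judge described above.
def safety_eval(model, judge, harmful_queries):
    """Fraction of responses the judge flags as unsafe."""
    unsafe = sum(1 for q in harmful_queries if judge(q, model(q)))
    return unsafe / len(harmful_queries)

refusing_model = lambda q: "I can't help with that."
keyword_judge = lambda q, r: "sure, here" in r.lower()  # crude unsafe detector
print(safety_eval(refusing_model, keyword_judge, ["toy harmful query"]))  # 0.0
```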
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
- Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference [3.422309388045878]
Large language models (LLMs) such as ChatGPT are known to pose important risks.
Misplaced confidence arises from the over-confidence or under-confidence that the models have in their inference.
We propose an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks.
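One plausible reading of these risks as plain error rates is sketched below: over-confidence (answering and being wrong) and under-confidence (skipping an instance that would have been answered correctly) combine into a composite rate. This reading is an assumption for illustration; the paper's formal definitions and metrics are more precise.

```python
# One plausible reading of the two failure modes as error rates; a sketch
# only, not the paper's formal metrics.
def risk_rates(records):
    """records: (answered, correct_if_answered) pairs over a test set."""
    over = sum(1 for a, c in records if a and not c)   # answered, but wrong
    under = sum(1 for a, c in records if not a and c)  # skipped, but would be right
    n = len(records)
    return {"over_confidence": over / n,
            "under_confidence": under / n,
            "composite": (over + under) / n}

print(risk_rates([(True, True), (True, False), (False, True)]))
# {'over_confidence': 0.33.., 'under_confidence': 0.33.., 'composite': 0.66..}
```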
arXiv Detail & Related papers (2024-08-04T05:24:32Z)
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models [46.93425758722059]
CRiskEval is a Chinese dataset meticulously designed for gauging the risk proclivities inherent in large language models (LLMs).
We define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral, and safe.
The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks.
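A benchmark with graded safety levels can be scored by mapping each level to a number and averaging per risk type. The mapping and risk-type names below are assumed for illustration, not CRiskEval's own scoring scheme.

```python
# Hypothetical tallying of answers against graded safety levels; the level
# names follow the taxonomy above, but the numeric scoring is an assumption.
from collections import defaultdict

LEVEL_SCORE = {"safe": 0, "neutral": 1,
               "moderately hazardous": 2, "extremely hazardous": 3}

def risk_profile(answers):
    """answers: (risk_type, chosen_level) pairs; returns mean score per type."""
    totals, counts = defaultdict(int), defaultdict(int)
    for risk_type, level in answers:
        totals[risk_type] += LEVEL_SCORE[level]
        counts[risk_type] += 1
    return {t: totals[t] / counts[t] for t in totals}

print(risk_profile([("frontier-risk-1", "neutral"),
                    ("frontier-risk-1", "safe")]))  # {'frontier-risk-1': 0.5}
```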
arXiv Detail & Related papers (2024-06-07T08:52:24Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the belief that base LLMs, lacking instruction tuning, are safe from misuse: by deploying carefully designed demonstrations, we show that base LLMs can effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs [0.0]
This paper explores risks emanating from downstream uses of large language models (LLMs).
We introduce a novel LLM-based risk assessment engine (GUARD-D-LLM) designed to pinpoint and rank threats relevant to specific use cases derived from text-based user inputs.
Integrating thirty intelligent agents, this innovative approach identifies bespoke risks, gauges their severity, offers targeted suggestions for mitigation, and facilitates risk-aware development.
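The output side of such an engine is essentially a severity-ranked list of identified risks with suggested mitigations. The sketch below shows one possible shape for that report; all field names and example risks are assumptions.

```python
# Sketch of the report an LLM-based risk assessment engine might emit:
# identified risks ranked by severity. Field names are assumptions.
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    severity: int     # e.g. 1 (low) to 5 (critical)
    mitigation: str

def rank_risks(risks):
    """Most severe risks first."""
    return sorted(risks, key=lambda r: r.severity, reverse=True)

for r in rank_risks([Risk("prompt injection", 4, "isolate and sanitize user input"),
                     Risk("toxic output", 3, "add an output moderation filter")]):
    print(f"[{r.severity}] {r.name}: {r.mitigation}")
```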
arXiv Detail & Related papers (2024-04-02T05:25:17Z)
- C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models [57.10361282229501]
We propose C-RAG, the first framework to certify generation risks for RAG models.
Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks.
We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial.
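For intuition, an upper confidence bound on generation risk can be computed from held-out losses with a concentration inequality. The Hoeffding-based sketch below illustrates the certification idea only; C-RAG's conformal risk analysis is considerably more refined.

```python
# A high-probability upper bound on generation risk via Hoeffding's
# inequality, to illustrate the certification idea; C-RAG's conformal
# risk analysis is more refined than this sketch.
import math

def risk_upper_bound(losses, delta=0.05):
    """losses: per-example generation losses in [0, 1] on held-out data.
    The returned value exceeds the true risk with probability >= 1 - delta."""
    n = len(losses)
    return sum(losses) / n + math.sqrt(math.log(1 / delta) / (2 * n))

print(risk_upper_bound([0.1, 0.0, 0.2, 0.1] * 50))  # ~0.186 for n = 200
```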
arXiv Detail & Related papers (2024-02-05T16:46:16Z)
- Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains [15.320563604087246]
High-risk domains pose unique challenges that require language models to provide accurate and safe responses.
Despite the great success of large language models (LLMs), their performance in high-risk domains remains unclear.
arXiv Detail & Related papers (2023-11-25T08:58:07Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
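Counterfactual LTR typically builds on inverse-propensity-scored (IPS) estimates of ranking utility, with weight clipping as a crude variance control. The sketch below shows only that standard building block, not the paper's exposure-based risk term.

```python
# Clipped inverse-propensity scoring (IPS), a standard building block in
# counterfactual LTR; the paper's exposure-based risk term is not shown here.
def clipped_ips_utility(clicks, logging_propensities, new_exposure, tau=10.0):
    """Estimate a candidate ranker's click utility from logged data.
    Clipping the weights at tau trades bias for lower variance (less risk)."""
    return sum(c * e * min(1.0 / p, tau)
               for c, p, e in zip(clicks, logging_propensities, new_exposure)
               ) / len(clicks)

# Logged clicks, exposure probabilities under the logging ranker, and the
# exposure the candidate ranker would give the same items.
print(clipped_ips_utility([1, 0, 1], [0.5, 0.2, 0.05], [0.9, 0.4, 0.3]))  # 1.6
```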
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
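A standard risk measure in this setting is CVaR, the mean of the worst alpha-fraction of episode returns; the empirical estimator below illustrates it, while the paper's soft-risk mechanism is not reproduced here.

```python
# Empirical CVaR (conditional value at risk) of episode returns, the kind
# of risk measure optimized in risk-averse RL; the paper's soft-risk
# mechanism is not reproduced in this sketch.
def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns."""
    k = max(1, int(len(returns) * alpha))
    worst = sorted(returns)[:k]
    return sum(worst) / len(worst)

episode_returns = [10, 12, -5, 11, 9, -20, 13, 8, 10, 12]
print(cvar(episode_returns, alpha=0.2))  # mean of the two worst: -12.5
```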
arXiv Detail & Related papers (2022-05-10T19:40:52Z)