A Formalism and Approach for Improving Robustness of Large Language
Models Using Risk-Adjusted Confidence Scores
- URL: http://arxiv.org/abs/2310.03283v1
- Date: Thu, 5 Oct 2023 03:20:41 GMT
- Title: A Formalism and Approach for Improving Robustness of Large Language
Models Using Risk-Adjusted Confidence Scores
- Authors: Ke Shen and Mayank Kejriwal
- Abstract summary: Large Language Models (LLMs) have achieved impressive milestones in natural language processing (NLP).
Despite their impressive performance, the models are known to pose important risks.
We define and formalize two distinct types of risk: decision risk and composite risk.
- Score: 4.043005183192123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), such as ChatGPT, have achieved impressive
milestones in natural language processing (NLP). Despite their strong
performance, these models are known to pose important risks. As they are
deployed in real-world applications, a systematic understanding of the
different risks they pose on tasks such as natural language inference (NLI)
is much needed. In this paper, we define and formalize two distinct types of
risk: decision risk and composite risk. We also propose a risk-centric
evaluation framework, and four novel metrics, for assessing LLMs on these risks
in both in-domain and out-of-domain settings. Finally, we propose a
risk-adjusted calibration method called DwD for helping LLMs minimize these
risks in an overall NLI architecture. Detailed experiments, using four NLI
benchmarks, three baselines and two LLMs, including ChatGPT, show both the
practical utility of the evaluation framework, and the efficacy of DwD in
reducing decision and composite risk. For instance, when using DwD, an
underlying LLM is able to address an extra 20.1% of low-risk inference tasks
(which it would erroneously deem high-risk without risk adjustment) and to
skip a further 19.8% of high-risk tasks that it would otherwise have answered
incorrectly.
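As an illustration of the selective-answering setup described above, the sketch below shows the general pattern of answering only when a risk-adjusted confidence clears a threshold, and then measuring how often answerable (low-risk) items are skipped and error-prone (high-risk) items are answered. This is a minimal sketch under assumed interfaces, not the paper's DwD method: the NLIPrediction fields, the linear blend in adjusted_confidence, and the reported rates are illustrative stand-ins for the paper's learned risk adjuster and its four metrics.

```python
# Minimal, illustrative sketch of risk-adjusted selective answering for NLI.
# Not the authors' DwD implementation: the fields, the linear blend, and the
# reported rates below are assumptions standing in for a learned risk adjuster
# and the paper's metrics.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NLIPrediction:
    confidence: float   # raw model confidence in its predicted label, in [0, 1]
    risk_signal: float  # auxiliary risk estimate (e.g., from a calibrator), in [0, 1]
    correct: bool       # whether the predicted label matches the gold label


def adjusted_confidence(p: NLIPrediction, alpha: float = 0.7) -> float:
    """Blend raw confidence with a risk signal (stand-in for a learned adjuster)."""
    return alpha * p.confidence + (1.0 - alpha) * (1.0 - p.risk_signal)


def evaluate(preds: List[NLIPrediction], threshold: float = 0.5) -> Dict[str, float]:
    """Answer when adjusted confidence clears the threshold, otherwise abstain,
    then report simple decision- and composite-style error rates."""
    answered = [p for p in preds if adjusted_confidence(p) >= threshold]
    abstained = [p for p in preds if adjusted_confidence(p) < threshold]
    n = max(len(preds), 1)
    skipped_low_risk = sum(p.correct for p in abstained)        # abstained, but would have been right
    answered_high_risk = sum(not p.correct for p in answered)   # answered, but wrong
    return {
        "coverage": len(answered) / n,
        "skipped_low_risk_rate": skipped_low_risk / n,
        "answered_high_risk_rate": answered_high_risk / n,
        "composite_style_risk": (skipped_low_risk + answered_high_risk) / n,
    }


if __name__ == "__main__":
    preds = [
        NLIPrediction(confidence=0.9, risk_signal=0.1, correct=True),
        NLIPrediction(confidence=0.4, risk_signal=0.2, correct=True),
        NLIPrediction(confidence=0.8, risk_signal=0.9, correct=False),
    ]
    print(evaluate(preds, threshold=0.5))
```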
Related papers
- SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose SafeBench, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
- Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference [3.422309388045878]
Large language models (LLMs) such as ChatGPT are known to pose important risks.
One such risk, misplaced confidence, arises from the over-confidence or under-confidence that the models have in their inference.
We propose an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks.
arXiv Detail & Related papers (2024-08-04T05:24:32Z)
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models [46.93425758722059]
CRiskEval is a Chinese dataset meticulously designed for gauging the risk proclivities inherent in large language models (LLMs).
We define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral, and safe.
The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks.
arXiv Detail & Related papers (2024-06-07T08:52:24Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the belief that base LLMs, which lack instruction tuning, are inherently safe to release.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs [0.0]
This paper explores risks emanating from downstream uses of large language models (LLMs).
We introduce a novel LLM-based risk assessment engine (GUARD-D-LLM) designed to pinpoint and rank threats relevant to specific use cases derived from text-based user inputs.
Integrating thirty intelligent agents, this innovative approach identifies bespoke risks, gauges their severity, offers targeted suggestions for mitigation, and facilitates risk-aware development.
arXiv Detail & Related papers (2024-04-02T05:25:17Z)
- C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models [57.10361282229501]
We propose C-RAG, the first framework to certify generation risks for RAG models.
Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks.
We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial.
arXiv Detail & Related papers (2024-02-05T16:46:16Z)
- Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains [15.320563604087246]
High-risk domains pose unique challenges that require language models to provide accurate and safe responses.
Despite the great success of large language models (LLMs), their performance in high-risk domains remains unclear.
arXiv Detail & Related papers (2023-11-25T08:58:07Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.