SaGE: Evaluating Moral Consistency in Large Language Models
- URL: http://arxiv.org/abs/2402.13709v2
- Date: Fri, 8 Mar 2024 14:35:30 GMT
- Title: SaGE: Evaluating Moral Consistency in Large Language Models
- Authors: Vamshi Krishna Bonagiri, Sreeram Vennam, Priyanshul Govil, Ponnurangam Kumaraguru, Manas Gaur
- Abstract summary: We show that even state-of-the-art Large Language Models are morally inconsistent in their generations.
We propose an information-theoretic measure called Semantic Graph Entropy (SaGE) to measure a model's moral consistency.
- Score: 15.079905222871071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advancements showcasing the impressive capabilities of Large
Language Models (LLMs) in conversational systems, we show that even
state-of-the-art LLMs are morally inconsistent in their generations,
questioning their reliability (and trustworthiness in general). Prior works in
LLM evaluation focus on developing ground-truth data to measure accuracy on
specific tasks. However, for moral scenarios that often lack universally
agreed-upon answers, consistency in model responses becomes crucial for their
reliability. To address this issue, we propose an information-theoretic measure
called Semantic Graph Entropy (SaGE), grounded in the concept of "Rules of
Thumb" (RoTs) to measure a model's moral consistency. RoTs are abstract
principles learned by a model and can help explain their decision-making
strategies effectively. To this extent, we construct the Moral Consistency
Corpus (MCC), containing 50K moral questions, responses to them by LLMs, and
the RoTs that these models followed. Furthermore, to illustrate the
generalizability of SaGE, we use it to investigate LLM consistency on two
popular datasets -- TruthfulQA and HellaSwag. Our results reveal that
task accuracy and consistency are independent problems, and there is a dire
need to investigate these issues further.
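To make the idea concrete, here is a minimal sketch of the kind of consistency score the abstract describes: responses (or RoTs) generated for paraphrases of one moral question are embedded, and an entropy is taken over the spectrum of their pairwise-similarity matrix, so that semantically similar answers yield low entropy. This is not the paper's SaGE implementation; the embedding model, the eigenvalue-based entropy, and all names below are illustrative assumptions.

```python
# Illustrative sketch only -- NOT the paper's SaGE implementation.
# Assumption: consistency of a set of responses/RoTs to paraphrases of one
# moral question is approximated by a von Neumann-style entropy over the
# eigenvalues of their cosine-similarity (Gram) matrix.
import numpy as np
from sentence_transformers import SentenceTransformer  # embedder choice is an assumption

def semantic_entropy_score(texts, model_name="all-MiniLM-L6-v2"):
    """Entropy in [0, log n]; lower = more mutually similar texts (more consistent)."""
    model = SentenceTransformer(model_name)
    emb = model.encode(texts, normalize_embeddings=True)  # unit-norm sentence embeddings
    sim = emb @ emb.T                                      # cosine-similarity (Gram) matrix
    eigvals = np.linalg.eigvalsh(sim / len(texts))         # spectrum sums to 1 (diag of sim is 1)
    eigvals = eigvals[eigvals > 1e-12]                     # drop numerical zeros/negatives
    return float(-(eigvals * np.log(eigvals)).sum())       # von Neumann-style entropy

# Example: RoTs a model might produce for paraphrases of the same moral question.
rots = [
    "It is wrong to deceive someone for personal gain.",
    "Lying to benefit yourself at another's expense is unacceptable.",
    "Deception is acceptable whenever it is convenient.",
]
print(f"semantic entropy: {semantic_entropy_score(rots):.3f}")
```

Under these assumptions, a score near 0 would mean the model gives essentially one answer across paraphrases, while a score near log n would mean n mutually dissimilar answers.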
Related papers
- MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs) [26.475993408532304]
We study the ability of an MLLM to produce semantically similar or identical responses to semantically similar queries.
We propose the MM-R$^3$ benchmark, which analyses SoTA MLLMs in terms of consistency and accuracy.
Our analysis reveals that consistency does not always align with accuracy, indicating that models with higher accuracy are not necessarily more consistent, and vice versa.
arXiv Detail & Related papers (2024-10-07T06:36:55Z) - FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" [74.7488607599921]
FaithEval is a benchmark to evaluate the faithfulness of large language models (LLMs) in contextual scenarios.
FaithEval comprises 4.9K high-quality problems in total, validated through a rigorous four-stage context construction and validation framework.
arXiv Detail & Related papers (2024-09-30T06:27:53Z) - BeHonest: Benchmarking Honesty in Large Language Models [23.192389530727713]
We introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in Large Language Models.
BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses.
Our findings indicate that there is still significant room for improvement in the honesty of LLMs.
arXiv Detail & Related papers (2024-06-19T06:46:59Z) - MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs).
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z) - Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z) - Measuring Moral Inconsistencies in Large Language Models [16.47371312298185]
A Large Language Model (LLM) is considered consistent if semantically equivalent prompts produce semantically equivalent responses.
We show that even state-of-the-art LLMs are highly inconsistent in their generations, questioning their reliability.
We propose a novel information-theoretic measure called Semantic Graph Entropy (SGE) to measure the consistency of an LLM in moral scenarios.
arXiv Detail & Related papers (2024-01-26T18:05:47Z) - Alignment for Honesty [105.72465407518325]
Recent research has made significant strides in aligning large language models (LLMs) with helpfulness and harmlessness.
In this paper, we argue for the importance of alignment for honesty, ensuring that LLMs proactively refuse to answer questions when they lack knowledge.
We address these challenges by first establishing a precise problem definition and defining "honesty" inspired by the Analects of Confucius.
arXiv Detail & Related papers (2023-12-12T06:10:42Z) - The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning-with-refinement objective called ART: Ask, Refine, and Trust.
It asks necessary questions to decide when an LLM should refine its output.
It achieves a performance gain of +5 points over self-refinement baselines.
arXiv Detail & Related papers (2023-11-14T07:26:32Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
We find that longer conversations better reveal how comprehensively language models understand questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Improving Open Information Extraction with Large Language Models: A
Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - TrustGPT: A Benchmark for Trustworthy and Responsible Large Language
Models [19.159479032207155]
Large Language Models (LLMs) have gained significant attention due to their impressive natural language processing capabilities.
TrustGPT provides a comprehensive evaluation of LLMs in three crucial areas: toxicity, bias, and value-alignment.
This research aims to enhance our understanding of the performance of conversation generation models and promote the development of language models that are more ethical and socially responsible.
arXiv Detail & Related papers (2023-06-20T12:53:39Z)