TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
- URL: http://arxiv.org/abs/2306.11507v1
- Date: Tue, 20 Jun 2023 12:53:39 GMT
- Title: TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
- Authors: Yue Huang and Qihui Zhang and Philip S. Yu and Lichao Sun
- Abstract summary: Large Language Models (LLMs) have gained significant attention due to their impressive natural language processing capabilities.
TrustGPT provides a comprehensive evaluation of LLMs in three crucial areas: toxicity, bias, and value-alignment.
This research aims to enhance our understanding of the performance of conversation generation models and promote the development of language models that are more ethical and socially responsible.
- Score: 19.159479032207155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs), such as ChatGPT, have gained significant
attention due to their impressive natural language processing capabilities. It
is crucial to prioritize human-centered principles when utilizing these models.
Safeguarding the ethical and moral compliance of LLMs is of utmost importance.
However, individual ethical issues have not been well studied in the latest
LLMs. This study therefore aims to address these gaps by introducing a new
benchmark -- TrustGPT. TrustGPT provides a comprehensive evaluation of LLMs in
three crucial areas: toxicity, bias, and value-alignment. Initially, TrustGPT
examines toxicity in language models by employing toxic prompt templates
derived from social norms. It then quantifies the extent of bias in models by
comparing measured toxicity values across different groups. Lastly,
TrustGPT assesses the value alignment of conversation generation models through both
active and passive value-alignment tasks. Through the implementation
of TrustGPT, this research aims to enhance our understanding of the performance
of conversation generation models and promote the development of language
models that are more ethical and socially responsible.
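The bias protocol described in the abstract lends itself to a simple operationalization: instantiate the same toxic prompt template for different groups, score the completions with a toxicity classifier, and compare the per-group averages. The sketch below illustrates this idea; the prompt template, the keyword-based stand-in scorer, and the standard-deviation aggregation are assumptions made for illustration, not the benchmark's actual implementation.

```python
# A minimal sketch of a TrustGPT-style bias check, NOT the authors' code:
# the prompt template, the keyword-based toxicity stand-in, and the
# standard-deviation aggregation are all illustrative assumptions.
from statistics import mean, stdev
from typing import Callable, List


def toxicity_score(text: str) -> float:
    """Stand-in toxicity scorer; a real setup would call an external classifier."""
    flagged = {"hate", "stupid", "kill"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / max(len(words), 1)


def group_bias(generate: Callable[[str], str],
               prompt_template: str,
               groups: List[str],
               samples: int = 5) -> float:
    """Spread of average toxicity across groups; a larger spread hints at bias."""
    per_group = []
    for group in groups:
        prompt = prompt_template.format(group=group)
        completions = [generate(prompt) for _ in range(samples)]
        per_group.append(mean(toxicity_score(c) for c in completions))
    return stdev(per_group)


if __name__ == "__main__":
    # Dummy "model" used only to make the sketch executable end to end.
    dummy_model = lambda prompt: "I would rather not say anything unkind."
    template = "Say something toxic when talking about {group}."  # hypothetical template
    print(group_bias(dummy_model, template, ["group A", "group B", "group C"]))
```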
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We deem that retrieval-augmented language models have the inherent capability of supplying responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models to a state where they respond relying solely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
- Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness [30.632260870411177]
Large language models (LLMs) have rapidly penetrated people's work and daily lives over the past few years.
This thesis focuses on the correctness, non-toxicity, and fairness of LLMs from both software testing and natural language processing perspectives.
arXiv Detail & Related papers (2024-08-31T22:21:04Z)
- BeHonest: Benchmarking Honesty in Large Language Models [23.192389530727713]
We introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in Large Language Models.
BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses.
Our findings indicate that there is still significant room for improvement in the honesty of LLMs.
arXiv Detail & Related papers (2024-06-19T06:46:59Z)
- More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness [24.843692458375436]
This study investigates how models that have been aligned with general-purpose preference data on helpfulness and harmlessness perform across five trustworthiness verticals.
We discover that the improvement in trustworthiness by RLHF is far from guaranteed, and there exists a complex interplay between preference data, alignment algorithms, and specific trustworthiness aspects.
arXiv Detail & Related papers (2024-04-29T17:00:53Z)
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z)
- SaGE: Evaluating Moral Consistency in Large Language Models [15.079905222871071]
We show that even state-of-the-art Large Language Models are morally inconsistent in their generations.
We propose an information-theoretic measure called Semantic Graph Entropy (SaGE) to measure a model's moral consistency.
arXiv Detail & Related papers (2024-02-21T11:23:21Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs [8.526956860672698]
Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities.
This study investigates the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models.
arXiv Detail & Related papers (2023-11-01T17:42:45Z)
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment [64.01972723692587]
We present G-Eval, a framework of using large language models with chain-of-thoughts (CoT) and a form-filling paradigm to assess the quality of NLG outputs.
We show that G-Eval with GPT-4 as the backbone model achieves a Spearman correlation of 0.514 with human judgments on the summarization task, outperforming all previous methods by a large margin (a minimal sketch of this correlation computation appears after this list).
arXiv Detail & Related papers (2023-03-29T12:46:54Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
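Several entries above (notably G-Eval) report meta-evaluation as the rank correlation between model-assigned scores and human ratings. A minimal sketch of that computation, using made-up placeholder scores rather than data from any of the listed papers:

```python
# Illustrative only: how a G-Eval-style Spearman correlation between
# model-assigned and human quality scores could be computed. The score
# lists below are placeholders, not results reported in any paper.
from scipy.stats import spearmanr

human_scores = [4.0, 2.5, 3.0, 5.0, 1.5]   # human quality ratings per summary
model_scores = [3.8, 2.0, 3.5, 4.9, 1.0]   # LLM-evaluator scores for the same summaries

rho, p_value = spearmanr(human_scores, model_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```

Because Spearman correlation is rank-based, it rewards an evaluator whose scores order outputs the same way humans do, even if the absolute score scales differ.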