Related papers: Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

URL: http://arxiv.org/abs/2409.11726v1
Date: Wed, 18 Sep 2024 06:21:44 GMT
Title: Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
Authors: Wenyuan Zhang, Jiawei Sheng, Shuaiyi Nie, Zefeng Zhang, Xinghua Zhang, Yongquan He, Tingwen Liu
Abstract summary: We propose a probing dataset to evaluate LLMs' ability to detect known knowledge errors (KKE) and unknown knowledge errors (UKE). The results indicate that even the latest LLMs struggle to effectively detect these two types of errors. We propose an agent-based reasoning method, Self-Recollection and Self-Doubt, to further explore the potential for improving error detection capabilities.
Score: 14.950721395944388
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) role-playing has gained widespread attention, where the authentic character knowledge is crucial for constructing realistic LLM role-playing agents. However, existing works usually overlook the exploration of LLMs' ability to detect characters' known knowledge errors (KKE) and unknown knowledge errors (UKE) while playing roles, which would lead to low-quality automatic construction of character trainable corpus. In this paper, we propose a probing dataset to evaluate LLMs' ability to detect errors in KKE and UKE. The results indicate that even the latest LLMs struggle to effectively detect these two types of errors, especially when it comes to familiar knowledge. We experimented with various reasoning strategies and propose an agent-based reasoning method, Self-Recollection and Self-Doubt (S2RD), to further explore the potential for improving error detection capabilities. Experiments show that our method effectively improves the LLMs' ability to detect error character knowledge, but it remains an issue that requires ongoing attention.

Related papers

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs [0.0]
Self-correction is an important capability for large language models (LLMs). While LLMs can identify errors in user input, they exhibit a systematic 'Self-Correction Blind Spot'. Testing 14 models, we find an average 64.5% blind spot rate. Remarkably, simply appending "Wait" reduces blind spots by 89.3%, suggesting that the capability exists but requires activation.
arXiv Detail & Related papers (2025-07-03T16:41:30Z)
How does Misinformation Affect Large Language Model Behaviors and Preferences? [37.06385727015972]
Large Language Models (LLMs) have shown remarkable capabilities in knowledge-intensive tasks. We present MisBench, the current largest and most comprehensive benchmark for evaluating LLMs' behavior and knowledge preference toward misinformation. Empirical results reveal that while LLMs demonstrate comparable abilities in discerning misinformation, they still remain susceptible to knowledge conflicts and stylistic variations.
arXiv Detail & Related papers (2025-05-27T17:57:44Z)
Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs [61.12688072239607]
This work formally defines self-consistent errors and evaluates mainstream detection methods on them. All four types of detection methods significantly struggle to detect self-consistent errors. Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective cross-model probe method.
arXiv Detail & Related papers (2025-05-23T09:18:56Z)
KSOD: Knowledge Supplement for LLMs On Demand [4.4997032928974985]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet still produce errors in domain-specific tasks. We propose KSOD, a novel framework that empowers LLMs to improve their capabilities with knowledge-based supervised fine-tuning. Our findings shed light on the potential of improving the capabilities of LLMs with knowledge-based SFT.
arXiv Detail & Related papers (2025-03-10T17:17:41Z)
Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task [71.61879949813998]
In cognitive research, the latter ability is referred to as fluid intelligence, which is considered to be critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. Our study revealed three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding.
arXiv Detail & Related papers (2025-02-11T02:31:09Z)
SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
SpecTool is a new benchmark to identify error patterns in LLM output on tool-use tasks. We show that even the most prominent LLMs exhibit these error patterns in their outputs. Researchers can use the analysis and insights from SPECTOOL to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z)
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse. Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-AI collaboration. To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations [46.351064535592336]
Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures. Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs. We show that the internal representations of LLMs encode much more information about truthfulness than previously recognized.
arXiv Detail & Related papers (2024-10-03T17:31:31Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach [0.0]
Large Language Models (LLMs) produce inaccurate outputs, also known as hallucinations. This paper introduces a supervised learning approach employing only four numerical features derived from tokens and vocabulary probabilities obtained from other evaluators. The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
arXiv Detail & Related papers (2024-05-30T03:00:47Z)
Evaluating LLMs at Detecting Errors in LLM Responses [30.645694514606507]
This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs. We use ReaLMistake to evaluate error detectors based on 12 Large Language Models.
arXiv Detail & Related papers (2024-04-04T17:19:47Z)
Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction [62.409807640887834]
Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus. We rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC.
arXiv Detail & Related papers (2024-02-18T01:40:34Z)
LLMs cannot find reasoning errors, but can correct them given the error location [0.9017736137562115]
Poor self-correction performance stems from LLMs' inability to find logical mistakes, rather than their ability to correct a known mistake. We benchmark several state-of-the-art LLMs on their mistake-finding ability and demonstrate that they generally struggle with the task. We show that it is possible to obtain mistake location information without ground truth labels or in-domain training data.
arXiv Detail & Related papers (2023-11-14T20:12:38Z)
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method [36.24876571343749]
Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. Recent literature reveals that LLMs generate nonfactual responses intermittently. We propose a novel self-detection method to identify questions that an LLM does not know and for which it is prone to generating nonfactual results.
arXiv Detail & Related papers (2023-10-27T06:22:14Z)
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs. We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.