LFED: A Literary Fiction Evaluation Dataset for Large Language Models
- URL: http://arxiv.org/abs/2405.10166v1
- Date: Thu, 16 May 2024 15:02:24 GMT
- Title: LFED: A Literary Fiction Evaluation Dataset for Large Language Models
- Authors: Linhao Yu, Qun Liu, Deyi Xiong
- Abstract summary: We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations.
- Score: 58.85989777743013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid evolution of large language models (LLMs) has ushered in the need for comprehensive assessments of their performance across various dimensions. In this paper, we propose LFED, a Literary Fiction Evaluation Dataset, which aims to evaluate the capability of LLMs on long fiction comprehension and reasoning. We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries. We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions. Additionally, we conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations. Through a series of experiments with various state-of-the-art LLMs, we demonstrate that these models face considerable challenges in effectively addressing questions related to literary fictions, with ChatGPT reaching only 57.08% under the zero-shot setting. The dataset will be publicly available at https://github.com/tjunlp-lab/LFED.git
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- Assessing Language Models' Worldview for Fiction Generation [0.0]
This study investigates the ability of Large Language Models to maintain a state of world essential to generate fiction.
We find that only two models exhibit a consistent worldview, while the rest are self-conflicting.
This uniformity across models further suggests a lack of 'state' necessary for fiction.
arXiv Detail & Related papers (2024-08-15T03:19:41Z)
- Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism [62.571419297164645]
This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms.
We first investigate all the possible variations for the categorical syllogisms from a purely logical perspective.
We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
arXiv Detail & Related papers (2024-06-26T21:17:20Z)
- One Thousand and One Pairs: A "novel" challenge for long-context language models [56.60667988954638]
NoCha is a dataset of 1,001 pairs of true and false claims about 67 fictional books.
Our annotators confirm that the largest share of pairs in NoCha require global reasoning over the entire book to verify.
On average, models perform much better on pairs that require only sentence-level retrieval vs. global reasoning.
arXiv Detail & Related papers (2024-06-24T02:03:57Z)
- The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario [12.852843553759744]
We evaluate recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task.
We use a specifically-tailored prompt (based on an epic combat between Ignatius J. Reilly and a pterodactyl) to minimize the risk of training data leakage.
Evaluation is performed by humans using a detailed rubric covering aspects such as fluency, style, originality, and humor.
arXiv Detail & Related papers (2024-06-22T17:01:59Z)
- Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens [0.0]
Large Language Models (LLMs) like ChatGPT or Bard have revolutionized information retrieval.
We assess the capabilities of various LLMs in producing reliable, comprehensive, and sufficiently relevant responses about historical facts in French.
arXiv Detail & Related papers (2024-06-21T14:19:57Z)
- Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection [4.653571633477755]
Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis.
This becomes pertinent in the realm of identifying hateful or toxic speech -- a domain fraught with challenges and ethical dilemmas.
arXiv Detail & Related papers (2024-03-12T19:12:28Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.