Generative AI & Fictionality: How Novels Power Large Language Models
- URL: http://arxiv.org/abs/2603.01220v1
- Date: Sun, 01 Mar 2026 18:34:02 GMT
- Title: Generative AI & Fictionality: How Novels Power Large Language Models
- Authors: Edwin Roland, Richard Jean So
- Abstract summary: We study how novels shape the outputs of generative AI. We find that novels leverage familiar attributes and affordances of fiction, while also fomenting new qualities and forms of social response. We argue that if contemporary culture is increasingly shaped by generative AI and machine learning, any analysis of today's various modes of cultural production must account for a relatively novel dimension: computational training data.
- Score: 1.4847369589597454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models, like the one in ChatGPT, are powered by their training data. The models are simply next-word predictors, based on patterns learned from vast amounts of pre-existing text. Since the first generation of GPT, it is striking that the most popular datasets have included substantial collections of novels. For the engineers and research scientists who build these models, there is a common belief that the language in fiction is rich enough to cover all manner of social and communicative phenomena, yet the belief has gone mostly unexamined. How does fiction shape the outputs of generative AI? Specifically, what are novels' effects relative to other forms of text, such as newspapers, Reddit, and Wikipedia? Since the 1970s, literature scholars such as Catherine Gallagher and James Phelan have developed robust and insightful accounts of how fiction operates as a form of discourse and language. Through our study of an influential open-source model (BERT), we find that LLMs leverage familiar attributes and affordances of fiction, while also fomenting new qualities and forms of social response. We argue that if contemporary culture is increasingly shaped by generative AI and machine learning, any analysis of today's various modes of cultural production must account for a relatively novel dimension: computational training data.
Related papers
- The Art of Generative Narrativity [0.0]
Generative AI leads to experiments with non-verbal forms that have the potential to incite narratives through the audience's experience. In five central sections, we discuss interrelated exemplars whose conceptual frameworks anticipate or underscore the issues of contemporary linguistic automation. In closing sections, we summarize the expressive features of these exemplars and underline their value for critically assessing generative AI's cultural influence and fallouts.
arXiv Detail & Related papers (2026-03-01T12:58:24Z) - Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction [0.0]
Training a Large Language Model on writers' output to generate "sham books" in a particular style seems to constitute a new form of plagiarism. In this study, we trained Machine Learning classifier models to distinguish short samples of human-written from machine-generated creative fiction.
arXiv Detail & Related papers (2024-12-15T12:46:57Z) - LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z) - Experimental Narratives: A Comparison of Human Crowdsourced Storytelling and AI Storytelling [0.0]
The study analyzes 250 stories authored by crowdworkers in June 2019 and 80 stories generated by GPT-3.5 and GPT-4.
Both crowdworkers and large language models responded to identical prompts about creating and falling in love with an artificial human.
The analysis reveals that narratives from GPT-3.5 and particularly GPT-4 are more progressive in terms of gender roles and sexuality than those written by humans.
arXiv Detail & Related papers (2023-10-19T16:54:38Z) - Large Language Models for Scientific Synthesis, Inference and Explanation [56.41963802804953]
We show how large language models can perform scientific synthesis, inference, and explanation.
We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature.
This approach has the further advantage that the large language model can explain the machine learning system's predictions.
arXiv Detail & Related papers (2023-10-12T02:17:59Z) - AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z) - The Next Chapter: A Study of Large Language Models in Storytelling [51.338324023617034]
The application of prompt-based learning with large language models (LLMs) has exhibited remarkable performance in diverse natural language processing (NLP) tasks.
This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models.
The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models.
arXiv Detail & Related papers (2023-01-24T02:44:02Z) - ChatGPT is not all you need. A State of the Art Review of large Generative AI models [0.0]
This work is an attempt to describe concisely the main models that are affected by generative AI and to provide a taxonomy of the main generative models published recently.
arXiv Detail & Related papers (2023-01-11T15:48:36Z) - Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
arXiv Detail & Related papers (2022-10-14T13:21:33Z) - Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z) - Estimating the Personality of White-Box Language Models [0.589889361990138]
Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications everywhere.
Existing research shows that these models can and do capture human biases.
Many of these biases, especially those that could potentially cause harm, are being well-investigated.
However, studies that infer and change human personality traits inherited by these models have been scarce or non-existent.
arXiv Detail & Related papers (2022-04-25T23:53:53Z) - How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.