Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models
- URL: http://arxiv.org/abs/2504.09389v1
- Date: Sun, 13 Apr 2025 00:48:58 GMT
- Title: Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models
- Authors: Vishakh Padmakumar, Chen Yueh-Han, Jane Pan, Valerie Chen, He He,
- Abstract summary: Large language models (LLMs) are increasingly used for ideation and scientific discovery.<n>Prior work evaluates novelty as the originality with respect to training data, but original outputs can be low quality.<n>We propose a new novelty metric for LLM generations that balances originality and quality.
- Score: 19.700493685081604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) are increasingly used for ideation and scientific discovery, it is important to evaluate their ability to generate novel output. Prior work evaluates novelty as the originality with respect to training data, but original outputs can be low quality. In contrast, non-expert judges may favor high-quality but memorized outputs, limiting the reliability of human preference as a metric. We propose a new novelty metric for LLM generations that balances originality and quality -- the harmonic mean of the fraction of \ngrams unseen during training and a task-specific quality score. We evaluate the novelty of generations from two families of open-data models (OLMo and Pythia) on three creative tasks: story completion, poetry writing, and creative tool use. We find that LLM generated text is less novel than human written text. To elicit more novel outputs, we experiment with various inference-time methods, which reveals a trade-off between originality and quality. While these methods can boost novelty, they do so by increasing originality at the expense of quality. In contrast, increasing model size or applying post-training reliably shifts the Pareto frontier, highlighting that starting with a stronger base model is a more effective way to improve novelty.
Related papers
- SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation [55.61004653386632]
Large Language Models (LLMs) often produce hallucinations, i.e., information that is unfaithful or not grounded in the input context.
This paper introduces a novel self-supervised method for generating a training set of unfaithful samples.
We then refine the model using a training process that encourages the generation of grounded outputs over unfaithful ones.
arXiv Detail & Related papers (2025-02-19T12:31:58Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - The Next Chapter: A Study of Large Language Models in Storytelling [51.338324023617034]
The application of prompt-based learning with large language models (LLMs) has exhibited remarkable performance in diverse natural language processing (NLP) tasks.
This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models.
The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models.
arXiv Detail & Related papers (2023-01-24T02:44:02Z) - GENIUS: Sketch-based Language Model Pre-training via Extreme and
Selective Masking for Text Generation and Augmentation [76.7772833556714]
We introduce GENIUS: a conditional text generation model using sketches as input.
GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction from sketch objective.
We show that GENIUS can be used as a strong and ready-to-use data augmentation tool for various natural language processing (NLP) tasks.
arXiv Detail & Related papers (2022-11-18T16:39:45Z) - MOCHA: A Multi-Task Training Approach for Coherent Text Generation from
Cognitive Perspective [22.69509556890676]
We propose a novel multi-task training strategy for coherent text generation grounded on the cognitive theory of writing.
We extensively evaluate our model on three open-ended generation tasks including story generation, news article writing and argument generation.
arXiv Detail & Related papers (2022-10-26T11:55:41Z) - Text Generation by Learning from Demonstrations [17.549815256968877]
Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.
We propose GOLD: an easy-to-optimize algorithm that learns from expert demonstrations by importance weighting.
According to both automatic and human evaluation, models trained by GOLD outperform those trained by MLE and policy gradient.
arXiv Detail & Related papers (2020-09-16T17:58:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.