QUDsim: Quantifying Discourse Similarities in LLM-Generated Text
- URL: http://arxiv.org/abs/2504.09373v1
- Date: Sat, 12 Apr 2025 23:46:09 GMT
- Title: QUDsim: Quantifying Discourse Similarities in LLM-Generated Text
- Authors: Ramya Namuduri, Yating Wu, Anshun Asher Zheng, Manya Wadhwa, Greg Durrett, Junyi Jessy Li,
- Abstract summary: We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression. We then use this framework to build $\textbf{QUDsim}$, a similarity metric that can detect discursive parallels between documents. Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs.
- Score: 70.22275200293964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models become increasingly capable at various writing tasks, their weakness at generating unique and creative content becomes a major liability. Although LLMs have the ability to generate text covering diverse topics, there is an overall sense of repetitiveness across texts that we aim to formalize and quantify via a similarity metric. The familiarity between documents arises from the persistence of underlying discourse structures. However, existing similarity metrics dependent on lexical overlap and syntactic patterns largely capture $\textit{content}$ overlap, thus making them unsuitable for detecting $\textit{structural}$ similarities. We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression. We then use this framework to build $\textbf{QUDsim}$, a similarity metric that can detect discursive parallels between documents. Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs. Furthermore, LLMs are not only repetitive and structurally uniform, but are also divergent from human authors in the types of structures they use.
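The abstract describes comparing documents by the questions their segments answer (QUDs) rather than by their words. As a rough illustration only, and not the paper's actual metric, the sketch below aligns two hypothetical QUD lists using token-level Jaccard overlap as a crude stand-in for question-semantics similarity; the function names, threshold, and example questions are all invented:

```python
from typing import List

def _jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap: a crude stand-in for question-semantics similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def qud_overlap(quds_a: List[str], quds_b: List[str], threshold: float = 0.5) -> float:
    """Symmetric structural overlap: the fraction of QUDs in each document that
    have a sufficiently similar counterpart in the other, averaged over both
    directions. Content words differ freely; only the questions are compared."""
    if not quds_a or not quds_b:
        return 0.0
    hit_a = sum(1 for q in quds_a if max(_jaccard(q, r) for r in quds_b) >= threshold)
    hit_b = sum(1 for q in quds_b if max(_jaccard(q, r) for r in quds_a) >= threshold)
    return 0.5 * (hit_a / len(quds_a) + hit_b / len(quds_b))
```

Two documents about different topics would score high here whenever their segment-level questions line up, which is the kind of structural (not lexical) parallel the abstract targets.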
Related papers
- Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity [16.065963688326242]
We study the trustworthiness of large language models (LLMs) when encountering ambiguous narrative text in Chinese. We created a benchmark dataset by collecting and generating ambiguous sentences with context and their corresponding disambiguated pairs. We discovered significant fragility in LLMs when handling ambiguity, revealing behavior that differs substantially from humans.
arXiv Detail & Related papers (2025-07-30T21:50:19Z) - Domain Regeneration: How well do LLMs match syntactic properties of text domains? [19.04920427362747]
We prompt a commonly used, open-source LLM to regenerate text from two domains of permissively licensed English text -- Wikipedia and news text. This regeneration paradigm allows us to investigate whether LLMs can faithfully match the original human text domains in a semantically-controlled setting. We find that the majority of the regenerated distributions show a shifted mean, a lower standard deviation, and a reduction of the long tail, as compared to the human originals.
arXiv Detail & Related papers (2025-05-12T17:37:17Z) - Revisiting Word Embeddings in the LLM Era [0.2999888908665658]
Large Language Models (LLMs) have recently shown remarkable advancement in various NLP tasks.
We compare classical decontextualized and contextualized word embeddings with their LLM-induced counterparts.
Our results show that LLMs cluster semantically related words more tightly and perform better on analogy tasks in decontextualized settings.
arXiv Detail & Related papers (2025-02-26T22:45:08Z) - Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
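The summary above describes merging semantically similar subwords into "semantic tokens" whose embedding is the mean of the group. A minimal sketch of that grouping step, assuming toy 2-d embeddings and a hypothetical cosine threshold (the real work operates on mLM subword embeddings):

```python
import math
from typing import Dict, List, Tuple

def _cos(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def merge_semantic_tokens(emb: Dict[str, List[float]],
                          threshold: float = 0.9) -> List[Tuple[List[str], List[float]]]:
    """Greedily group subwords whose cosine similarity to a group centroid
    exceeds the threshold; each group's embedding is the running mean of its
    members. Returns (member_tokens, centroid) pairs."""
    groups: List[Tuple[List[str], List[float]]] = []
    for tok, vec in emb.items():
        for members, centroid in groups:
            if _cos(vec, centroid) >= threshold:
                members.append(tok)
                n = len(members)
                # incremental update of the mean centroid
                for i in range(len(centroid)):
                    centroid[i] += (vec[i] - centroid[i]) / n
                break
        else:
            groups.append(([tok], list(vec)))
    return groups
```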
arXiv Detail & Related papers (2024-11-07T08:38:32Z) - SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z) - Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs [19.073560504913356]
The line between human-crafted and machine-generated texts has become increasingly blurred.
This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans.
arXiv Detail & Related papers (2024-02-16T11:20:30Z) - How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z) - An Investigation of LLMs' Inefficacy in Understanding Converse Relations [30.94718664430869]
We introduce a new benchmark ConvRe focusing on converse relations, which contains 17 relations and 1240 triples extracted from knowledge graph completion datasets.
Our ConvRe features two tasks, Re2Text and Text2Re, which are formulated as multi-choice question answering to evaluate LLMs' ability to determine the matching between relations and associated text.
arXiv Detail & Related papers (2023-10-08T13:45:05Z) - DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of texts generated by black-box models.
arXiv Detail & Related papers (2023-05-21T17:26:16Z) - Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves retrieval when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z) - Measuring Reliability of Large Language Models through Semantic Consistency [3.4990427823966828]
We develop a measure of semantic consistency that allows the comparison of open-ended text outputs.
We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions.
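One simple version of such a consistency metric is the mean pairwise similarity across a model's answers to paraphrased questions. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses token-level Jaccard overlap where the actual work would use a learned semantic similarity:

```python
from itertools import combinations
from typing import List

def consistency(answers: List[str]) -> float:
    """Mean pairwise token-overlap similarity across answers to paraphrased
    versions of the same question. 1.0 means the answers are lexically
    identical; lower values indicate inconsistency under paraphrase."""
    def jac(a: str, b: str) -> float:
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

    pairs = list(combinations(answers, 2))
    if not pairs:  # a single answer is trivially consistent
        return 1.0
    return sum(jac(a, b) for a, b in pairs) / len(pairs)
```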
arXiv Detail & Related papers (2022-11-10T20:21:07Z) - FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric [48.66580267438049]
We present FastKASSIM, a metric for utterance- and document-level syntactic similarity.
It pairs and averages the most similar dependency parse trees between a pair of documents based on tree kernels.
It runs up to 5.2 times faster than our baseline method over the documents in the r/ChangeMyView corpus.
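The pairing-and-averaging step described above can be illustrated with a toy version: parse trees as nested tuples and a normalized shared-subtree count standing in for a proper convolution tree kernel. This is a simplified sketch, not FastKASSIM's kernel:

```python
from collections import Counter
from typing import List, Tuple

Tree = Tuple  # nested tuples, e.g. ("S", ("NP",), ("VP", ("V",), ("NP",)))

def _subtrees(tree: Tree, acc: Counter) -> Counter:
    """Collect every subtree (as a hashable tuple) with multiplicity."""
    acc[tree] += 1
    for child in tree[1:]:
        _subtrees(child, acc)
    return acc

def _tree_sim(t1: Tree, t2: Tree) -> float:
    """Normalized count of shared subtrees, a lightweight stand-in for a
    convolution tree kernel; 1.0 for identical trees."""
    c1, c2 = _subtrees(t1, Counter()), _subtrees(t2, Counter())
    shared = sum((c1 & c2).values())  # multiset intersection
    norm = (sum(c1.values()) * sum(c2.values())) ** 0.5
    return shared / norm if norm else 0.0

def doc_syntactic_sim(doc_a: List[Tree], doc_b: List[Tree]) -> float:
    """Pair each tree with its most similar counterpart in the other document
    and average the best-match scores over both directions."""
    best_a = [max(_tree_sim(t, u) for u in doc_b) for t in doc_a]
    best_b = [max(_tree_sim(u, t) for t in doc_a) for u in doc_b]
    return 0.5 * (sum(best_a) / len(best_a) + sum(best_b) / len(best_b))
```

Because only tree shapes and labels are compared, two documents with different words but the same sentence constructions score high, which is the syntactic (not lexical) similarity the entry describes.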
arXiv Detail & Related papers (2022-03-15T22:33:26Z) - How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
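A core ingredient of novelty analyses like this is n-gram novelty: the fraction of generated n-grams never seen in the training data. The sketch below illustrates that idea only; it is not RAVEN itself, and the whitespace tokenization and default n are assumptions:

```python
from typing import List, Tuple

def ngram_novelty(generated: str, corpus: str, n: int = 4) -> float:
    """Fraction of n-grams in the generated text that never appear in the
    training corpus; higher means more novel (less copied)."""
    def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    seen = set(ngrams(corpus.split(), n))
    gen = ngrams(generated.split(), n)
    if not gen:
        return 0.0
    return sum(1 for g in gen if g not in seen) / len(gen)
```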
arXiv Detail & Related papers (2021-11-18T04:07:09Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
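The divergence-aggregation step implied above can be sketched independently of the masked language model: given per-position predicted distributions over the shared-word positions of the two texts, average a symmetric divergence across aligned positions. This is an illustrative sketch under that assumption (the distributions here are toy dicts; the real method obtains them from an MLM), and the function names are invented:

```python
import math
from typing import Dict, List

def _js(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence between two distributions given as
    token -> probability dicts (symmetric, finite even on disjoint support)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: Dict[str, float]) -> float:
        # m[k] > 0 whenever a[k] > 0, so the log is always defined
        return sum(v * math.log(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def neighboring_distribution_divergence(dists_a: List[Dict[str, float]],
                                        dists_b: List[Dict[str, float]]) -> float:
    """Average JS divergence over aligned (shared-subsequence) positions;
    0.0 means the model predicts identical distributions at every position."""
    return sum(_js(p, q) for p, q in zip(dists_a, dists_b)) / len(dists_a)
```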
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.