BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence
- URL: http://arxiv.org/abs/2312.16893v1
- Date: Thu, 28 Dec 2023 08:34:17 GMT
- Title: BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence
- Authors: Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang
- Abstract summary: Coherent texts inherently manifest a sequential and cohesive interplay among sentences.
BBScore is a reference-free metric grounded in Brownian bridge theory for assessing text coherence.
- Score: 20.507596002357655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring the coherence of text is a vital aspect of evaluating the quality
of written content. Recent advancements in neural coherence modeling have
demonstrated their efficacy in capturing entity coreference and discourse
relations, thereby enhancing coherence evaluation. However, many existing
methods heavily depend on static embeddings or focus narrowly on nearby
context, constraining their capacity to measure the overarching coherence of
long texts. In this paper, we posit that coherent texts inherently manifest a
sequential and cohesive interplay among sentences, effectively conveying the
central theme, purpose, or standpoint. To explore this abstract relationship,
we introduce the "BBScore," a novel reference-free metric grounded in Brownian
bridge theory for assessing text coherence. Our findings showcase that when
synergized with a simple additional classification component, this metric
attains a performance level comparable to state-of-the-art techniques on
standard artificial discrimination tasks. We also establish in downstream tasks
that this metric effectively differentiates between human-written documents and
text generated by large language models under a specific domain. Furthermore,
we illustrate the efficacy of this approach in detecting written styles
attributed to diverse large language models, underscoring its potential for
generalizability. In summary, we present a novel Brownian bridge coherence
metric capable of measuring both local and global text coherence, while
circumventing the need for end-to-end model training. This flexibility allows
for its application in various downstream tasks.
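To make the idea concrete, below is a minimal sketch of a Brownian-bridge likelihood score over a sequence of sentence embeddings. It assumes embeddings from some encoder are already available (the paper trains a dedicated encoder, which is not reproduced here); the function name and the isotropic-noise assumption are illustrative, not the authors' exact formulation.

```python
import numpy as np

def bb_score(z: np.ndarray, sigma: float = 1.0) -> float:
    """Average Gaussian log-likelihood of interior latents under a Brownian
    bridge pinned at the first and last sentence embeddings.

    z: array of shape (n, d), one embedding per sentence, n >= 3.
    """
    T = len(z) - 1          # bridge runs over steps t = 0..T
    z0, zT = z[0], z[-1]    # pinned endpoints
    d = z.shape[1]
    logliks = []
    for t in range(1, T):   # interior sentences only
        alpha = t / T
        mean = (1 - alpha) * z0 + alpha * zT   # bridge mean at step t
        var = sigma ** 2 * t * (T - t) / T     # bridge variance at step t
        sq = float(np.sum((z[t] - mean) ** 2))
        # isotropic Gaussian log-density of z[t] around the bridge mean
        logliks.append(-0.5 * (sq / var + d * np.log(2 * np.pi * var)))
    return float(np.mean(logliks))
```

Higher values mean the latent trajectory stays closer to the interpolating bridge, which is the operational notion of coherence here; a trained encoder, as in the paper, would replace the generic embeddings assumed above.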
Related papers
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
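As a rough illustration of what a segment-ordering objective consumes, the sketch below builds one training instance by shuffling a document's segments and recording the permutation that restores them; the pointer-network decoder itself is omitted, and the helper name is hypothetical.

```python
import random

def make_so_example(segments: list[str], seed: int = 0):
    """Return shuffled segments plus the indices that restore the order."""
    rng = random.Random(seed)
    order = list(range(len(segments)))
    rng.shuffle(order)                       # order[k] = original index of slot k
    shuffled = [segments[i] for i in order]
    # target[j] = position in `shuffled` of the j-th original segment
    target = [order.index(j) for j in range(len(segments))]
    return shuffled, target

shuffled, target = make_so_example(["A.", "B.", "C.", "D."])
# [shuffled[t] for t in target] recovers the original segment order
```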
arXiv Detail & Related papers (2024-06-06T15:17:51Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- CoheSentia: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts [15.866519123942457]
We introduce CoheSentia, a novel benchmark of human-perceived coherence of automatically generated texts.
Our benchmark contains 500 automatically generated, human-annotated paragraphs, each annotated under both protocols, incremental and holistic.
Our analysis shows that the inter-annotator agreement in the incremental mode is higher than in the holistic alternative.
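A toy illustration of this kind of agreement comparison, using Cohen's kappa from scikit-learn on made-up binary labels (not CoheSentia data):

```python
from sklearn.metrics import cohen_kappa_score

# invented labels for two annotators in each annotation mode
incremental_a = [1, 1, 0, 1, 0, 1, 1, 0]
incremental_b = [1, 1, 0, 1, 0, 1, 0, 0]
holistic_a = [1, 0, 0, 1, 1, 1, 1, 0]
holistic_b = [0, 1, 0, 1, 0, 1, 1, 1]

print("incremental kappa:", cohen_kappa_score(incremental_a, incremental_b))
print("holistic kappa:", cohen_kappa_score(holistic_a, holistic_b))
```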
arXiv Detail & Related papers (2023-10-25T03:21:20Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
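A crude sketch of what sampling from such an induced distribution q(x) ∝ p(x)·exp(Σ_k λ_k f_k(x)) can look like in practice: rerank candidates drawn from the base LM by their energy-adjusted log-probability. Fitting the multipliers λ_k so that expected metrics match human text is the paper's core step and is not shown; all function names below are placeholders.

```python
import math
import random

def rerank(candidates, logp, features, lambdas):
    """candidates: texts sampled from the base LM; logp(x): base-LM log-prob;
    features(x): metric values f_k(x); lambdas: weights λ_k (assumed fitted)."""
    def score(x):
        return logp(x) + sum(l * f for l, f in zip(lambdas, features(x)))
    scores = [score(x) for x in candidates]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return random.choices(candidates, weights=weights, k=1)[0]
```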
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Multi-Dimensional Evaluation of Text Summarization with In-Context Learning [79.02280189976562]
In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning.
Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization.
We then analyze the effects of factors such as the selection and number of in-context examples on performance.
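A hypothetical sketch of how such an in-context-learning evaluator can be prompted: a few scored demonstrations precede the instance to be rated, one quality dimension at a time. The prompt format is an assumption, not the paper's exact template.

```python
def build_icl_prompt(examples, source, summary, dimension="coherence"):
    """examples: list of (article, summary, score) demonstrations."""
    parts = [f"Rate the {dimension} of each summary from 1 to 5.\n"]
    for ex_src, ex_sum, ex_score in examples:
        parts.append(f"Article: {ex_src}\nSummary: {ex_sum}\n"
                     f"{dimension.capitalize()} score: {ex_score}\n")
    # the unscored instance goes last; the LLM completes the score
    parts.append(f"Article: {source}\nSummary: {summary}\n"
                 f"{dimension.capitalize()} score:")
    return "\n".join(parts)
```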
arXiv Detail & Related papers (2023-06-01T23:27:49Z)
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We report correlation coefficients against human identification of intruder words and achieve near-human results on the word-intrusion task.
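For reference, a toy automated version of the word-intrusion task: flag as the intruder the word least similar, on average, to the rest of a topic's top words. The embedding lookup `vec` is assumed to exist; the paper's actual scoring is more elaborate.

```python
import numpy as np

def pick_intruder(words, vec):
    """words: topic's top words plus one intruder; vec(w): word embedding."""
    V = np.stack([vec(w) for w in words])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize
    sims = V @ V.T                                    # cosine similarities
    # average similarity of each word to the others (excluding itself)
    avg = (sims.sum(axis=1) - 1.0) / (len(words) - 1)
    return words[int(np.argmin(avg))]
```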
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not adequately capture these dimensions.
We propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks [10.556468838821338]
We aim to computationally analyze the discourse cohesion in scientific scholarly texts using multilayer network representation.
We design section-level and document-level metrics to assess the extent of lexical cohesion in text.
We present an analytical framework, CHIAA (CHeck It Again, Author), to provide pointers to the author for potential improvements in the manuscript.
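A hedged sketch of one such layer: sentences as nodes, with edges weighted by plain word overlap, built with networkx. The paper's multilayer representation and cohesion metrics are richer; this shows only the general shape.

```python
import string
import networkx as nx

def overlap_layer(sentences, min_shared=1):
    """One cohesion layer: edge (i, j) if sentences share enough words."""
    g = nx.Graph(layer="word_overlap")
    strip = str.maketrans("", "", string.punctuation)
    tokens = [set(s.lower().translate(strip).split()) for s in sentences]
    g.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            shared = tokens[i] & tokens[j]
            if len(shared) >= min_shared:
                g.add_edge(i, j, weight=len(shared))
    return g

g = overlap_layer(["The model learns topics.", "Topics emerge from the model."])
print(g.edges(data=True))
```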
arXiv Detail & Related papers (2022-05-16T09:10:41Z)
- Improve Discourse Dependency Parsing with Contextualized Representations [28.916249926065273]
We propose to take advantage of transformers to encode contextualized representations of units of different levels.
Motivated by the observation of writing patterns commonly shared across articles, we propose a novel method that treats discourse relation identification as a sequence labelling task.
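Illustratively, casting relation identification as sequence labelling means each discourse unit receives one relation tag, so a standard tagger can process a whole article in one pass. The units and tag set below are invented for demonstration only.

```python
# Toy example: one (unit, relation-tag) pair per elementary discourse unit.
# The tag inventory here is illustrative, not the paper's label set.
units = ["The company reported losses.",
         "As a result, its shares fell.",
         "Analysts had expected gains."]
tags = ["ROOT", "result", "contrast"]  # one relation label per unit

dataset = list(zip(units, tags))  # what a sequence tagger would train on
print(dataset)
```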
arXiv Detail & Related papers (2022-05-04T14:35:38Z)
- Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts.
Editorial assistance, however, often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.