GNAT: A General Narrative Alignment Tool
- URL: http://arxiv.org/abs/2311.03627v1
- Date: Tue, 7 Nov 2023 00:24:14 GMT
- Title: GNAT: A General Narrative Alignment Tool
- Authors: Tanzir Pial, Steven Skiena
- Abstract summary: We develop a general approach to narrative alignment coupling the Smith-Waterman algorithm from bioinformatics with modern text similarity metrics.
We apply and evaluate our general narrative alignment tool (GNAT) on four distinct problem domains differing greatly in both the relative and absolute length of documents.
- Score: 12.100007440638667
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Algorithmic sequence alignment identifies similar segments shared between
pairs of documents, and is fundamental to many NLP tasks. But it is difficult
to recognize similarities between distant versions of narratives such as
translations and retellings, particularly for summaries and abridgements which
are much shorter than the original novels.
We develop a general approach to narrative alignment coupling the
Smith-Waterman algorithm from bioinformatics with modern text similarity
metrics. We show that the background of alignment scores fits a Gumbel
distribution, enabling us to define rigorous p-values on the significance of
any alignment. We apply and evaluate our general narrative alignment tool
(GNAT) on four distinct problem domains differing greatly in both the relative
and absolute length of documents, namely summary-to-book alignment, translated
book alignment, short story alignment, and plagiarism detection --
demonstrating the power and performance of our methods.
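The recipe the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the gap penalty, the centred similarity scores, and the Gumbel parameters `mu` and `beta` (which GNAT would fit to a background distribution of alignment scores) are all assumptions here, and `sim` stands in for a matrix of modern text-similarity scores (e.g. embedding cosine similarities) between sentence pairs of the two documents.

```python
import math

def smith_waterman(sim, gap=-0.5):
    """Smith-Waterman local alignment over a precomputed sentence
    similarity matrix, in place of the usual nucleotide match/mismatch
    scores. Returns the best local alignment score."""
    n, m = len(sim), len(sim[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                   # restart alignment
                H[i - 1][j - 1] + sim[i - 1][j - 1],   # align sentence pair
                H[i - 1][j] + gap,                     # skip a sentence in doc A
                H[i][j - 1] + gap,                     # skip a sentence in doc B
            )
            best = max(best, H[i][j])
    return best

def gumbel_pvalue(score, mu, beta):
    """P(S >= score) under a Gumbel null: 1 - exp(-exp(-(score-mu)/beta))."""
    z = (score - mu) / beta
    return 1.0 - math.exp(-math.exp(-z))
```

With similarity scores shifted so that unrelated sentence pairs score negative, the recursion recovers the best-scoring local block of matched sentences, and the p-value asks how surprising that score is under the fitted Gumbel background.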
Related papers
- Predicting Text Preference Via Structured Comparative Reasoning [110.49560164568791]

We introduce SC, a prompting approach that predicts text preferences by generating structured intermediate comparisons.
We select consistent comparisons with a pairwise consistency comparator that ensures each aspect's comparisons clearly distinguish differences between texts.
Our comprehensive evaluations across various NLP tasks, including summarization, retrieval, and automatic rating, demonstrate that SC equips LLMs to achieve state-of-the-art performance in text preference prediction.
arXiv Detail & Related papers (2023-11-14T18:51:38Z)
- FaNS: a Facet-based Narrative Similarity Metric [6.992767260794627]
This paper proposes a novel narrative similarity metric called Facet-based Narrative Similarity (FaNS)
FaNS is based on the classic 5W1H facets (Who, What, When, Where, Why, and How) which are extracted by leveraging the state-of-the-art Large Language Models (LLMs)
arXiv Detail & Related papers (2023-09-09T15:29:24Z)
- Rhetorical Role Labeling of Legal Documents using Transformers and Graph Neural Networks [1.290382979353427]
This paper presents the approaches undertaken to perform the task of rhetorical role labelling on Indian Court Judgements as part of SemEval Task 6: understanding legal texts, shared subtask A.
arXiv Detail & Related papers (2023-05-06T17:04:51Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation [85.32991360774447]
Natural language generation (NLG) spans a broad range of tasks, each serving specific objectives.
We propose a unifying perspective based on the nature of information change in NLG tasks.
We develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks.
arXiv Detail & Related papers (2021-09-14T01:00:42Z)
- Extractive approach for text summarisation using graphs [0.0]
Our paper explores different graph-related algorithms that can be used in solving the text summarization problem using an extractive approach.
We consider two metrics: sentence overlap and edit distance for measuring sentence similarity.
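The two sentence-similarity metrics named above can be sketched as follows. This is an illustrative implementation under assumptions not stated in the summary: edit distance is taken as Levenshtein distance over token sequences, and sentence overlap is normalised as Jaccard similarity of token sets (a hypothetical choice).

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token
    lists), computed with a rolling one-row DP array."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                   # deletion
                dp[j - 1] + 1,               # insertion
                prev + (a[i - 1] != b[j - 1]),  # substitution (free if equal)
            )
    return dp[n]

def sentence_overlap(a, b):
    """Fraction of shared tokens between two tokenised sentences
    (Jaccard normalisation is an assumption here)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(len(sa | sb), 1)
```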
arXiv Detail & Related papers (2021-06-21T10:03:34Z)
- Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate on sentence-level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z)
- Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
Relational sentences in the original text are embedded (with SBERT) and clustered in order to merge together semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
- Multilingual Alignment of Contextual Word Representations [49.42244463346612]
The aligned BERT model exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
arXiv Detail & Related papers (2020-02-10T03:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.