GNAT: A General Narrative Alignment Tool
- URL: http://arxiv.org/abs/2311.03627v1
- Date: Tue, 7 Nov 2023 00:24:14 GMT
- Title: GNAT: A General Narrative Alignment Tool
- Authors: Tanzir Pial, Steven Skiena
- Abstract summary: We develop a general approach to narrative alignment coupling the Smith-Waterman algorithm from bioinformatics with modern text similarity metrics.
We apply and evaluate our general narrative alignment tool (GNAT) on four distinct problem domains differing greatly in both the relative and absolute length of documents.
- Score: 12.100007440638667
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Algorithmic sequence alignment identifies similar segments shared between
pairs of documents, and is fundamental to many NLP tasks. But it is difficult
to recognize similarities between distant versions of narratives such as
translations and retellings, particularly for summaries and abridgements which
are much shorter than the original novels.
We develop a general approach to narrative alignment coupling the
Smith-Waterman algorithm from bioinformatics with modern text similarity
metrics. We show that the background of alignment scores fits a Gumbel
distribution, enabling us to define rigorous p-values on the significance of
any alignment. We apply and evaluate our general narrative alignment tool
(GNAT) on four distinct problem domains differing greatly in both the relative
and absolute length of documents, namely summary-to-book alignment, translated
book alignment, short story alignment, and plagiarism detection --
demonstrating the power and performance of our methods.
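The recipe the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the gap penalty, the centred similarity scores, and the Gumbel parameters `mu` and `beta` (which GNAT would fit to a background distribution of alignment scores) are all assumptions here, and `sim` stands in for a matrix of modern text-similarity scores (e.g. embedding cosine similarities) between sentence pairs of the two documents.

```python
import math

def smith_waterman(sim, gap=-0.5):
    """Smith-Waterman local alignment over a precomputed sentence
    similarity matrix, in place of the usual nucleotide match/mismatch
    scores. Returns the best local alignment score."""
    n, m = len(sim), len(sim[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,                                   # restart alignment
                H[i - 1][j - 1] + sim[i - 1][j - 1],   # align sentence pair
                H[i - 1][j] + gap,                     # skip a sentence in doc A
                H[i][j - 1] + gap,                     # skip a sentence in doc B
            )
            best = max(best, H[i][j])
    return best

def gumbel_pvalue(score, mu, beta):
    """P(S >= score) under a Gumbel null: 1 - exp(-exp(-(score-mu)/beta))."""
    z = (score - mu) / beta
    return 1.0 - math.exp(-math.exp(-z))
```

With similarity scores shifted so that unrelated sentence pairs score negative, the recursion recovers the best-scoring local block of matched sentences, and the p-value asks how surprising that score is under the fitted Gumbel background.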
Related papers
- Predicting Text Preference Via Structured Comparative Reasoning [110.49560164568791]

We introduce SC, a prompting approach that predicts text preferences by generating structured intermediate comparisons.
We select consistent comparisons with a pairwise consistency comparator that ensures each aspect's comparisons clearly distinguish differences between texts.
Our comprehensive evaluations across various NLP tasks, including summarization, retrieval, and automatic rating, demonstrate that SC equips LLMs to achieve state-of-the-art performance in text preference prediction.
arXiv Detail & Related papers (2023-11-14T18:51:38Z)
- FaNS: a Facet-based Narrative Similarity Metric [6.992767260794627]
This paper proposes a novel narrative similarity metric called Facet-based Narrative Similarity (FaNS)
FaNS is based on the classic 5W1H facets (Who, What, When, Where, Why, and How) which are extracted by leveraging the state-of-the-art Large Language Models (LLMs)
arXiv Detail & Related papers (2023-09-09T15:29:24Z)
- Rhetorical Role Labeling of Legal Documents using Transformers and Graph Neural Networks [1.290382979353427]
This paper presents the approaches undertaken to perform the task of rhetorical role labelling on Indian Court Judgements as part of SemEval Task 6: understanding legal texts, shared subtask A.
arXiv Detail & Related papers (2023-05-06T17:04:51Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation [85.32991360774447]
Natural language generation (NLG) spans a broad range of tasks, each serving specific objectives.
We propose a unifying perspective based on the nature of information change in NLG tasks.
We develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks.
arXiv Detail & Related papers (2021-09-14T01:00:42Z)
- Extractive approach for text summarisation using graphs [0.0]
Our paper explores different graph-related algorithms that can be used in solving the text summarization problem using an extractive approach.
We consider two metrics: sentence overlap and edit distance for measuring sentence similarity.
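The two sentence-similarity metrics named above can be sketched as follows. This is an illustrative implementation under assumptions not stated in the summary: edit distance is taken as Levenshtein distance over token sequences, and sentence overlap is normalised as Jaccard similarity of token sets (a hypothetical choice).

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token
    lists), computed with a rolling one-row DP array."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,                   # deletion
                dp[j - 1] + 1,               # insertion
                prev + (a[i - 1] != b[j - 1]),  # substitution (free if equal)
            )
    return dp[n]

def sentence_overlap(a, b):
    """Fraction of shared tokens between two tokenised sentences
    (Jaccard normalisation is an assumption here)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(len(sa | sb), 1)
```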
arXiv Detail & Related papers (2021-06-21T10:03:34Z)
- Topical Change Detection in Documents via Embeddings of Long Sequences [4.13878392637062]
We formulate the task of text segmentation as an independent supervised prediction task.
By fine-tuning on paragraphs of similar sections, we are able to show that learned features encode topic information.
Unlike previous approaches, which mostly operate on sentence-level, we consistently use a broader context.
arXiv Detail & Related papers (2020-12-07T12:09:37Z)
- Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
Relational sentences in the original text are embedded (with SBERT) and clustered in order to merge together semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
- Multilingual Alignment of Contextual Word Representations [49.42244463346612]
The aligned BERT model exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
arXiv Detail & Related papers (2020-02-10T03:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.