RELIC: Retrieving Evidence for Literary Claims
- URL: http://arxiv.org/abs/2203.10053v1
- Date: Fri, 18 Mar 2022 16:56:08 GMT
- Title: RELIC: Retrieving Evidence for Literary Claims
- Authors: Katherine Thai, Yapei Chang, Kalpesh Krishna, and Mohit Iyyer
- Abstract summary: We use a large-scale dataset of 78K literary quotations to formulate the novel task of literary evidence retrieval.
We implement a RoBERTa-based dense passage retriever for this task that outperforms existing pretrained information retrieval baselines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humanities scholars commonly provide evidence for claims that they make about
a work of literature (e.g., a novel) in the form of quotations from the work.
We collect a large-scale dataset (RELiC) of 78K literary quotations and
surrounding critical analysis and use it to formulate the novel task of
literary evidence retrieval, in which models are given an excerpt of literary
analysis surrounding a masked quotation and asked to retrieve the quoted
passage from the set of all passages in the work. Solving this retrieval task
requires a deep understanding of complex literary and linguistic phenomena,
which proves challenging to methods that overwhelmingly rely on lexical and
semantic similarity matching. We implement a RoBERTa-based dense passage
retriever for this task that outperforms existing pretrained information
retrieval baselines; however, experiments and analysis by human domain experts
indicate that there is substantial room for improvement over our dense
retriever.
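The retrieval task described in the abstract can be sketched in a few lines: embed every candidate passage and the query (the critical analysis with its quotation masked), then retrieve the passage with the highest similarity. This is a toy illustration only; a hypothetical bag-of-words encoder stands in for the paper's RoBERTa-based dense encoder, and the passages are made up rather than drawn from the RELiC dataset.

```python
import math

def encode(text, vocab):
    # Toy bag-of-words embedding, L2-normalized. A dense retriever would
    # replace this with a trained neural encoder such as RoBERTa.
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative candidate passages (every passage in the literary work
# is a candidate in the actual task).
passages = [
    "the governess saw a figure on the tower",
    "the children played quietly in the garden",
    "a letter arrived announcing the master's visit",
]
# Stand-in for critical analysis surrounding a masked quotation.
query = "analysis of the scene where the governess sees a figure"

vocab = {w: i for i, w in enumerate(
    sorted({w for text in passages + [query] for w in text.lower().split()}))}

# Score each passage by inner product with the query embedding and
# retrieve the best-scoring one, as in dense passage retrieval.
scores = [dot(encode(p, vocab), encode(query, vocab)) for p in passages]
best = max(range(len(passages)), key=lambda i: scores[i])
```

In a real system the passage embeddings are precomputed once and stored in an index, so each query costs only one encoder pass plus a nearest-neighbor search.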
Related papers
- CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays [30.728539221991188]
Existing rhetorical datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories.
We propose the Chinese Essay Rhetoric dataset (CERD), consisting of 4 commonly used coarse-grained categories.
CERD is a manually annotated and comprehensive Chinese rhetoric dataset with five interrelated sub-tasks.
arXiv Detail & Related papers (2024-09-29T12:47:25Z)
- Says Who? Effective Zero-Shot Annotation of Focalization [0.0]
Focalization, the perspective through which narrative is presented, is encoded via a wide range of lexico-grammatical features.
We provide experiments to test how well contemporary Large Language Models (LLMs) perform when annotating literary texts for focalization mode.
arXiv Detail & Related papers (2024-09-17T17:50:15Z)
- Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
Hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
arXiv Detail & Related papers (2024-08-20T02:19:35Z)
- CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization [7.234196390284036]
This article summarizes the research on Transformer-based abstractive summarization for English dialogues.
We cover the main challenges in dialogue summarization (i.e., language, structure, comprehension, speaker, salience, and factuality).
We find that while some challenges, like language, have seen considerable progress, others, such as comprehension, factuality, and salience, remain difficult and hold significant research opportunities.
arXiv Detail & Related papers (2024-06-11T17:30:22Z)
- ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary [30.409552944905915]
ChatCite is an LLM agent with human workflow guidance for comparative literature summary.
In experiments, the ChatCite agent outperformed other models across multiple evaluation dimensions.
The literature summaries generated by ChatCite can also be directly used for drafting literature reviews.
arXiv Detail & Related papers (2024-03-05T01:13:56Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Phrase Retrieval Learns Passage Retrieval, Too [77.57208968326422]
We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy than dedicated passage retrievers.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
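The index-compression claim above can be illustrated with a scalar quantization sketch: storing each float32 coordinate as a single signed byte cuts the index to a quarter of its size. This is one plausible form of vector quantization, not necessarily the paper's exact scheme, and the vectors below are random stand-ins for phrase embeddings.

```python
import random

random.seed(0)
dim = 128
# Toy float vectors standing in for a phrase index; real systems store
# millions of these, so per-vector storage dominates index size.
index = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(100)]

def quantize(vec):
    # Map each coordinate in [-1, 1] to one signed byte (int8).
    return [max(-127, min(127, round(x * 127))) for x in vec]

def dequantize(qvec):
    return [q / 127 for q in qvec]

quantized = [quantize(v) for v in index]

# float32 needs 4 bytes per coordinate; int8 needs 1.
original_bytes = len(index) * dim * 4
compressed_bytes = len(index) * dim * 1

# Reconstruction error is bounded by half a quantization step,
# so similarity scores (and thus retrieval results) change little.
err = max(abs(x - y) for v, q in zip(index, quantized)
          for x, y in zip(v, dequantize(q)))
```

Phrase filtering (discarding low-value phrases before indexing) would shrink the number of stored vectors as well, compounding the savings toward the 4-10x reported.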
arXiv Detail & Related papers (2021-09-16T17:42:45Z)
- Sensing Ambiguity in Henry James' "The Turn of the Screw" [0.8528384027684192]
This work brings together computational text analysis and literary analysis to demonstrate the extent to which ambiguity in certain texts plays a key role in shaping meaning.
We revisit the discussion, well known in the humanities, about the role ambiguity plays in Henry James' 19th century novella, The Turn of the Screw.
We demonstrate that cosine similarity and word mover's distance are sensitive enough to detect ambiguity in its most subtle literary form.
arXiv Detail & Related papers (2020-11-21T17:53:41Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches across various generation tasks, such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.