Explaining Relationships Between Scientific Documents
- URL: http://arxiv.org/abs/2002.00317v3
- Date: Thu, 12 Aug 2021 23:28:11 GMT
- Title: Explaining Relationships Between Scientific Documents
- Authors: Kelvin Luu, Xinyi Wu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola,
and Noah A. Smith
- Abstract summary: We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
- Score: 55.23390424044378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the task of explaining relationships between two scientific
documents using natural language text. This task requires modeling the complex
content of long technical documents, deducing a relationship between these
documents, and expressing the details of that relationship in text. In addition
to the theoretical interest of this task, successful solutions can help improve
researcher efficiency in search and review. In this paper we establish a
dataset of 622K examples from 154K documents. We pretrain a large language
model to serve as the foundation for autoregressive approaches to the task. We
explore the impact of taking different views on the two documents, including
the use of dense representations extracted with scientific IE systems. We
provide extensive automatic and human evaluations which show the promise of
such models, but make clear challenges for future work.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for
Interdisciplinary Science [0.0]
Large language models record impressive performance on many natural language processing tasks.
Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources.
We propose a novel structure-aware retrieval augmented language model that accommodates document structure during retrieval augmentation.
arXiv Detail & Related papers (2023-11-21T02:02:46Z) - Retrieval-Generation Synergy Augmented Large Language Models [30.53260173572783]
We propose an iterative retrieval-generation collaborative framework.
We conduct experiments on four question answering datasets, including single-hop QA and multi-hop QA tasks.
arXiv Detail & Related papers (2023-10-08T12:50:57Z) - A Comprehensive Survey of Document-level Relation Extraction (2016-2023) [3.0204640945657326]
Document-level relation extraction (DocRE) is an active area of research in natural language processing (NLP)
This paper aims to provide a comprehensive overview of recent advances in this field, highlighting its different applications in comparison to sentence-level relation extraction.
arXiv Detail & Related papers (2023-09-28T12:43:32Z) - UniDoc: A Universal Large Multimodal Model for Simultaneous Text
Detection, Recognition, Spotting and Understanding [93.92313947913831]
We introduce UniDoc, a novel multimodal model equipped with text detection and recognition capabilities.
To the best of our knowledge, this is the first large multimodal model capable of simultaneous text detection, recognition, spotting, and understanding.
arXiv Detail & Related papers (2023-08-19T17:32:34Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-training a generic multi-document model from a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - Augmenting Document Representations for Dense Retrieval with
Interpolation and Perturbation [49.940525611640346]
Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their Dense Augmentation and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z) - Multi-Vector Models with Textual Guidance for Fine-Grained Scientific
Document Similarity [11.157086694203201]
We present a new scientific document similarity model based on matching fine-grained aspects.
Our model is trained using co-citation contexts that describe related paper aspects as a novel form of textual supervision.
arXiv Detail & Related papers (2021-11-16T11:12:30Z) - CitationIE: Leveraging the Citation Graph for Scientific Information
Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z) - Reasoning with Latent Structure Refinement for Document-Level Relation
Extraction [20.308845516900426]
We propose a novel model that empowers the relational reasoning across sentences by automatically inducing the latent document-level graph.
Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED)
arXiv Detail & Related papers (2020-05-13T13:36:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.