Few-Shot Document-Level Event Argument Extraction
- URL: http://arxiv.org/abs/2209.02203v2
- Date: Thu, 25 May 2023 21:18:42 GMT
- Title: Few-Shot Document-Level Event Argument Extraction
- Authors: Xianjun Yang, Yujie Lu, Linda Petzold
- Abstract summary: Event argument extraction (EAE) has been well studied at the sentence level but under-explored at the document level.
We present FewDocAE, a Few-Shot Document-Level Event Argument Extraction benchmark.
- Score: 2.680014762694412
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Event argument extraction (EAE) has been well studied at the sentence level
but under-explored at the document level. In this paper, we study how to capture
event arguments that spread across multiple sentences in a document. Prior work
usually assumes full access to rich document-level supervision, ignoring the fact that
the available argument annotation is usually limited. To fill this gap, we
present FewDocAE, a Few-Shot Document-Level Event Argument Extraction
benchmark, built on an existing document-level event extraction dataset. We
first define the new problem and reconstruct the corpus with a novel N-Way-D-Doc
sampling strategy instead of the traditional N-Way-K-Shot strategy. Then we adapt
current document-level neural models to the few-shot setting to provide
baseline results under in-domain and cross-domain settings. Since argument
extraction depends on context from multiple sentences and the learning
process is limited to very few examples, we find this novel task to be very
challenging, with substantially low performance. Because FewDocAE closely
reflects practical use in low-resource regimes, we hope this benchmark
encourages more research in this direction. Our data and code will be
available online.
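The N-Way-D-Doc sampling is the core difference from standard few-shot setups: the support budget is counted in whole documents rather than per-type instances. The sketch below is a minimal, hypothetical episode sampler illustrating that idea; the data layout, function name, and sampling details are assumptions for illustration, not the authors' released FewDocAE code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set

# Illustrative N-Way-D-Doc episode sampler (assumed layout, not the paper's code).
# Each document is a dict such as {"doc_id": "d1", "event_types": {"Attack", "Die"}},
# where "event_types" is the set of event types annotated in that document.

def sample_episode(documents: List[Dict], n_way: int, d_doc: int, seed: int = 0) -> Dict:
    """Build one few-shot episode: pick N event types, then D support
    documents that cover them; remaining matching documents form the query pool."""
    rng = random.Random(seed)

    # Index documents by the event types they contain.
    docs_by_type: Dict[str, List[Dict]] = defaultdict(list)
    for doc in documents:
        for etype in doc["event_types"]:
            docs_by_type[etype].append(doc)

    # 1) Choose the N event types for this episode.
    event_types: Set[str] = set(
        rng.sample(sorted(docs_by_type), min(n_way, len(docs_by_type)))
    )

    # 2) Choose D whole documents (not K per-type instances) that mention at
    #    least one of the chosen types; these form the support set.
    candidates = [d for d in documents if d["event_types"] & event_types]
    support = rng.sample(candidates, min(d_doc, len(candidates)))

    # 3) The remaining candidate documents form the query set.
    support_ids = {d["doc_id"] for d in support}
    query = [d for d in candidates if d["doc_id"] not in support_ids]

    return {"event_types": event_types, "support": support, "query": query}
```

Counting the budget in documents rather than instances matches how annotation effort is actually spent at the document level, since a single document can contribute arguments for several of the N event types at once.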
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - In-context Pretraining: Language Modeling Beyond Document Boundaries [137.53145699439898]
In-Context Pretraining is a new approach where language models are pretrained on a sequence of related documents.
We introduce approximate algorithms for finding related documents with efficient nearest neighbor search.
We see notable improvements in tasks that require more complex contextual reasoning.
arXiv Detail & Related papers (2023-10-16T17:57:12Z) - DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
Analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that the majority of errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z) - Information Extraction from Documents: Question Answering vs Token Classification in real-world setups [0.0]
We compare the Question Answering approach with the classical token classification approach for document key information extraction.
Our research shows that when dealing with clean and relatively short entities, a token classification-based approach still works best.
arXiv Detail & Related papers (2023-04-21T14:43:42Z) - Dynamic Global Memory for Document-level Argument Extraction [63.314514124716936]
We introduce a new global neural generation-based framework for document-level event argument extraction.
We use a document memory store to record contextual event information and leverage it, both implicitly and explicitly, to help decode arguments for later events.
Empirical results show that our framework outperforms prior methods substantially.
arXiv Detail & Related papers (2022-09-18T23:45:25Z) - DocNLI: A Large-scale Dataset for Document-level Natural Language Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI.
arXiv Detail & Related papers (2021-06-17T13:02:26Z) - Document-Level Event Role Filler Extraction using Multi-Granularity Contextualized Encoding [40.13163091122463]
Event extraction is a difficult task since it requires a view of a larger context to determine which spans of text correspond to event role fillers.
We first investigate how end-to-end neural sequence models perform on document-level role filler extraction.
We show that our best system performs substantially better than prior work.
arXiv Detail & Related papers (2020-05-13T20:42:17Z) - Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)