ArgFuse: A Weakly-Supervised Framework for Document-Level Event Argument
Aggregation
- URL: http://arxiv.org/abs/2106.10862v1
- Date: Mon, 21 Jun 2021 05:21:27 GMT
- Title: ArgFuse: A Weakly-Supervised Framework for Document-Level Event Argument
Aggregation
- Authors: Debanjana Kar, Sudeshna Sarkar, Pawan Goyal
- Abstract summary: We introduce the task of Information Aggregation or Argument Aggregation.
Our aim is to filter irrelevant and redundant argument mentions that were extracted at the sentence level and render a document-level information frame.
We present an extractive algorithm with multiple sieves which adopts active learning strategies to work efficiently in low-resource settings.
- Score: 9.56216681584111
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Most of the existing information extraction frameworks (Wadden et al., 2019;
Veyseh et al., 2020) focus on sentence-level tasks and are hardly able to
capture the consolidated information from a given document. In our endeavour to
generate precise document-level information frames from lengthy textual
records, we introduce the task of Information Aggregation or Argument
Aggregation. More specifically, our aim is to filter irrelevant and redundant
argument mentions that were extracted at the sentence level and render a
document-level information frame. Most existing works address the related tasks
of document-level event argument extraction (Yang et al., 2018a; Zheng et al.,
2019a) and salient entity identification (Jain et al., 2020) using supervised
techniques. To remove the dependency on large amounts
of labelled data, we explore the task of information aggregation using
weakly-supervised techniques. In particular, we present an extractive algorithm
with multiple sieves which adopts active learning strategies to work
efficiently in low-resource settings. For this task, we have annotated our own
test dataset comprising 131 document information frames and have released
the code and dataset to further research in this new domain. To the
best of our knowledge, we are the first to establish baseline results for this
task in English. Our data and code are publicly available at
https://github.com/DebanjanaKar/ArgFuse.
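To make the task concrete, the following is a minimal sketch of what aggregating sentence-level argument mentions into a document-level frame could look like. It is not the ArgFuse algorithm: the abstract does not specify the sieves, so the two rules used here (exact-duplicate removal and dropping a mention whose tokens are contained in a longer mention for the same role) are assumptions chosen for illustration, and the function names `contains` and `aggregate`, the role labels, and the sample mentions are all hypothetical. The sketch covers only redundancy, not relevance filtering or active learning.

```python
from collections import defaultdict


def contains(longer, shorter):
    """Return True if token list `shorter` occurs as a contiguous run in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def aggregate(mentions):
    """Build a document-level frame (role -> non-redundant mentions).

    `mentions` is an iterable of (role, text) pairs, as a sentence-level
    extractor might produce. Both sieves below are illustrative assumptions,
    not the sieves used in the paper. Output mentions are lowercased.
    """
    by_role = defaultdict(list)
    for role, text in mentions:
        tokens = text.lower().split()
        # Sieve 1 (assumed): skip empty mentions and exact duplicates.
        if tokens and tokens not in by_role[role]:
            by_role[role].append(tokens)

    frame = {}
    for role, candidates in by_role.items():
        # Sieve 2 (assumed): drop a mention contained in a longer one
        # for the same role, keeping the more specific mention.
        kept = [
            c for c in candidates
            if not any(c != other and contains(other, c) for other in candidates)
        ]
        frame[role] = [" ".join(c) for c in kept]
    return frame
```

For example, calling `aggregate` on the hypothetical mentions `[("place", "Delhi"), ("place", "New Delhi"), ("place", "new delhi"), ("time", "Monday"), ("casualties", "12 people"), ("casualties", "12")]` yields `{"place": ["new delhi"], "time": ["monday"], "casualties": ["12 people"]}`.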
Related papers
- GraphKD: Exploring Knowledge Distillation Towards Document Object
Detection with Structured Graph Creation [14.511401955827875]
Object detection in documents is a key step to automate the structural elements identification process.
We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
arXiv Detail & Related papers (2024-02-17T23:08:32Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval [59.25292920967197]
Visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- Knowledge Combination to Learn Rotated Detection Without Rotated
Annotation [53.439096583978504]
Rotated bounding boxes drastically reduce output ambiguity of elongated objects.
Despite the effectiveness, rotated detectors are not widely employed.
We propose a framework that allows the model to predict precise rotated boxes.
arXiv Detail & Related papers (2023-04-05T03:07:36Z)
- Dynamic Global Memory for Document-level Argument Extraction [63.314514124716936]
We introduce a new global neural generation-based framework for document-level event argument extraction.
We use a document memory store to record the contextual event information and leverage it to implicitly and explicitly help with decoding of arguments for later events.
Empirical results show that our framework outperforms prior methods substantially.
arXiv Detail & Related papers (2022-09-18T23:45:25Z)
- OPAD: An Optimized Policy-based Active Learning Framework for Document
Content Analysis [6.159771892460152]
We propose OPAD, a novel framework using reinforcement policy for active learning in content detection tasks for documents.
The framework learns the acquisition function to decide the samples to be selected while optimizing performance metrics.
We show superior performance of the proposed OPAD framework for active learning for various tasks related to document understanding.
arXiv Detail & Related papers (2021-10-01T07:40:56Z)
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach via utilizing distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
- SciREX: A Challenge Dataset for Document-Level Information Extraction [56.83748634747753]
It is challenging to create a large-scale information extraction dataset at the document level.
We introduce SciREX, a document level IE dataset that encompasses multiple IE tasks.
We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE.
arXiv Detail & Related papers (2020-05-01T17:30:10Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)