IMoJIE: Iterative Memory-Based Joint Open Information Extraction
- URL: http://arxiv.org/abs/2005.08178v1
- Date: Sun, 17 May 2020 07:04:08 GMT
- Title: IMoJIE: Iterative Memory-Based Joint Open Information Extraction
- Authors: Keshav Kolluru, Samarth Aggarwal, Vipul Rathore, Mausam and Soumen
Chakrabarti
- Abstract summary: We present IMoJIE, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples.
IMoJIE outperforms CopyAttention by about 18 F1 pts, and a BERT-based strong baseline by 2 F1 pts.
- Score: 37.487044478970965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While traditional systems for Open Information Extraction were statistical
and rule-based, recently neural models have been introduced for the task. Our
work builds upon CopyAttention, a sequence-generation OpenIE model (Cui et
al., 2018). Our analysis reveals that CopyAttention produces a constant number
of extractions per sentence, and its extracted tuples often express redundant
information.
We present IMoJIE, an extension to CopyAttention, which produces the next
extraction conditioned on all previously extracted tuples. This approach
overcomes both shortcomings of CopyAttention, resulting in a variable number of
diverse extractions per sentence. We train IMoJIE on training data bootstrapped
from extractions of several non-neural systems, which have been automatically
filtered to reduce redundancy and noise. IMoJIE outperforms CopyAttention by
about 18 F1 pts, and a BERT-based strong baseline by 2 F1 pts, establishing a
new state of the art for the task.
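The core idea of the abstract can be sketched as an iterative decoding loop: each new tuple is generated conditioned on the sentence plus all tuples extracted so far, which lets the model stop when the sentence is exhausted and avoid redundant extractions. The sketch below is illustrative only; the `decoder` callable, the `[SEP]` delimiter, and the `[EOE]` end-of-extractions marker are assumptions, not the paper's exact interface.

```python
def iterative_extract(sentence, decoder, max_extractions=10):
    """Sketch of IMoJIE-style decoding: each new tuple is generated
    conditioned on the sentence plus every previously extracted tuple."""
    extractions = []
    for _ in range(max_extractions):
        # Append prior tuples to the decoder input so the model can see
        # what it has already produced and avoid redundant extractions.
        context = " [SEP] ".join([sentence] + extractions)
        out = decoder(context)
        if out == "[EOE]":  # hypothetical end-of-extractions marker
            break
        extractions.append(out)
    return extractions
```

Because the loop re-encodes the full history at each step, the number of extractions naturally varies per sentence instead of being fixed, which is exactly the shortcoming of CopyAttention that the abstract describes.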
Related papers
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
- Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents [1.6249267147413522]
This paper introduces a new information extraction model for business documents.
It takes advantage of both span extraction and sequence labeling.
The model is trained end-to-end to jointly optimize the two tasks.
arXiv Detail & Related papers (2022-05-26T15:37:24Z)
- Integrating diverse extraction pathways using iterative predictions for Multilingual Open Information Extraction [11.344977846840747]
We propose a neural multilingual OpenIE system that iteratively extracts triples by conditioning extractions on different elements of the triple.
MiLIE outperforms SOTA systems on multiple languages ranging from Chinese to Galician thanks to its ability to combine multiple extraction pathways.
arXiv Detail & Related papers (2021-10-15T15:19:11Z)
- Document-level Entity-based Extraction as Template Generation [13.110360825201044]
We propose a generative framework for two document-level EE tasks: role-filler entity extraction (REE) and relation extraction (RE).
We first formulate them as a template generation problem, allowing models to efficiently capture cross-entity dependencies.
A novel cross-attention guided copy mechanism, TopK Copy, is incorporated into a pre-trained sequence-to-sequence model to enhance the capabilities of identifying key information.
arXiv Detail & Related papers (2021-09-10T14:18:22Z)
- An Effective System for Multi-format Information Extraction [1.027461951217988]
The 2021 Language and Intelligence Challenge is designed to evaluate information extraction from different dimensions.
Here we describe our system for this multi-format information extraction competition task.
Our system ranks No.4 on the test set leader-board of this multi-format information extraction task.
arXiv Detail & Related papers (2021-08-16T08:25:17Z)
- MemSum: Extractive Summarization of Long Documents using Multi-step Episodic Markov Decision Processes [6.585259903186036]
We introduce MemSum, a reinforcement-learning-based extractive summarizer enriched at any given time step with information on the current extraction history.
Our innovation is in considering a broader information set when summarizing that would intuitively also be used by humans in this task.
arXiv Detail & Related papers (2021-07-19T14:41:31Z)
- Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944]
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-10-06T02:51:02Z)
- Contrastive Triple Extraction with Generative Transformer [72.21467482853232]
We introduce a novel model, contrastive triple extraction with a generative transformer.
Specifically, we introduce a single shared transformer module for encoder-decoder-based generation.
To generate faithful results, we propose a novel triplet contrastive training objective.
arXiv Detail & Related papers (2020-09-14T05:29:24Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
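The SummPip pipeline (graph construction, clustering, per-cluster compression) can be sketched in miniature. Note the substitutions: lexical Jaccard overlap stands in for the paper's linguistic and deep representations, connected components stand in for spectral clustering, and "pick the shortest sentence" stands in for graph-based compression; all three are simplifications for illustration.

```python
def summpip_sketch(sentences, threshold=0.3):
    """Toy pipeline in the spirit of SummPip: build a sentence graph
    from word overlap, group connected sentences into clusters, and
    compress each cluster by keeping its shortest sentence."""
    def sim(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(1, len(wa | wb))

    n = len(sentences)
    # Edge between sentences whose Jaccard overlap exceeds the threshold.
    adj = {i: [j for j in range(n)
               if j != i and sim(sentences[i], sentences[j]) > threshold]
           for i in range(n)}
    # Connected components of the graph serve as sentence clusters.
    seen, clusters = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            comp.append(k)
            stack.extend(adj[k])
        clusters.append([sentences[j] for j in sorted(comp)])
    # "Compress" each cluster to a single representative sentence.
    return [min(c, key=len) for c in clusters]
```

Each cluster yields one summary sentence, so summary length scales with the number of distinct topics in the input rather than with the number of documents.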
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization [110.54963847339775]
We show that extracting full sentences introduces unnecessary and redundant content.
We propose extracting sub-sentential units based on the constituency parsing tree.
arXiv Detail & Related papers (2020-04-06T13:35:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.