On the Multi-Property Extraction and Beyond
- URL: http://arxiv.org/abs/2006.08281v1
- Date: Mon, 15 Jun 2020 11:07:52 GMT
- Title: On the Multi-Property Extraction and Beyond
- Authors: Tomasz Dwojak, Michał Pietruszka, Łukasz Borchmann, Filip Graliński, Jakub Chłędowski
- Abstract summary: We investigate the Dual-source Transformer architecture on the WikiReading information extraction and machine reading comprehension dataset.
We introduce WikiReading Recycled - a newly developed public dataset, supporting the task of multiple property extraction.
- Score: 7.670897251425096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the Dual-source Transformer architecture on the
WikiReading information extraction and machine reading comprehension dataset.
The proposed model outperforms the current state-of-the-art by a large margin.
Next, we introduce WikiReading Recycled - a newly developed public dataset,
supporting the task of multiple property extraction. It keeps the spirit of the
original WikiReading but does not inherit the identified disadvantages of its
predecessor.
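As a rough illustration of the dual-source idea, the sketch below shows a Transformer decoder layer that cross-attends over two encoder memories (e.g., the document tokens and the requested property names). All class, parameter, and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-source Transformer decoder layer (assumed layout):
# the decoder attends over two encoder memories before the feed-forward block.
import torch
import torch.nn as nn

class DualSourceDecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.attn_doc = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.attn_props = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])
        self.dropout = nn.Dropout(dropout)

    def forward(self, tgt, memory_doc, memory_props):
        # Self-attention over the partially generated answer (causal mask omitted for brevity).
        x = self.norms[0](tgt + self.dropout(self.self_attn(tgt, tgt, tgt)[0]))
        # Cross-attention over the first source: the encoded document.
        x = self.norms[1](x + self.dropout(self.attn_doc(x, memory_doc, memory_doc)[0]))
        # Cross-attention over the second source: the encoded property names.
        x = self.norms[2](x + self.dropout(self.attn_props(x, memory_props, memory_props)[0]))
        # Position-wise feed-forward block with residual connection.
        return self.norms[3](x + self.dropout(self.ff(x)))

if __name__ == "__main__":
    layer = DualSourceDecoderLayer()
    tgt = torch.randn(2, 10, 512)    # decoder states
    doc = torch.randn(2, 300, 512)   # encoded document tokens
    props = torch.randn(2, 5, 512)   # encoded property names
    print(layer(tgt, doc, props).shape)  # torch.Size([2, 10, 512])
```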
Related papers
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP (AESOP) metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - A Computational Analysis of Vagueness in Revisions of Instructional Texts [2.2577978123177536]
We extract pairwise versions of an instruction before and after a revision was made.
We investigate the ability of a neural model to distinguish between two versions of an instruction in our data.
arXiv Detail & Related papers (2023-09-21T14:26:04Z) - Interactive Distillation of Large Single-Topic Corpora of Scientific Papers [1.2954493726326113]
A more robust but time-consuming approach is to build the dataset constructively, with a subject matter expert handpicking documents.
Here we showcase a new tool, based on machine learning, for constructively generating targeted datasets of scientific literature.
arXiv Detail & Related papers (2023-09-19T17:18:36Z) - Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization [51.206398602941405]
We propose to disentangle an original high-dimensional feature into multiple sub-features.
On top of the disentangled sub-features, we learn an auxiliary feature to enhance the sub-features.
Our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset.
arXiv Detail & Related papers (2023-09-13T10:53:12Z) - FRUIT: Faithfully Reflecting Updated Information in Text [106.40177769765512]
We introduce the novel generation task of *faithfully reflecting updated information in text* (FRUIT).
Our analysis shows that developing models that can update articles faithfully requires new capabilities for neural generation models.
arXiv Detail & Related papers (2021-12-16T05:21:24Z) - DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions [41.80938919728834]
We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description.
DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average.
The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities.
arXiv Detail & Related papers (2021-06-09T20:10:48Z) - WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [69.13865812754058]
We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
arXiv Detail & Related papers (2020-11-16T10:02:52Z) - From Dataset Recycling to Multi-Property Extraction and Beyond [7.670897251425096]
This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset.
The proposed dual-source model outperforms the current state-of-the-art by a large margin.
We introduce WikiReading Recycled - a newly developed public dataset - and the task of multiple property extraction.
arXiv Detail & Related papers (2020-11-06T08:22:12Z) - SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy [92.5683788430012]
SupMMD is a novel technique for generic and update summarization based on the maximum mean discrepancy (MMD) from kernel two-sample testing; a minimal MMD sketch appears after this list.
We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
arXiv Detail & Related papers (2020-10-06T09:26:55Z) - Concept Extraction Using Pointer-Generator Networks [86.75999352383535]
We propose a generic open-domain OOV-oriented extractive model that is based on distant supervision of a pointer-generator network.
The model has been trained on a large annotated corpus compiled specifically for this task from 250K Wikipedia pages.
arXiv Detail & Related papers (2020-08-25T22:28:14Z) - GameWikiSum: a Novel Large Multi-Document Summarization Dataset [39.38032088973816]
GameWikiSum is a new domain-specific dataset for multi-document summarization.
It is one hundred times larger than commonly used datasets and comes from a different domain than news.
We analyze the proposed dataset and show that both abstractive and extractive models can be trained on it.
arXiv Detail & Related papers (2020-02-17T09:25:19Z)
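For the SupMMD entry above, the snippet below is a generic sketch of the biased kernel two-sample MMD statistic with an RBF kernel; the function names and data are illustrative assumptions, not the SupMMD implementation.

```python
# Biased estimate of squared MMD between two samples, using an RBF kernel
# (generic kernel two-sample statistic; not the SupMMD method itself).
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows in x and y.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # MMD^2 = mean k(x,x) - 2 * mean k(x,y) + mean k(y,y)  (biased estimator).
    return (rbf_kernel(x, x, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean()
            + rbf_kernel(y, y, gamma).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(50, 8))  # e.g. embeddings of candidate summary sentences
    b = rng.normal(0.5, 1.0, size=(60, 8))  # e.g. embeddings of source-document sentences
    print(f"MMD^2 estimate: {mmd2(a, b):.4f}")
```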
This list is automatically generated from the titles and abstracts of the papers on this site.