Towards Generalized Open Information Extraction
- URL: http://arxiv.org/abs/2211.15987v1
- Date: Tue, 29 Nov 2022 07:33:44 GMT
- Title: Towards Generalized Open Information Extraction
- Authors: Bowen Yu, Zhenyu Zhang, Jingyang Li, Haiyang Yu, Tingwen Liu, Jian
Sun, Yongbin Li, Bin Wang
- Abstract summary: We propose a more realistic scenario: generalizing to unseen target domains whose data distributions differ from those of the source training domains.
DragonIE beats previous methods in both in-domain and out-of-domain settings by as much as 6.0% absolute F1 score.
- Score: 74.20080376460947
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Open Information Extraction (OpenIE) facilitates the open-domain discovery of
textual facts. However, prevailing solutions evaluate OpenIE models on in-domain test sets
held out from the training corpus, which violates the task's founding principle of domain
independence. In this paper, we propose to advance OpenIE towards a more realistic
scenario: generalizing to unseen target domains whose data distributions differ from those
of the source training domains, termed Generalized OpenIE. For this purpose, we first
introduce GLOBE, a large-scale human-annotated multi-domain OpenIE benchmark, to examine
the robustness of recent OpenIE models to domain shifts; the relative performance
degradation of up to 70% highlights how challenging generalized OpenIE is. We then propose
DragonIE, which explores a minimalist graph expression of textual facts, the directed
acyclic graph (DAG), to improve OpenIE generalization. Extensive experiments demonstrate
that DragonIE beats previous methods in both in-domain and out-of-domain settings by as
much as 6.0% absolute F1 score, but there is still ample room for improvement.
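As a rough illustration of the central idea, the sketch below encodes one sentence's facts as a directed acyclic graph, with argument and relation spans as nodes and directed edges linking them; the node names, edge scheme, and cycle check are assumptions made for illustration, not DragonIE's actual formulation.

```python
# Hypothetical sketch: expressing textual facts as a directed acyclic graph (DAG).
# Node names and the edge scheme are illustrative assumptions, not DragonIE's design.
from collections import defaultdict

class FactDAG:
    def __init__(self):
        # head span -> set of tail spans it points to
        self.edges = defaultdict(set)

    def add_edge(self, head: str, tail: str) -> None:
        self.edges[head].add(tail)

    def is_acyclic(self) -> bool:
        """Depth-first search; returns False as soon as a back edge (cycle) is found."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = defaultdict(int)  # default is WHITE

        def visit(node: str) -> bool:
            color[node] = GRAY
            for nxt in self.edges.get(node, ()):
                if color[nxt] == GRAY:                      # back edge -> cycle
                    return False
                if color[nxt] == WHITE and not visit(nxt):
                    return False
            color[node] = BLACK
            return True

        return all(color[head] == BLACK or visit(head) for head in list(self.edges))

# "Born in Hawaii, Obama moved to Chicago." as one small DAG covering two facts.
dag = FactDAG()
dag.add_edge("Obama", "Born in");  dag.add_edge("Born in", "Hawaii")    # (Obama; Born in; Hawaii)
dag.add_edge("Obama", "moved to"); dag.add_edge("moved to", "Chicago")  # (Obama; moved to; Chicago)
assert dag.is_acyclic()
```

Because both facts share the "Obama" node, the whole extraction collapses into one small graph rather than two disjoint triples, which is the kind of compact graph expression the abstract alludes to.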
Related papers
- A Survey on Open Information Extraction from Rule-based Model to Large Language Model [29.017823043117144]
Open Information Extraction (OpenIE) represents a crucial NLP task aimed at deriving structured information from unstructured text.
This survey paper provides an overview of OpenIE technologies spanning from 2007 to 2024, emphasizing a chronological perspective.
The paper categorizes OpenIE approaches into rule-based, neural, and pre-trained large language models, discussing each within a chronological framework.
arXiv Detail & Related papers (2022-08-18T08:03:45Z)
- A Survey on Neural Open Information Extraction: Current Status and Future Directions [87.30702606041407]
Open Information Extraction (OpenIE) facilitates domain-independent discovery of relational facts from large corpora.
We provide an overview of state-of-the-art neural OpenIE models, their key design decisions, strengths, and weaknesses.
arXiv Detail & Related papers (2022-05-24T02:24:55Z)
- Challenges for Open-domain Targeted Sentiment Analysis [21.61943346030794]
We propose a novel dataset of 6,013 human-labeled instances to extend the data domains in terms of topics of interest and to the document level.
We also offer a nested target annotation schema to extract the complete sentiment information in documents.
arXiv Detail & Related papers (2022-04-14T11:44:02Z)
- META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to perform well on unseen domains without access to target-domain data at training time.
This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
- OpenGAN: Open-Set Recognition via Open Data Generation [76.00714592984552]
Real-world machine learning systems need to analyze novel testing data that differs from the training data.
Two conceptually elegant ideas for open-set discrimination are: 1) discriminatively learning an open-vs-closed binary discriminator, and 2) learning the closed-set data distribution with a GAN in an unsupervised manner.
We propose OpenGAN, which addresses the limitation of each approach by combining them with several technical insights.
arXiv Detail & Related papers (2021-04-07T06:19:24Z)
- Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets.
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z)
- OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction [36.439047786561396]
We present an iterative labeling-based system that establishes a new state of the art for OpenIE, while extracting 10x faster.
This is achieved through a novel Iterative Grid Labeling (IGL) architecture, which treats OpenIE as a 2-D grid labeling task (see the sketch after this list).
Our OpenIE system, OpenIE6, beats the previous systems by as much as 4 pts in F1, while being much faster.
arXiv Detail & Related papers (2020-10-07T04:05:37Z)
- Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation [65.38975706997088]
Open set domain adaptation (OSDA) assumes the presence of unknown classes in the target domain.
We show that existing state-of-the-art methods suffer a considerable performance drop in the presence of larger domain gaps.
We propose a novel framework to specifically address the larger domain gaps.
arXiv Detail & Related papers (2020-03-08T14:20:24Z)
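To make the grid-labeling idea from the OpenIE6 entry above concrete, here is a small, hypothetical sketch that casts OpenIE as 2-D labeling: each row of the grid is one extraction, each column is one sentence token, and each cell carries a slot label. The label inventory (ARG1/REL/ARG2/NONE) and the decoding step are illustrative assumptions, not OpenIE6's exact scheme.

```python
# Hypothetical sketch of OpenIE as 2-D grid labeling: rows = extractions,
# columns = sentence tokens, cells = slot labels. Labels and decoding are
# illustrative assumptions, not OpenIE6's exact scheme.
tokens = ["Obama", "was", "born", "in", "Hawaii", "and", "moved", "to", "Chicago"]

# One row per extraction; one label per token.
grid = [
    ["ARG1", "REL", "REL", "REL", "ARG2", "NONE", "NONE", "NONE", "NONE"],
    ["ARG1", "NONE", "NONE", "NONE", "NONE", "NONE", "REL", "REL", "ARG2"],
]

def decode(tokens, row):
    """Collect the tokens assigned to each slot in one grid row."""
    slots = {"ARG1": [], "REL": [], "ARG2": []}
    for tok, label in zip(tokens, row):
        if label in slots:
            slots[label].append(tok)
    return tuple(" ".join(slots[s]) for s in ("ARG1", "REL", "ARG2"))

for row in grid:
    print(decode(tokens, row))
# ('Obama', 'was born in', 'Hawaii')
# ('Obama', 'moved to', 'Chicago')
```

Decoding all rows in one pass is what lets a grid formulation emit several extractions for a sentence without running the extractor once per fact, which is consistent with the speedup the OpenIE6 entry reports.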