Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction
- URL: http://arxiv.org/abs/2305.13981v2
- Date: Tue, 24 Oct 2023 06:03:23 GMT
- Title: Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction
- Authors: Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin
Liu, Jiuding Sun, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu
- Abstract summary: We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust only if its performance is consistently accurate across each entire clique.
- Score: 50.62245481416744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness to distribution changes ensures that NLP models can be
successfully applied in the real world, especially for information extraction
tasks. However, most prior evaluation benchmarks have been devoted to
validating pairwise matching correctness, ignoring the crucial measurement of
robustness. In this paper, we present the first benchmark that simulates the
evaluation of open information extraction models in the real world, where the
syntactic and expressive distributions of the same underlying knowledge may
drift in various ways. We design and annotate a large-scale testbed in which
each example is a knowledge-invariant clique consisting of sentences that
express structured knowledge of the same meaning in different syntactic and
expressive forms. By further elaborating the robustness metric, a model is
judged to be robust only if its performance is consistently accurate across
each entire clique. We perform experiments on typical models published over
the last decade as well as a popular large language model; the results show
that the existing successful models exhibit a frustrating degradation, with a
maximum drop of 23.43 in F1 score. Our resources and code are available at
https://github.com/qijimrc/ROBUST.
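
The clique-based criterion above lends itself to a simple aggregate score. Below is a minimal sketch, assuming each clique is a list of (sentence, gold triples) pairs that carry the same knowledge; the exact matching and aggregation used by ROBUST may differ, and `extract`, `triple_f1`, and the data layout are hypothetical stand-ins rather than the released implementation.

```python
# Minimal sketch of clique-level robustness scoring, assuming a clique is a
# list of (sentence, gold_triples) pairs expressing the same knowledge.
# `extract` stands in for any OpenIE model; exact-match F1 is used for brevity.
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]

def triple_f1(pred: Set[Triple], gold: Set[Triple]) -> float:
    """Micro F1 between predicted and gold triples (exact match for brevity)."""
    if not pred and not gold:
        return 1.0
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def clique_robustness(extract: Callable[[str], Set[Triple]],
                      clique: List[Tuple[str, Set[Triple]]]) -> float:
    """A model is only as robust as its worst paraphrase: take the minimum F1
    over all knowledge-invariant sentences in the clique."""
    return min(triple_f1(extract(sentence), gold) for sentence, gold in clique)

def benchmark_robustness(extract: Callable[[str], Set[Triple]],
                         cliques: List[List[Tuple[str, Set[Triple]]]]) -> float:
    """Average the per-clique worst-case scores over the whole testbed."""
    return sum(clique_robustness(extract, c) for c in cliques) / len(cliques)
```

Aggregating a worst-case score per clique, rather than ordinary per-sentence F1, is what exposes the degradation reported in the abstract: a model that handles canonical phrasings but fails on paraphrases of the same knowledge scores poorly.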
Related papers
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - OMNIINPUT: A Model-centric Evaluation Framework through Output
Distribution [31.00645110294068]
We propose a model-centric evaluation framework, OmniInput, to evaluate the quality of an AI/ML model's predictions on all possible inputs.
We employ an efficient sampler to obtain representative inputs and the output distribution of the trained model.
Our experiments demonstrate that OmniInput enables a more fine-grained comparison between models.
arXiv Detail & Related papers (2023-12-06T04:53:12Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Syntactically Robust Training on Partially-Observed Data for Open
Information Extraction [25.59133746149343]
Open Information Extraction models have shown promising results with sufficient supervision.
We propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution.
arXiv Detail & Related papers (2023-01-17T12:39:13Z) - Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z) - A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z) - Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake
News Detection [7.29381091750894]
We propose a novel transformer-based language model fine-tuning approach for fake news detection.
First, the token vocabulary of each individual model is expanded to cover the actual semantics of professional phrases.
Last, the predicted features extracted by the universal language model RoBERTa and the domain-specific model CT-BERT are fused by a multilayer perceptron to integrate fine-grained and high-level specific representations.
arXiv Detail & Related papers (2021-01-14T09:05:42Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
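
As an illustration of the transductive prototype update described in the last entry above, here is a minimal NumPy sketch that weights unlabeled queries by a per-class confidence score. In the paper the confidence function is meta-learned; this sketch substitutes a fixed softmax over negative distances, and all function and variable names are hypothetical.

```python
# Hedged sketch of confidence-weighted transductive prototype refinement for
# few-shot classification. The actual method meta-learns the confidence
# function; a fixed softmax over negative squared distances stands in here.
import numpy as np

def refine_prototypes(support_emb: np.ndarray,    # (N, D) labeled support embeddings
                      support_labels: np.ndarray, # (N,) integer class labels
                      query_emb: np.ndarray,      # (Q, D) unlabeled query embeddings
                      num_classes: int,
                      temperature: float = 10.0,
                      steps: int = 3) -> np.ndarray:
    # Initial prototypes: per-class mean of the labeled support embeddings.
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in range(num_classes)])
    for _ in range(steps):
        # Confidence of each query for each class: softmax over negative
        # squared Euclidean distances to the current prototypes.
        dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)  # (Q, C)
        conf = np.exp(-dists / temperature)
        conf /= conf.sum(axis=1, keepdims=True)
        # Soft version of "update each prototype with the mean of the most
        # confident query examples": confidence-weighted mean over support + queries.
        protos = np.stack([
            (support_emb[support_labels == c].sum(axis=0)
             + (conf[:, c:c + 1] * query_emb).sum(axis=0))
            / ((support_labels == c).sum() + conf[:, c].sum())
            for c in range(num_classes)
        ])
    return protos
```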