On Event Individuation for Document-Level Information Extraction
- URL: http://arxiv.org/abs/2212.09702v3
- Date: Fri, 20 Oct 2023 21:26:29 GMT
- Title: On Event Individuation for Document-Level Information Extraction
- Authors: William Gantt, Reno Kriz, Yunmo Chen, Siddharth Vashishtha, Aaron
Steven White
- Abstract summary: We argue that the task demands definitive answers to thorny questions of event individuation.
We show that this raises concerns about the usefulness of template filling metrics, the quality of datasets for the task, and the ability of models to learn it.
- Score: 10.051706937866504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As information extraction (IE) systems have grown more adept at processing
whole documents, the classic task of template filling has seen renewed interest
as a benchmark for document-level IE. In this position paper, we call into
question the suitability of template filling for this purpose. We argue that
the task demands definitive answers to thorny questions of event individuation
-- the problem of distinguishing distinct events -- about which even human
experts disagree. Through an annotation study and error analysis, we show that
this raises concerns about the usefulness of template filling metrics, the
quality of datasets for the task, and the ability of models to learn it.
Finally, we consider possible solutions.
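
To make the individuation problem concrete, the sketch below is an illustration of ours, not code or a metric from the paper: the MUC-4-style role names, the greedy template alignment, and the role-level F1 are all simplifying assumptions. It shows how a document mentioning two bombings can be filled as one merged template or as two separate ones, and how that individuation choice alone changes the score against a two-template reference.

```python
# Toy sketch only: a simplified, MUC-4-style template representation and a
# crude role-level F1 with greedy template alignment. This is NOT the schema
# or scorer used in the paper; it only illustrates why individuation matters.

# A template maps role names to sets of entity-string fillers.
Template = dict[str, set[str]]


def role_f1(pred: Template, gold: Template) -> float:
    """Micro-averaged F1 over (role, filler) pairs for one template pair."""
    pred_pairs = {(r, f) for r, fillers in pred.items() for f in fillers}
    gold_pairs = {(r, f) for r, fillers in gold.items() for f in fillers}
    tp = len(pred_pairs & gold_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


def doc_score(preds: list[Template], golds: list[Template]) -> float:
    """Greedily align predicted templates to gold templates and average the
    pairwise scores; unmatched templates on either side score 0.0 (a crude
    stand-in for the alignment step real template-filling metrics perform)."""
    remaining = list(golds)
    scores = []
    for pred in preds:
        if not remaining:
            scores.append(0.0)          # spurious predicted template
            continue
        best = max(remaining, key=lambda gold: role_f1(pred, gold))
        scores.append(role_f1(pred, best))
        remaining.remove(best)
    scores.extend(0.0 for _ in remaining)  # missed gold templates
    return sum(scores) / len(scores) if scores else 0.0


# Document (paraphrased): "Militants bombed the embassy on Monday and a bank
# on Tuesday." Annotators might individuate this as one attack event or two.
gold_two_events = [
    {"PerpInd": {"militants"}, "Target": {"embassy"}},
    {"PerpInd": {"militants"}, "Target": {"bank"}},
]
pred_one_merged_event = [
    {"PerpInd": {"militants"}, "Target": {"embassy", "bank"}},
]

print(doc_score(pred_one_merged_event, gold_two_events))  # 0.4: merging is penalized
print(doc_score(gold_two_events, gold_two_events))        # 1.0: individuation matches
```

Under this toy scorer, the merged analysis loses credit even though every role filler it extracts is correct, simply because it individuates the events differently from the reference; this is the kind of sensitivity to individuation decisions that the paper's annotation study and error analysis examine.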
Related papers
- Extracting Training Data from Document-Based VQA Models [67.1470112451617]
Vision-Language Models (VLMs) have made remarkable progress in document-based Visual Question Answering (i.e., responding to queries about the contents of an input document provided as an image).
We show these models can memorise responses for training samples and regurgitate them even when the relevant visual information has been removed.
This includes Personally Identifiable Information repeated once in the training set, indicating these models could divulge sensitive information and therefore pose a privacy risk.
arXiv Detail & Related papers (2024-07-11T17:44:41Z) - On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval [59.25292920967197]
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Fine-tuning and aligning question answering models for complex
information extraction tasks [0.8392546351624164]
Extractive language models such as question answering (QA) or passage retrieval models guarantee that query results are found within the boundaries of a given context document.
We show that fine-tuning existing German QA models boosts performance for tailored extraction tasks of complex linguistic features.
We derive a combined metric from Levenshtein distance, F1 score, Exact Match, and ROUGE-L to mimic the assessment criteria of human experts.
arXiv Detail & Related papers (2023-09-26T10:02:21Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - InteractiveIE: Towards Assessing the Strength of Human-AI Collaboration
in Improving the Performance of Information Extraction [48.45550809455558]
We show how on-the-fly proxy human supervision (termed InteractiveIE) can boost the performance of learning template-based information extraction from documents.
Experiments on biomedical and legal documents, where obtaining training data is expensive, reveal encouraging trends of performance improvement using InteractiveIE over an AI-only baseline.
arXiv Detail & Related papers (2023-05-24T02:53:22Z) - Learning to Ask for Data-Efficient Event Argument Extraction [23.106166629659405]
Event argument extraction (EAE) is an important information extraction task that aims to identify the specific roles of event arguments.
In this study, we cast EAE as a question-based cloze task and empirically analyze the performance of fixed discrete token templates.
We propose a novel approach called "Learning to Ask," which can learn optimized question templates for EAE without human annotations.
arXiv Detail & Related papers (2021-10-01T15:22:37Z) - A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, given a set of documents and a query, the goal is to generate a query-focused summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach that utilizes distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z) - A Review on Fact Extraction and Verification [19.373340472113703]
We study the fact checking problem, which aims to identify the veracity of a given claim.
We focus on the task of Fact Extraction and VERification (FEVER) and its accompanying dataset.
This task is essential and can be the building block of applications such as fake news detection and medical claim verification.
arXiv Detail & Related papers (2020-10-06T20:05:43Z)