Optimising Human-Machine Collaboration for Efficient High-Precision
Information Extraction from Text Documents
- URL: http://arxiv.org/abs/2302.09324v1
- Date: Sat, 18 Feb 2023 13:07:22 GMT
- Authors: Bradley Butcher, Miri Zilka, Darren Cook, Jiri Hron and Adrian Weller
- Abstract summary: We consider the benefits and drawbacks of various human-only, human-machine, and machine-only information extraction approaches.
We present a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation.
We find that the combination of computer speed and human understanding yields precision comparable to manual annotation while requiring only a fraction of the time.
- Score: 23.278525774427607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While humans can extract information from unstructured text with high
precision and recall, this is often too time-consuming to be practical.
Automated approaches, on the other hand, produce nearly-immediate results, but
may not be reliable enough for high-stakes applications where precision is
essential. In this work, we consider the benefits and drawbacks of various
human-only, human-machine, and machine-only information extraction approaches.
We argue for the utility of a human-in-the-loop approach in applications where
high precision is required, but purely manual extraction is infeasible. We
present a framework and an accompanying tool for information extraction using
weak-supervision labelling with human validation. We demonstrate our approach
on three criminal justice datasets. We find that the combination of computer
speed and human understanding yields precision comparable to manual annotation
while requiring only a fraction of the time, and significantly outperforms fully
automated baselines in terms of precision.
Related papers
- Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers [6.080064619880841]
Shortcut learning, i.e., a model's reliance on undesired features not directly relevant to the task, is a major challenge that severely limits the applications of machine learning algorithms.
We leverage recent advancements in machine learning to create an unsupervised framework that is capable of both detecting and mitigating shortcut learning in transformers.
arXiv Detail & Related papers (2025-01-01T19:52:19Z)
- No Need to Sacrifice Data Quality for Quantity: Crowd-Informed Machine Annotation for Cost-Effective Understanding of Visual Data [2.8769762836804538]
We present a framework that enables quality checking of visual data at large scales without sacrificing the reliability of the results.
We ask annotators simple questions with discrete answers, which can be highly automated using a convolutional neural network trained to predict crowd responses.
We demonstrate our approach on two challenging real-world automotive datasets, showing that our model can fully automate a significant portion of tasks.
arXiv Detail & Related papers (2024-08-19T14:45:50Z)
- Abstractive Text Summarization using Attentive GRU based Encoder-Decoder [4.339043862780233]
Automatic text summarization has emerged as an important application of machine learning in text processing.
In this paper, an English text summarizer has been built with a GRU-based encoder and decoder.
The output is observed to outperform competitive models in the literature.
arXiv Detail & Related papers (2023-02-25T16:45:46Z)
- Localized Shortcut Removal [4.511561231517167]
High performance on held-out test data does not necessarily indicate that a model generalizes or learns anything meaningful.
This is often due to the existence of machine learning shortcuts - features in the data that are predictive but unrelated to the problem at hand.
We use an adversarially trained lens to detect and eliminate highly predictive but semantically unconnected clues in images.
arXiv Detail & Related papers (2022-11-24T13:05:33Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Addressing Data Scarcity in Multimodal User State Recognition by Combining Semi-Supervised and Supervised Learning [1.1688030627514532]
We present a multimodal machine learning approach for detecting dis-/agreement and confusion states in a human-robot interaction environment.
We achieve an average F1-score of 81.1% for dis-/agreement detection with a small amount of labeled data and a large unlabeled data set.
arXiv Detail & Related papers (2022-02-08T10:41:41Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback [82.96694147237113]
We present Skill Preferences, an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data.
We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks.
arXiv Detail & Related papers (2021-08-11T18:04:08Z)
- Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
arXiv Detail & Related papers (2020-10-09T16:19:42Z)
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
- SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information [83.03179580646324]
This paper proposes a novel deep neural network architecture, namely SideInfNet.
It integrates features learnt from images with side information extracted from user annotations.
To evaluate our method, we applied the proposed network to three semantic segmentation tasks and conducted extensive experiments on benchmark datasets.
arXiv Detail & Related papers (2020-02-07T06:10:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.