Cost-effective Variational Active Entity Resolution
- URL: http://arxiv.org/abs/2011.10406v3
- Date: Fri, 26 Feb 2021 11:13:27 GMT
- Title: Cost-effective Variational Active Entity Resolution
- Authors: Alex Bogatu, Norman W. Paton, Mark Douthwaite, Stuart Davie, Andre Freitas
- Abstract summary: We devise an entity resolution method that builds on the robustness conferred by deep autoencoders to reduce human-involvement costs.
Specifically, we reduce the cost of training deep entity resolution models by performing unsupervised representation learning.
Finally, we reduce the cost of labelling training data through an active learning approach that builds on the properties conferred by the use of deep autoencoders.
- Score: 4.238343046459798
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurately identifying different representations of the same real-world
entity is an integral part of data cleaning and many methods have been proposed
to accomplish it. The challenges of this entity resolution task that demand so
much research attention are often rooted in the task-specificity and
user-dependence of the process. Adopting deep learning techniques has the
potential to lessen these challenges. In this paper, we set out to devise an
entity resolution method that builds on the robustness conferred by deep
autoencoders to reduce human-involvement costs. Specifically, we reduce the
cost of training deep entity resolution models by performing unsupervised
representation learning. This unveils a transferability property of the
resulting model that can further reduce the cost of applying the approach to
new datasets by means of transfer learning. Finally, we reduce the cost of
labelling training data through an active learning approach that builds on the
properties conferred by the use of deep autoencoders. Empirical evaluation
confirms the accomplishment of our cost-reduction desideratum while achieving
comparable effectiveness with state-of-the-art alternatives.
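The two cost-reduction ideas in the abstract, unsupervised autoencoder representations and uncertainty-driven labelling, can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the paper's VAER implementation: it uses a plain (non-variational) autoencoder rather than the variational one the title implies, hashed character trigrams as record features, and simple margin-based uncertainty; the names `encode_record`, `Autoencoder`, and `most_uncertain_pairs` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_record(record, dim=64):
    """Hash character trigrams of a record into a fixed-size, L2-normalised
    bag-of-features vector (a stand-in for real attribute embeddings)."""
    v = np.zeros(dim)
    text = " ".join(str(f) for f in record).lower()
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class Autoencoder:
    """Single-hidden-layer autoencoder trained with plain gradient descent.
    The hidden activations serve as the learned record representation."""
    def __init__(self, in_dim, hid_dim=16, lr=0.1):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hid_dim))
        self.W2 = rng.normal(0.0, 0.1, (hid_dim, in_dim))
        self.lr = lr

    def encode(self, X):
        return np.tanh(X @ self.W1)

    def fit(self, X, epochs=300):
        for _ in range(epochs):
            H = np.tanh(X @ self.W1)             # encoder
            err = H @ self.W2 - X                # reconstruction error
            gW2 = H.T @ err / len(X)
            gH = err @ self.W2.T * (1 - H ** 2)  # back-prop through tanh
            gW1 = X.T @ gH / len(X)
            self.W2 -= self.lr * gW2
            self.W1 -= self.lr * gW1
        return self

def match_score(ae, a, b):
    """Cosine similarity of two records' learned embeddings."""
    ea, eb = ae.encode(a[None, :])[0], ae.encode(b[None, :])[0]
    return float(ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb) + 1e-9))

def most_uncertain_pairs(ae, pairs, X, threshold=0.5, k=2):
    """Active-learning step: rank candidate pairs by how close their
    similarity sits to the match threshold (most ambiguous first)."""
    key = lambda p: abs(match_score(ae, X[p[0]], X[p[1]]) - threshold)
    return sorted(pairs, key=key)[:k]
```

In this sketch a user would featurise all records, fit the autoencoder on the unlabelled collection, and hand only the most ambiguous candidate pairs to a human annotator, which is the cost-reduction loop the abstract describes.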
Related papers
- LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition [67.96794382040547]
LLM-DA is a novel data augmentation technique based on large language models (LLMs) for the few-shot NER task.
Our approach involves employing 14 contextual rewriting strategies, designing entity replacements of the same type, and incorporating noise injection to enhance robustness.
arXiv Detail & Related papers (2024-02-22T14:19:56Z)
- Compute-Efficient Active Learning [0.0]
Active learning aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset.
The traditional active learning process often demands extensive computational resources, hindering scalability and efficiency.
We present a novel method designed to alleviate the computational burden associated with active learning on massive datasets.
arXiv Detail & Related papers (2024-01-15T12:32:07Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which consistently demonstrates robust performance with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can stem from biases in data acquisition rather than the task itself.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation [4.479834103607382]
Cost-sensitive classification is critical in applications where misclassification errors widely vary in cost.
This paper proposes a cost-sensitive adversarial data augmentation framework to make over-parameterized models cost-sensitive.
Our method can effectively minimize the overall cost and reduce critical errors, while achieving comparable performance in terms of overall accuracy.
arXiv Detail & Related papers (2022-08-24T19:00:30Z)
- Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to its success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z)
- Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples [14.472052505918045]
We present a new training pipeline boosting the conventional active learning approach.
We show that the proposed training pipeline can boost the efficacy of active learning approaches.
arXiv Detail & Related papers (2020-06-19T08:54:46Z)
- Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.