The Battleship Approach to the Low Resource Entity Matching Problem
- URL: http://arxiv.org/abs/2311.15685v1
- Date: Mon, 27 Nov 2023 10:18:17 GMT
- Title: The Battleship Approach to the Low Resource Entity Matching Problem
- Authors: Bar Genossar (1), Avigdor Gal (1) and Roee Shraga (2) ((1) Technion -
Israel Institute of Technology, (2) Worcester Polytechnic Institute)
- Abstract summary: We propose a new active learning approach for entity matching problems.
We focus on a selection mechanism that exploits unique properties of entity matching.
An experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity matching, a core data integration problem, is the task of deciding
whether two data tuples refer to the same real-world entity. Recent advances in
deep learning methods, using pre-trained language models, were proposed for
resolving entity matching. Although demonstrating unprecedented results, these
solutions suffer from a major drawback as they require large amounts of labeled
data for training, and, as such, are inadequate to be applied to low resource
entity matching problems. To overcome the challenge of obtaining sufficient
labeled data we offer a new active learning approach, focusing on a selection
mechanism that exploits unique properties of entity matching. We argue that a
distributed representation of a tuple pair indicates its informativeness when
considered among other pairs. Our approach builds on this observation by
iteratively applying space-aware considerations. Bringing it all together, we
treat the low resource entity matching problem as a Battleship game, hunting
indicative samples, focusing on positive ones, through awareness of the latent
space along with careful planning of next sampling iterations. An extensive
experimental analysis shows that the proposed algorithm outperforms
state-of-the-art active learning solutions to low resource entity matching, and
although using fewer samples, can be as successful as state-of-the-art fully
trained known algorithms.
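
The selection idea in the abstract (choose pairs that are informative, while staying aware of where they sit in the vector space) can be illustrated with a small sketch. This is not the paper's algorithm: the scoring rule, the `alpha` weight, the function names, and the toy vectors are all assumptions made for illustration. It only shows a generic pool-based selection step that mixes model uncertainty with distance to already-covered pairs.

```python
import math

def uncertainty(p):
    """Binary entropy (in bits) of a predicted match probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_batch(vectors, probs, labeled, budget, alpha=0.5):
    """Greedily pick up to `budget` unlabeled pair indices to send for labeling.

    Hypothetical scoring rule: alpha * uncertainty + (1 - alpha) * distance to
    the nearest labeled or already-chosen pair. Distances are not normalized,
    so alpha would need tuning on real embeddings.
    """
    chosen = []
    candidates = [i for i in range(len(vectors)) if i not in labeled]
    for _ in range(min(budget, len(candidates))):
        anchors = list(labeled) + chosen

        def score(i):
            # Distance to the closest covered pair; 1.0 if nothing is covered yet.
            spread = min(
                (math.dist(vectors[i], vectors[j]) for j in anchors),
                default=1.0,
            )
            return alpha * uncertainty(probs[i]) + (1 - alpha) * spread

        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Toy pool: 2-D stand-ins for tuple-pair representations (made-up values).
vectors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.5, 0.5)]
probs = [0.05, 0.5, 0.95, 0.5, 0.5]   # hypothetical matcher outputs
labeled = {0}                          # pair 0 already has a label

batch = select_batch(vectors, probs, labeled, budget=2)
print(batch)
```

In a full loop, the selected pairs would be labeled, added to the training set, and the matcher retrained before the next selection round.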
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Batch Active Learning from the Perspective of Sparse Approximation [12.51958241746014]
Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators.
We study and propose a novel framework that formulates batch active learning from the sparse approximation's perspective.
Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart.
arXiv Detail & Related papers (2022-11-01T03:20:28Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) methods are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Low-Regret Active Learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Byzantine Resilient Distributed Multi-Task Learning [6.850757447639822]
We show that distributed algorithms for learning relatedness among tasks are not resilient in the presence of Byzantine agents.
We propose an approach for Byzantine resilient distributed multi-task learning.
arXiv Detail & Related papers (2020-10-25T04:32:52Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)