Deep Indexed Active Learning for Matching Heterogeneous Entity
Representations
- URL: http://arxiv.org/abs/2104.03986v1
- Date: Thu, 8 Apr 2021 18:00:19 GMT
- Title: Deep Indexed Active Learning for Matching Heterogeneous Entity
Representations
- Authors: Arjit Jain, Sunita Sarawagi, Prithviraj Sen
- Abstract summary: We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize recall for blocking and accuracy for matching blocked pairs.
Experiments on five benchmark datasets and a multilingual record matching dataset show the effectiveness of our approach in terms of precision, recall and running time.
- Score: 20.15233789156307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given two large lists of records, the task in entity resolution (ER) is to
find the pairs from the Cartesian product of the lists that correspond to the
same real world entity. Typically, passive learning methods on tasks like ER
require large amounts of labeled data to yield useful models. Active Learning
is a promising approach for ER in low-resource settings. However, the search
space for finding informative samples for the user to label grows quadratically
for instance-pair tasks, making active learning hard to scale. Previous works
in this setting rely on hand-crafted predicates, pre-trained language model
embeddings, or rule learning to prune away unlikely pairs from the Cartesian
product. This blocking step can miss important regions in the product
space, leading to low recall. We propose DIAL, a scalable active learning
approach that jointly learns embeddings to maximize recall for blocking and
accuracy for matching blocked pairs. DIAL uses an Index-By-Committee framework,
where each committee member learns representations based on powerful
transformer models. We highlight surprising differences between the matcher and
the blocker in the creation of the training data and the objective used to
train their parameters. Experiments on five benchmark datasets and a
multilingual record matching dataset show the effectiveness of our approach in
terms of precision, recall and running time. Code is available at
https://github.com/ArjitJ/DIAL
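The two ideas in the abstract, blocking to prune the Cartesian product and committee-based selection of pairs to label, can be illustrated with a minimal sketch. This is not the DIAL implementation: the character-trigram vectors stand in for learned transformer embeddings, and the names trigram_vector, block_top_k, and vote_entropy, as well as the sample records, are hypothetical. Vote entropy is a generic query-by-committee disagreement measure, used here only as an assumption about how a committee's votes might be scored.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Toy stand-in for a learned embedding: counts of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def block_top_k(left, right, k=1):
    """Blocking: keep, for each left record, only its k most similar right records."""
    right_vecs = [trigram_vector(r) for r in right]
    candidates = []
    for i, record in enumerate(left):
        vec = trigram_vector(record)
        ranked = sorted(
            range(len(right)),
            key=lambda j: cosine(vec, right_vecs[j]),
            reverse=True,
        )
        candidates.extend((i, j) for j in ranked[:k])
    return candidates


def vote_entropy(votes):
    """Committee disagreement over binary match votes (0.0 means unanimous)."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Hypothetical records from two lists to be matched.
left = ["Apple iPhone 12 64GB", "Sony WH-1000XM4 headphones"]
right = [
    "iphone 12 (64 gb) by apple",
    "sony wh1000xm4 wireless headphones",
    "Samsung Galaxy S21",
]

# Blocking keeps k * len(left) pairs instead of len(left) * len(right).
candidates = block_top_k(left, right, k=1)
print(candidates)

# Pairs on which committee members disagree most are the ones to send for labeling.
print(vote_entropy([1, 0, 1]))
```

With k=1 the matcher only has to score two candidate pairs rather than all six. A real system would replace the brute-force ranking with an approximate nearest-neighbor index over learned embeddings.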
Related papers
- Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank [26.69630281310365]
Unbiased Learning to Rank (ULTR) aims to leverage biased implicit user feedback (e.g., click) to optimize an unbiased ranking model.
We propose a Contextual Dual Learning Algorithm with Listwise Distillation (CDLA-LD) to address both position bias and contextual bias.
arXiv Detail & Related papers (2024-08-19T09:13:52Z)
- Hypergraph Enhanced Knowledge Tree Prompt Learning for Next-Basket Recommendation [50.55786122323965]
Next-basket recommendation (NBR) aims to infer the items in the next basket given the corresponding basket sequence.
HEKP4NBR transforms the knowledge graph (KG) into prompts, namely Knowledge Tree Prompt (KTP), to help PLM encode the Out-Of-Vocabulary (OOV) item IDs.
A hypergraph convolutional module is designed to build a hypergraph based on item similarities measured by an MoE model from multiple aspects.
arXiv Detail & Related papers (2023-12-26T02:12:21Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets [7.381841249558068]
Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data (arXiv:2204.00005).
We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling.
The batch active learning process that combines DAC and LocalMax achieves nearly the same accuracy as sequential active learning but is more efficient, with the gain proportional to the batch size.
arXiv Detail & Related papers (2023-07-19T23:25:21Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
- SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching [17.064993611446898]
In this paper, we build a unified active learning benchmark framework for EM.
The goal of the framework is to enable concrete guidelines for practitioners as to what active learning combinations will work well for EM.
Our framework also includes novel optimizations that improve the quality of the learned model by roughly 9% in terms of F1-score and reduce example selection latencies by up to 10x without affecting the quality of the model.
arXiv Detail & Related papers (2020-03-29T19:08:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.