Interpretable and Low-Resource Entity Matching via Decoupling Feature
Learning from Decision Making
- URL: http://arxiv.org/abs/2106.04174v1
- Date: Tue, 8 Jun 2021 08:27:31 GMT
- Title: Interpretable and Low-Resource Entity Matching via Decoupling Feature
Learning from Decision Making
- Authors: Zijun Yao, Chengjiang Li, Tiansi Dong, Xin Lv, Jifan Yu, Lei Hou,
Juanzi Li, Yichi Zhang, Zelin Dai
- Abstract summary: Entity Matching aims at recognizing entity records that denote the same real-world object.
We propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction.
Our method is highly efficient and outperforms SOTA EM models in most cases.
- Score: 22.755892575582788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity Matching (EM) aims at recognizing entity records that denote the same
real-world object. Neural EM models learn vector representations of entity
descriptions and match entities end-to-end. Though robust, these methods
require many resources for training and lack interpretability. In this
paper, we propose a novel EM framework that consists of Heterogeneous
Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple
feature representation from matching decision. Using self-supervised learning
and mask mechanism in pre-trained language modeling, HIF learns the embeddings
of noisy attribute values by inter-attribute attention with unlabeled data.
Using a set of comparison features and a limited amount of annotated data, KAT
Induction learns an efficient decision tree that can be interpreted by
generating entity matching rules whose structure is advocated by domain
experts. Experiments on 6 public datasets and 3 industrial datasets show that
our method is highly efficient and outperforms SOTA EM models in most cases.
Our code and datasets can be obtained from https://github.com/THU-KEG/HIF-KAT.
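The decision-making half of the framework described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the repository linked above holds that). It assumes string-similarity ratios from Python's `difflib` as stand-in comparison features, a greedy Gini-based tree as a stand-in for KAT Induction, and hypothetical toy records and attribute names. The HIF embedding step is omitted entirely.

```python
from difflib import SequenceMatcher

ATTRIBUTES = ["title", "brand"]  # hypothetical attribute names

def comparison_features(a, b):
    """One similarity score in [0, 1] per attribute of a record pair."""
    return [SequenceMatcher(None, a[k], b[k]).ratio() for k in ATTRIBUTES]

def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=0, max_depth=2):
    """Greedy induction: choose the (feature, threshold) with lowest weighted Gini."""
    majority = int(sum(y) * 2 >= len(y))
    if depth == max_depth or gini(y) == 0.0:
        return majority  # leaf: 1 = match, 0 = non-match
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has two distinct values
        return majority
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def rules(tree, path=()):
    """Render every root-to-leaf path as a readable matching rule."""
    if not isinstance(tree, tuple):
        cond = " AND ".join(path) or "TRUE"
        return [f"IF {cond} THEN {'match' if tree else 'non-match'}"]
    f, t, left, right = tree
    return (rules(left, path + (f"sim({ATTRIBUTES[f]}) <= {t:.2f}",))
            + rules(right, path + (f"sim({ATTRIBUTES[f]}) > {t:.2f}",)))

# Hypothetical labeled pairs: (record A, record B, 1 if same real-world object).
PAIRS = [
    ({"title": "canon eos 80d camera", "brand": "canon"},
     {"title": "canon eos 80d camera body", "brand": "canon"}, 1),
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "apple iphone 12 64gb", "brand": "apple"}, 1),
    ({"title": "logitech mx master 3 mouse", "brand": "logitech"},
     {"title": "logitech mx master 3", "brand": "logitech"}, 1),
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "apple macbook air 13 inch", "brand": "apple"}, 0),
    ({"title": "sony wh-1000xm4 headphones", "brand": "sony"},
     {"title": "dell xps 13 laptop", "brand": "dell"}, 0),
    ({"title": "canon eos 80d camera", "brand": "canon"},
     {"title": "nikon d3500 camera", "brand": "nikon"}, 0),
]

X = [comparison_features(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
tree = build_tree(X, y)

for rule in rules(tree):
    print(rule)
```

Because each internal node tests a single attribute similarity against a threshold, every path through the tree reads as an IF/THEN matching rule, which is the sense in which a tree-based matcher is interpretable. In practice the comparison features would come from learned embeddings rather than raw string ratios.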
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning [32.62763647036567]
Few-shot named entity recognition can identify new types of named entities based on a few labeled examples.
We propose the Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning (MsFNER).
MsFNER splits the general NER into two stages: entity-span detection and entity classification.
arXiv Detail & Related papers (2024-04-10T12:31:09Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment [31.70064035432789]
We propose a Large Language Model-enhanced Entity Alignment framework (LLMEA).
LLMEA identifies candidate alignments for a given entity by considering both embedding similarities between entities across Knowledge Graphs and edit distances to a virtual equivalent entity.
Experiments conducted on three public datasets reveal that LLMEA surpasses leading baseline models.
arXiv Detail & Related papers (2024-01-30T12:41:04Z)
- EchoEA: Echo Information between Entities and Relations for Entity Alignment [1.1470070927586016]
We propose a novel framework, Echo Entity Alignment (EchoEA), which leverages a self-attention mechanism to spread entity information to relations and echo it back to entities.
Experimental results on three real-world cross-lingual datasets are stable at around 96% hits@1 on average.
arXiv Detail & Related papers (2021-07-07T07:34:21Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment [100.19568734815732]
Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs.
Attribute triples can also provide crucial alignment signals but have not yet been well explored.
We propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently.
arXiv Detail & Related papers (2020-10-07T08:03:58Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.