Heterogeneous Entity Matching with Complex Attribute Associations using
BERT and Neural Networks
- URL: http://arxiv.org/abs/2309.11046v1
- Date: Wed, 20 Sep 2023 03:49:57 GMT
- Title: Heterogeneous Entity Matching with Complex Attribute Associations using
BERT and Neural Networks
- Authors: Shitao Wang and Jiamin Lu
- Abstract summary: We introduce a novel entity matching model, dubbed Entity Matching Model for Capturing Complex Attribute Relationships (EMM-CCAR).
Specifically, this model transforms the matching task into a sequence matching problem to mitigate the impact of varying data formats.
In comparison with the DER-SSM and Ditto approaches, our model achieves improvements of approximately 4% and 1% in F1 scores, respectively.
- Score: 0.7252027234425334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Across various domains, data from different sources such as Baidu Baike and
Wikipedia often manifest in distinct forms. Current entity matching
methodologies predominantly focus on homogeneous data, characterized by
attributes that share the same structure and concise attribute values. However,
this orientation poses challenges in handling data with diverse formats.
Moreover, prevailing approaches aggregate the similarity of attribute values
between corresponding attributes to ascertain entity similarity. Yet, they
often overlook the intricate interrelationships between attributes, where one
attribute may have multiple associations. The simplistic approach of pairwise
attribute comparison fails to harness the wealth of information encapsulated
within entities. To address these challenges, we introduce a novel entity
matching model, dubbed Entity Matching Model for Capturing Complex Attribute
Relationships (EMM-CCAR), built upon pre-trained models. Specifically, this model
transforms the matching task into a sequence matching problem to mitigate the
impact of varying data formats. Moreover, by introducing attention mechanisms,
it identifies complex relationships between attributes, emphasizing the degree
of matching among multiple attributes rather than one-to-one correspondences.
Through the integration of the EMM-CCAR model, we adeptly surmount the
challenges posed by data heterogeneity and intricate attribute
interdependencies. In comparison with the prevalent DER-SSM and Ditto
approaches, our model achieves improvements of approximately 4% and 1% in F1
scores, respectively. This furnishes a robust solution for addressing the
intricacies of attribute complexity in entity matching.
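
The two ideas in the abstract, recasting matching as a sequence problem and weighting one attribute against many, can be illustrated with a minimal sketch. It is not the authors' implementation: the `[COL]`/`[VAL]` markers are borrowed from Ditto's serialization, the `[SEP]` separator is the usual BERT convention, the token-overlap similarity is a toy stand-in for learned BERT representations, and the records, function names, and `temperature` parameter are all hypothetical.

```python
import math


def serialize(record: dict) -> str:
    """Flatten a record into one token sequence, tagging each attribute.

    The [COL]/[VAL] markers follow Ditto's serialization; whether EMM-CCAR
    uses the same markers is an assumption.
    """
    return " ".join(f"[COL] {key} [VAL] {value}" for key, value in record.items())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a toy stand-in for learned embeddings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def attention_weights(left: dict, right: dict, temperature: float = 0.1) -> dict:
    """For each left attribute, softmax over its similarity to ALL right attributes.

    Assumes `right` is non-empty.
    """
    weights = {}
    for left_key, left_value in left.items():
        scores = [jaccard(str(left_value), str(v)) / temperature for v in right.values()]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights[left_key] = {rk: e / total for rk, e in zip(right, exps)}
    return weights


# Two hypothetical records describing the same entity under different schemas.
baike = {"name": "Alan Turing", "bio": "British mathematician born 1912 in London"}
wiki = {
    "title": "Alan Turing",
    "birth_year": "1912",
    "birth_place": "London",
    "field": "mathematician",
}

# Sequence-pair input of the kind a BERT-style classifier would consume.
pair = serialize(baike) + " [SEP] " + serialize(wiki)
weights = attention_weights(baike, wiki)
```

In this toy example the `name` attribute puts almost all of its weight on `title`, while the long `bio` value spreads its weight evenly over `birth_year`, `birth_place`, and `field`, which is the one-to-many association that a strict pairwise attribute comparison would miss.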
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z)
- A Solution to Co-occurrence Bias: Attributes Disentanglement via Mutual
Information Minimization for Pedestrian Attribute Recognition [10.821982414387525]
We show that current methods can actually suffer in generalizing such fitted attributes interdependencies onto scenes or identities off the dataset distribution.
To render models robust in realistic scenes, we propose the attributes-disentangled feature learning to ensure the recognition of an attribute not inferring on the existence of others.
arXiv Detail & Related papers (2023-07-28T01:34:55Z)
- Attribute-Consistent Knowledge Graph Representation Learning for
Multi-Modal Entity Alignment [14.658282035561792]
We propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA).
Our approach achieves excellent performance compared to its competitors.
arXiv Detail & Related papers (2023-04-04T06:39:36Z)
- Interpretable and Low-Resource Entity Matching via Decoupling Feature
Learning from Decision Making [22.755892575582788]
Entity Matching aims at recognizing entity records that denote the same real-world object.
We propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction.
Our method is highly efficient and outperforms SOTA EM models in most cases.
arXiv Detail & Related papers (2021-06-08T08:27:31Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
- AdaTag: Multi-Attribute Value Extraction from Product Profiles with
Adaptive Decoding [55.89773725577615]
We present AdaTag, which uses adaptive decoding to handle attribute extraction.
Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.
arXiv Detail & Related papers (2021-06-04T07:54:11Z)
- Neural Graph Matching based Collaborative Filtering [13.086302251856756]
We identify two different types of attribute interactions, inner and cross interactions.
Existing models do not distinguish these two types of attribute interactions.
We propose a neural Graph Matching based Collaborative Filtering model (GMCF).
Our model outperforms state-of-the-art models.
arXiv Detail & Related papers (2021-05-10T01:51:46Z)
- AttriMeter: An Attribute-guided Metric Interpreter for Person
Re-Identification [100.3112429685558]
Person ReID systems only provide a distance or similarity when matching two persons.
We propose an Attribute-guided Metric Interpreter, named AttriMeter, to semantically and quantitatively explain the results of CNN-based ReID models.
arXiv Detail & Related papers (2021-03-02T03:37:48Z)
- Learning to Combine: Knowledge Aggregation for Multi-Source Domain
Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed on the prototypes of various domains to realize information propagation among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
arXiv Detail & Related papers (2020-07-17T07:52:44Z)
- Hierarchical Feature Embedding for Attribute Recognition [26.79901907956084]
We propose a hierarchical feature embedding framework, which learns a fine-grained feature embedding by combining attribute and ID information.
Experiments show that our method achieves state-of-the-art results on two pedestrian attribute datasets and a facial attribute dataset.
arXiv Detail & Related papers (2020-05-23T17:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.