Disambiguate Entity Matching using Large Language Models through Relation Discovery
- URL: http://arxiv.org/abs/2403.17344v2
- Date: Wed, 29 May 2024 14:40:55 GMT
- Title: Disambiguate Entity Matching using Large Language Models through Relation Discovery
- Authors: Zezhou Huang
- Abstract summary: We propose a novel approach that shifts focus from purely semantic similarities to understanding and defining the "relations" between entities.
By predefining a set of relations relevant to the task at hand, our method allows analysts to navigate the spectrum of similarity more effectively.
- Score: 1.6317061277457001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity matching is a critical challenge in data integration and cleaning, central to tasks like fuzzy joins and deduplication. Traditional approaches have focused on overcoming fuzzy term representations through methods such as edit distance, Jaccard similarity, and more recently, embeddings and deep neural networks, including advancements from large language models (LLMs) like GPT. However, the core challenge in entity matching extends beyond term fuzziness to the ambiguity in defining what constitutes a "match," especially when integrating with external databases. This ambiguity arises due to varying levels of detail and granularity among entities, complicating exact matches. We propose a novel approach that shifts focus from purely identifying semantic similarities to understanding and defining the "relations" between entities as crucial for resolving ambiguities in matching. By predefining a set of relations relevant to the task at hand, our method allows analysts to navigate the spectrum of similarity more effectively, from exact matches to conceptually related entities.
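As a rough illustration of the idea in the abstract, the sketch below predefines a small set of relations and assigns one to each entity pair, then lets the analyst choose which relations count as a match. The relation names, the `classify_relation` function, and its token-containment rules are hypothetical stand-ins for an LLM call; none of them come from the paper.

```python
from enum import Enum


class Relation(Enum):
    # Hypothetical relation set; an analyst would define these per task.
    EXACT_MATCH = "exact_match"
    BROADER_THAN = "broader_than"    # left entity is more general
    NARROWER_THAN = "narrower_than"  # left entity is more specific
    UNRELATED = "unrelated"


def tokens(name: str) -> frozenset:
    """Lowercased, whitespace-split token set of an entity name."""
    return frozenset(name.lower().split())


def classify_relation(left: str, right: str) -> Relation:
    """Toy stand-in for an LLM prompt that picks one predefined relation.

    Assumption: token-set containment approximates granularity.
    """
    a, b = tokens(left), tokens(right)
    if a == b:
        return Relation.EXACT_MATCH
    if a < b:  # strict subset: fewer qualifying tokens, so more general
        return Relation.BROADER_THAN
    if a > b:  # strict superset: more qualifying tokens, so more specific
        return Relation.NARROWER_THAN
    return Relation.UNRELATED


def match(pairs, accepted):
    """Keep only the pairs whose relation the analyst counts as a match."""
    return [(l, r) for l, r in pairs if classify_relation(l, r) in accepted]


pairs = [
    ("New York", "new york"),
    ("New York", "New York City"),
    ("Boston", "Chicago"),
]
strict = match(pairs, {Relation.EXACT_MATCH})
relaxed = match(pairs, {Relation.EXACT_MATCH, Relation.BROADER_THAN})
```

Widening the accepted set from `strict` to `relaxed` moves the result along the spectrum the abstract describes, from exact matches toward conceptually related entities, without changing the classifier itself.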
Related papers
- SEG: Seeds-Enhanced Iterative Refinement Graph Neural Network for Entity Alignment [13.487673375206276]
This paper presents a soft label propagation framework that integrates multi-source data and iterative seed enhancement.
A bidirectional weighted joint loss function is implemented, which reduces the distance between positive samples and differentially processes negative samples.
Our method outperforms existing semi-supervised approaches, as evidenced by superior results on multiple datasets.
arXiv Detail & Related papers (2024-10-28T04:50:46Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Contrastive Video-Language Segmentation [41.1635597261304]
We focus on the problem of segmenting a certain object referred by a natural language sentence in video content.
We propose to intertwine the visual and linguistic modalities explicitly via a contrastive learning objective.
arXiv Detail & Related papers (2021-09-29T01:40:58Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
- End-to-End Hierarchical Relation Extraction for Generic Form Understanding [0.6299766708197884]
We present a novel deep neural network to jointly perform both entity detection and link prediction.
Our model extends the Multi-stage Attentional U-Net architecture with the Part-Intensity Fields and Part-Association Fields for link prediction.
We demonstrate the effectiveness of the model on the Form Understanding in Noisy Scanned Documents dataset.
arXiv Detail & Related papers (2021-06-02T06:51:35Z)
- Exploiting Transitivity Constraints for Entity Matching in Knowledge Graphs [1.7080853582489066]
We show that ad-hoc enforcement of transitivity on an identified set of entity pairs may decrease precision dramatically.
We propose a methodology that starts with a given similarity measure, generates a set of entity pairs that are identified as referring to the same real-world objects, and applies the cluster editing algorithm to enforce transitivity without adding many spurious links.
arXiv Detail & Related papers (2021-04-22T10:57:01Z)
- Learning to Decouple Relations: Few-Shot Relation Classification with Entity-Guided Attention and Confusion-Aware Training [49.9995628166064]
We propose CTEG, a model equipped with two mechanisms to learn to decouple easily-confused relations.
On the one hand, an Entity-Guided Attention (EGA) mechanism is introduced to guide the attention to filter out information causing confusion.
On the other hand, a Confusion-Aware Training (CAT) method is proposed to explicitly learn to distinguish relations.
arXiv Detail & Related papers (2020-10-21T11:07:53Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.