Interactive Ontology Matching with Cost-Efficient Learning
- URL: http://arxiv.org/abs/2404.07663v1
- Date: Thu, 11 Apr 2024 11:53:14 GMT
- Title: Interactive Ontology Matching with Cost-Efficient Learning
- Authors: Bin Cheng, Jonathan Fürst, Tobias Jacobs, Celia Garrido-Hidalgo,
- Abstract summary: This work introduces DualLoop, an active learning method tailored to matching.
Compared to existing active learning methods, we consistently achieved better F1 scores and recall.
We report our operational performance results within the Architecture, Engineering, Construction (AEC) industry sector.
- Score: 2.006461411375746
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The creation of high-quality ontologies is crucial for data integration and knowledge-based reasoning, specifically in the context of the rising data economy. However, automatic ontology matchers are often bound to the heuristics they are based on, leaving many matches unidentified. Interactive ontology matching systems involving human experts have been introduced, but they do not solve the fundamental issue of flexibly finding additional matches outside the scope of the implemented heuristics, even though this is highly demanded in industrial settings. Active machine learning methods appear to be a promising path towards a flexible interactive ontology matcher. However, off-the-shelf active learning mechanisms suffer from low query efficiency due to extreme class imbalance, resulting in a last-mile problem where high human effort is required to identify the remaining matches. To address the last-mile problem, this work introduces DualLoop, an active learning method tailored to ontology matching. DualLoop offers three main contributions: (1) an ensemble of tunable heuristic matchers, (2) a short-term learner with a novel query strategy adapted to highly imbalanced data, and (3) long-term learners to explore potential matches by creating and tuning new heuristics. We evaluated DualLoop on three datasets of varying sizes and domains. Compared to existing active learning methods, we consistently achieved better F1 scores and recall, reducing the expected query cost spent on finding 90% of all matches by over 50%. Compared to traditional interactive ontology matchers, we are able to find additional, last-mile matches. Finally, we detail the successful deployment of our approach within an actual product and report its operational performance results within the Architecture, Engineering, and Construction (AEC) industry sector, showcasing its practical value and efficiency.
Related papers
- Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations [13.119576365743624]
We study the dynamics of feature learning under spurious correlations.
We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlation.
We also identify limitations of popular debiasing algorithms that exploit early learning of spurious features.
arXiv Detail & Related papers (2024-03-05T23:54:00Z) - Beyond Two-Tower Matching: Learning Sparse Retrievable
Cross-Interactions for Recommendation [80.19762472699814]
Two-tower models are a prevalent matching framework for recommendation, which have been widely deployed in industrial applications.
It suffers two main challenges, including limited feature interaction capability and reduced accuracy in online serving.
We propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval.
arXiv Detail & Related papers (2023-11-30T03:13:36Z) - The Battleship Approach to the Low Resource Entity Matching Problem [0.0]
We propose a new active learning approach for entity matching problems.
We focus on a selection mechanism that exploits unique properties of entity matching.
An experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching.
arXiv Detail & Related papers (2023-11-27T10:18:17Z) - Unveiling the Limits of Learned Local Search Heuristics: Are You the
Mightiest of the Meek? [14.195843311387591]
We show that a simple learned based on Tabu Search surpasses state-of-the-art learneds in terms of performance and generalizability.
Our findings challenge prevailing assumptions and open up exciting avenues for future research.
arXiv Detail & Related papers (2023-10-30T20:16:42Z) - Actively Discovering New Slots for Task-oriented Conversation [19.815466126158785]
We propose a general new slot task in an information extraction fashion to realize human-in-the-loop learning.
We leverage existing language tools to extract value candidates where the corresponding labels are leveraged as weak supervision signals.
We conduct extensive experiments on several public datasets and compare with a bunch of competitive baselines to demonstrate our method.
arXiv Detail & Related papers (2023-05-06T13:33:33Z) - Causal Triplet: An Open Challenge for Intervention-centric Causal
Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z) - NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision
Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z) - What Makes Good Contrastive Learning on Small-Scale Wearable-based
Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library textttCL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z) - BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration [72.88493072196094]
We present a new synthesis approach that leverages learning to guide a bottom-up search over programs.
In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a set of input-output examples.
We show that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches.
arXiv Detail & Related papers (2020-07-28T17:46:18Z) - Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL.Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.