Related papers: Learning to Characterize Matching Experts

Learning to Characterize Matching Experts

URL: http://arxiv.org/abs/2012.01229v1
Date: Wed, 2 Dec 2020 14:16:38 GMT
Title: Learning to Characterize Matching Experts
Authors: Roee Shraga, Ofra Amir, Avigdor Gal
Abstract summary: We characterize human matching experts, those humans whose proposed correspondences can mostly be trusted to be valid. We show that our approach can improve matching results by filtering out inexpert matchers.
Score: 19.246576904646172
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements. Matching problems were traditionally solved in a semi-automatic manner, with correspondences being generated by matching algorithms and outcomes subsequently validated by human experts. Human-in-the-loop data integration has been recently challenged by the introduction of big data and recent studies have analyzed obstacles to effective human matching and validation. In this work we characterize human matching experts, those humans whose proposed correspondences can mostly be trusted to be valid. We provide a novel framework for characterizing matching experts that, accompanied with a novel set of features, can be used to identify reliable and valuable human experts. We demonstrate the usefulness of our approach using an extensive empirical evaluation. In particular, we show that our approach can improve matching results by filtering out inexpert matchers.

Related papers

Learning to Defer for Causal Discovery with Imperfect Experts [59.071731337922664]
We propose L2D-CD, a method for gauging the correctness of expert recommendations and optimally combining them with data-driven causal discovery results. We evaluate L2D-CD on the canonical T"ubingen pairs dataset and demonstrate its superior performance compared to both the causal discovery method and the expert used in isolation.
arXiv Detail & Related papers (2025-02-18T18:55:53Z)
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem [11.763640675057076]
We develop a benchmark dataset for evaluating the reviewer assignment problem without needing explicit labels. We benchmark various methods, including traditional lexical matching, static neural embeddings, and contextualized neural embeddings. Our results indicate that while traditional methods perform reasonably well, contextualized embeddings trained on scholarly literature show the best performance.
arXiv Detail & Related papers (2025-02-11T16:35:04Z)
Leveraging Mixture of Experts for Improved Speech Deepfake Detection [53.69740463004446]
Speech deepfakes pose a significant threat to personal security and content authenticity. We introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture.
arXiv Detail & Related papers (2024-09-24T13:24:03Z)
Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset [1.825224193230824]
We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation. Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
arXiv Detail & Related papers (2024-08-01T19:11:08Z)
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs [59.96266198512243]
We propose a decentralized and federated learning algorithm for tasks that are positively and negatively correlated. Our algorithm uses gradients to calculate the correlations among tasks automatically, and dynamically adjusts the communication graph to connect mutually beneficial tasks and isolate those that may negatively impact each other. We conduct experiments on a synthetic Gaussian dataset and a large-scale celebrity attributes (CelebA) dataset.
arXiv Detail & Related papers (2022-12-21T18:58:24Z)
A Unified Comparison of User Modeling Techniques for Predicting Data Interaction and Detecting Exploration Bias [17.518601254380275]
We compare and rank eight user modeling algorithms based on their performance on a diverse set of four user study datasets. Based on our findings, we highlight open challenges and new directions for analyzing user interactions and visualization provenance.
arXiv Detail & Related papers (2022-08-09T19:51:10Z)
PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching [20.110234122423172]
We examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We design PoWareMatch that makes use of a deep learning mechanism to calibrate and filter human matching decisions. PoWareMatch predicts well the benefit of extending the match with an additional correspondence and generates high quality matches.
arXiv Detail & Related papers (2021-09-15T14:24:56Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
Novel Human-Object Interaction Detection via Adversarial Domain Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios. The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations. We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z)
Search Result Clustering in Collaborative Sound Collections [17.48516881308658]
We propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases. We show that using a confidence measure for discarding inconsistent clusters improves the quality of the partitions.
arXiv Detail & Related papers (2020-04-08T13:08:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.