CLARITY -- Comparing heterogeneous data using dissimiLARITY
- URL: http://arxiv.org/abs/2006.00077v2
- Date: Thu, 2 Dec 2021 11:36:38 GMT
- Title: CLARITY -- Comparing heterogeneous data using dissimiLARITY
- Authors: Daniel J. Lawson, Vinesh Solanki, Igor Yanovich, Johannes Dellert,
Damian Ruck and Phillip Endicott
- Abstract summary: Many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data.
Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise, and aids in their interpretation.
- Score: 0.39146761527401414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integrating datasets from different disciplines is hard because the data are
often qualitatively different in meaning, scale, and reliability. When two
datasets describe the same entities, many scientific questions can be phrased
around whether the (dis)similarities between entities are conserved across such
different data. Our method, CLARITY, quantifies consistency across datasets,
identifies where inconsistencies arise, and aids in their interpretation. We
illustrate this using three diverse comparisons: gene methylation vs
expression, evolution of language sounds vs word use, and country-level
economic metrics vs cultural beliefs. The non-parametric approach is robust to
noise and differences in scaling, and makes only weak assumptions about how the
data were generated. It operates by decomposing similarities into two
components: a `structural' component analogous to a clustering, and an
underlying `relationship' between those structures. This allows a `structural
comparison' between two similarity matrices using their predictability from
`structure'. Significance is assessed with the help of re-sampling appropriate
for each dataset. The software, CLARITY, is available as an R package from
https://github.com/danjlawson/CLARITY.
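The decomposition described in the abstract can be illustrated in miniature: learn a coarse `structure' over entities (here a fixed group assignment, a crude stand-in for the paper's clustering-like component), fit a `relationship' matrix of between-group average similarities, and measure how much of a similarity matrix that structure explains. This is an illustrative sketch only, not the algorithm or API of the CLARITY R package; all function names are hypothetical.

```python
# Minimal sketch of a CLARITY-style structure/relationship decomposition.
# Illustrative only: the real package learns the structure from the data
# rather than taking a fixed group assignment as done here.

def block_model(Y, groups):
    """Fit the 'relationship': mean similarity between each pair of groups."""
    k = max(groups) + 1
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    n = len(Y)
    for i in range(n):
        for j in range(n):
            a, b = groups[i], groups[j]
            sums[a][b] += Y[i][j]
            counts[a][b] += 1
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0
             for b in range(k)] for a in range(k)]

def predict(groups, R):
    """Reconstruct a similarity matrix from structure (groups) + relationship R."""
    n = len(groups)
    return [[R[groups[i]][groups[j]] for j in range(n)] for i in range(n)]

def residual(Y, Yhat):
    """Total squared residual: where the structure fails to explain Y."""
    n = len(Y)
    return sum((Y[i][j] - Yhat[i][j]) ** 2 for i in range(n) for j in range(n))

# Two toy similarity matrices over the same four entities. They differ in
# scale but share the same 2-group structure, so the same structure should
# explain both once the relationship is refit per dataset.
groups = [0, 0, 1, 1]
Y1 = [[1.0, 0.9, 0.1, 0.2],
      [0.9, 1.0, 0.2, 0.1],
      [0.1, 0.2, 1.0, 0.8],
      [0.2, 0.1, 0.8, 1.0]]
Y2 = [[1.0, 0.8, 0.3, 0.2],
      [0.8, 1.0, 0.2, 0.3],
      [0.3, 0.2, 1.0, 0.9],
      [0.2, 0.3, 0.9, 1.0]]

R1 = block_model(Y1, groups)
fit_self = residual(Y1, predict(groups, R1))    # structure explains dataset 1
R2 = block_model(Y2, groups)
fit_cross = residual(Y2, predict(groups, R2))   # same structure, dataset 2
print(fit_self, fit_cross)
```

A small residual on both matrices indicates the group structure is conserved across the two datasets; large per-entry residuals would flag the specific entities where the datasets disagree.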
Related papers
- Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning [43.75697355156703]
Noisy correspondence is prevalent on human-annotated or web-crawled datasets.
We introduce a Geometrical Structure Consistency (GSC) method to infer the true correspondence.
arXiv Detail & Related papers (2024-05-27T09:42:52Z)
- A graph-structured distance for heterogeneous datasets with meta variables [1.677718351174347]
Heterogeneous datasets emerge in various machine learning or optimization applications.
The first main contribution is a modeling graph-structured framework that generalizes state-of-the-art hierarchical, tree-structured, or variable-size frameworks.
The second main contribution is the graph-structured distance that compares extended points with any combination of included and excluded variables.
arXiv Detail & Related papers (2024-05-20T23:11:03Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide mechanistic understanding of how transformers learn "semantic structure"
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further explore the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Semantic Answer Similarity for Evaluating Question Answering Models [2.279676596857721]
SAS is a cross-encoder-based metric for the estimation of semantic answer similarity.
We show that semantic similarity metrics based on recent transformer models correlate much better with human judgment than traditional lexical similarity metrics.
arXiv Detail & Related papers (2021-08-13T09:12:27Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize both seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow (GSMFlow) framework for synthesizing unseen data efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- Commutative Lie Group VAE for Disentanglement Learning [96.32813624341833]
We view disentanglement learning as discovering an underlying structure that equivariantly reflects the factorized variations shown in data.
A simple model named Commutative Lie Group VAE is introduced to realize the group-based disentanglement learning.
Experiments show that our model can effectively learn disentangled representations without supervision, and can achieve state-of-the-art performance without extra constraints.
arXiv Detail & Related papers (2021-06-07T07:03:14Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Comparative analysis of word embeddings in assessing semantic similarity of complex sentences [8.873705500708196]
We study the sentences in existing benchmark datasets and analyze the sensitivity of various word embeddings with respect to the complexity of the sentences.
The results show the increase in complexity of the sentences has a significant impact on the performance of the embedding models.
arXiv Detail & Related papers (2020-10-23T19:55:11Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- A Bayesian Hierarchical Score for Structure Learning from Related Data Sets [0.7240563090941907]
We propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD).
BHD is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure.
We find that BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the Structural Hamming distance.
arXiv Detail & Related papers (2020-08-04T16:41:05Z)
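The Structural Hamming distance used as the reconstruction-accuracy measure in the BHD entry above can be sketched as follows. This is a simple illustrative implementation (not taken from that paper's code) counting, between two directed graphs, the edges that are missing from one graph plus the edges present in both but with opposite orientation.

```python
def shd(edges_a, edges_b):
    """Structural Hamming distance between two DAGs given as sets of
    directed edges (u, v). An edge present in only one graph (in either
    orientation) counts once; a reversed edge counts once."""
    a, b = set(edges_a), set(edges_b)
    dist = 0
    seen = set()
    for u, v in a | b:
        if (u, v) in seen or (v, u) in seen:
            continue  # already counted this undirected edge
        seen.add((u, v))
        in_a = (u, v) in a or (v, u) in a
        in_b = (u, v) in b or (v, u) in b
        if in_a != in_b:
            dist += 1  # edge missing from one graph
        elif ((u, v) in a) != ((u, v) in b):
            dist += 1  # edge in both graphs, opposite orientation
    return dist

true_dag = {("A", "B"), ("B", "C"), ("A", "C")}
learned = {("A", "B"), ("C", "B")}  # one reversed edge, one missing edge
print(shd(true_dag, learned))  # distance is 2
```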
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.