From raw affiliations to organization identifiers
- URL: http://arxiv.org/abs/2505.07577v2
- Date: Tue, 13 May 2025 09:27:19 GMT
- Title: From raw affiliations to organization identifiers
- Authors: Myrto Kallipoliti, Serafeim Chatzopoulos, Miriam Baglioni, Eleni Adamidi, Paris Koloveas, Thanasis Vergoulis
- Abstract summary: Existing approaches fail to handle the complexity of affiliation strings that often include mentions of multiple organizations or extraneous information. We present AffRo, a novel approach designed to address these challenges, leveraging advanced parsing and disambiguation techniques. Results demonstrate the effectiveness of AffRo in accurately identifying organizations from complex affiliation strings.
- Score: 0.343054185715673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate affiliation matching, which links affiliation strings to standardized organization identifiers, is critical for improving research metadata quality, facilitating comprehensive bibliometric analyses, and supporting data interoperability across scholarly knowledge bases. Existing approaches fail to handle the complexity of affiliation strings that often include mentions of multiple organizations or extraneous information. In this paper, we present AffRo, a novel approach designed to address these challenges, leveraging advanced parsing and disambiguation techniques. We also introduce AffRoDB, an expert-curated dataset to systematically evaluate affiliation matching algorithms, ensuring robust benchmarking. Results demonstrate the effectiveness of AffRo in accurately identifying organizations from complex affiliation strings.
Related papers
- Full Triple Matcher: Integrating all triple elements between heterogeneous Knowledge Graphs [0.09471093245585005]
Knowledge graphs (KGs) are powerful tools for representing and reasoning over structured information. Current approaches may fall short in scenarios where diverse and complex contexts need to be integrated. We propose a novel KG integration method consisting of label matching and triple matching.
arXiv Detail & Related papers (2025-07-20T07:46:55Z) - Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts [0.4499833362998487]
We propose an agent-based framework to iteratively reconstruct dataset descriptions through interactions with employees. Our results show that the agent achieves 94.9% full-knowledge recall, with self-critical feedback scores strongly correlating with external literature critic scores. These findings highlight the agent's ability to navigate organizational complexity and capture fragmented knowledge that would otherwise remain inaccessible.
arXiv Detail & Related papers (2025-07-04T21:09:32Z) - Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG). RAG requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. We build it using a synthetic data pipeline that simulates business across product planning, development, and support stages.
arXiv Detail & Related papers (2025-06-29T08:34:59Z) - Cooperation of Experts: Fusing Heterogeneous Information with Large Margin [11.522412489437702]
Cooperation of Experts (CoE) framework encodes multi-typed information into unified heterogeneous multiplex networks. In our framework, dedicated encoders act as domain-specific experts, each specializing in learning distinct relational patterns in specific semantic spaces.
arXiv Detail & Related papers (2025-05-27T08:04:32Z) - CORG: Generating Answers from Complex, Interrelated Contexts [57.213304718157985]
In a real-world corpus, knowledge frequently recurs across documents but often contains inconsistencies due to ambiguous naming, outdated information, or errors. Previous research has shown that language models struggle with these complexities, typically focusing on single factors in isolation. We introduce Context Organizer (CORG), a framework that organizes multiple contexts into independently processed groups.
arXiv Detail & Related papers (2025-04-25T02:40:48Z) - Anomaly Detection in Double-entry Bookkeeping Data by Federated Learning System with Non-model Sharing Approach [3.827294988616478]
Anomaly detection is crucial in financial auditing and effective detection often requires obtaining large volumes of data from multiple organizations. In this study, we propose a novel framework employing Data Collaboration (DC) analysis to streamline model training into a single communication round. Our findings represent a significant advance in artificial intelligence-driven auditing and underscore the potential of FL methods in high-security domains.
arXiv Detail & Related papers (2025-01-22T08:53:12Z) - Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation [19.543102037001134]
Language models (LMs) are known to suffer from hallucinations and misinformation.
Retrieval augmented generation (RAG) that retrieves verifiable information from an external knowledge corpus provides a tangible solution to these problems.
RAG generation quality is highly dependent on the relevance between a user's query and the retrieved documents.
arXiv Detail & Related papers (2024-10-10T19:14:55Z) - Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$).
GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z) - Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts. We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z) - Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z) - A Framework for Verifiable and Auditable Federated Anomaly Detection [3.639790324866155]
Federated Learning is an emerging approach to manage cooperation between a group of agents for the solution of Machine Learning tasks.
We present a novel algorithmic architecture that tackles this problem in the particular case of Anomaly Detection.
arXiv Detail & Related papers (2022-03-15T11:34:02Z) - Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task with a tag set composed of tags of triggers and entities.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.