Related papers: Semantic Alignment of Multilingual Knowledge Graphs via Contextualized Vector Projections

URL: http://arxiv.org/abs/2601.00814v1
Date: Mon, 22 Dec 2025 11:02:30 GMT
Title: Semantic Alignment of Multilingual Knowledge Graphs via Contextualized Vector Projections
Authors: Abhishek Kumar
Abstract summary: A cross-lingual alignment system using embedding-based cosine similarity matching. We use a fine-tuned transformer-based multilingual model to generate better embeddings. We evaluated our work on the OAEI-2022 MultiFarm track.
Score: 8.709638469928448
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The paper presents our work on a cross-lingual ontology alignment system that uses embedding-based cosine similarity matching. The ontology entities are made contextually richer by creating descriptions using novel techniques. We use a fine-tuned transformer-based multilingual model to generate better embeddings. We use cosine similarity to find positive ontology entity pairs and then apply threshold filtering to retain only highly similar entities. We evaluated our work on the OAEI-2022 MultiFarm track. We achieve a 71% F1 score (78% recall and 65% precision) on the evaluation dataset, a 16% increase over the best baseline score. This suggests that our proposed alignment pipeline is able to capture subtle cross-lingual similarities.
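
The matching step described in the abstract (cosine similarity followed by threshold filtering) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy three-dimensional vectors stand in for the embeddings of the fine-tuned multilingual model, the entity names are made up, the 0.8 threshold is an arbitrary assumption, and keeping only the best-scoring target per source entity is one plausible strategy among several. The helper `f1_score` only checks that the reported precision and recall are consistent with the reported F1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def align(source, target, threshold=0.8):
    """Pair each source entity with its most similar target entity,
    keeping the pair only if the similarity reaches the threshold.
    (Best-match-per-source is an assumption, not the paper's stated rule.)"""
    matches = []
    for s_name, s_vec in source.items():
        best_name, best_score = None, -1.0
        for t_name, t_vec in target.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_name is not None and best_score >= threshold:
            matches.append((s_name, best_name, best_score))
    return matches


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy embeddings; real ones would come from the encoder.
source = {
    "Author": [0.9, 0.1, 0.0],
    "Paper": [0.1, 0.9, 0.1],
    "Room": [0.0, 0.1, 0.9],
}
target = {
    "Autor": [0.8, 0.2, 0.0],
    "Artikel": [0.2, 0.8, 0.0],
    "Gutachter": [0.6, 0.6, 0.5],
}

matches = align(source, target, threshold=0.8)
for s_name, t_name, score in matches:
    print(f"{s_name} -> {t_name} ({score:.3f})")

# Consistency check on the reported figures: P = 0.65, R = 0.78.
print(f"F1 = {f1_score(0.65, 0.78):.2f}")
```

With these toy vectors, "Author" and "Paper" each find a target above the threshold, while "Room" is dropped because its best candidate scores well below 0.8. The reported 65% precision and 78% recall give an F1 of about 0.71, in line with the abstract.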

Related papers

How Transliterations Improve Crosslingual Alignment [48.929677368744606]
Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives can improve crosslingual alignment. This paper attempts to explicitly evaluate the crosslingual alignment and identify the key elements in transliteration-based approaches that contribute to better performance.
arXiv Detail & Related papers (2024-09-25T20:05:45Z)
Unsupervised Deep Cross-Language Entity Alignment [14.904785474912018]
We propose a simple and novel unsupervised method for cross-language entity alignment. We use a deep learning multi-language encoder combined with a machine translator to encode knowledge graph text. Our method outperforms the state-of-the-art supervised method by 2.6% and 0.4% on the Ja-En and Fr-En alignment tasks, respectively.
arXiv Detail & Related papers (2023-09-19T13:12:48Z)
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
LyricSIM: A novel Dataset and Benchmark for Similarity Detection in Spanish Song LyricS [52.77024349608834]
We present a new dataset and benchmark tailored to the task of semantic similarity in song lyrics. Our dataset, originally consisting of 2775 pairs of Spanish songs, was annotated in a collective annotation experiment by 63 native annotators.
arXiv Detail & Related papers (2023-06-02T07:48:20Z)
Learnable Pillar-based Re-ranking for Image-Text Retrieval [119.9979224297237]
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities. Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks. We propose a novel learnable pillar-based re-ranking paradigm for image-text retrieval.
arXiv Detail & Related papers (2023-04-25T04:33:27Z)
Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings. Our model operates on parallel data in $N$ languages. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context. Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR. For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
SemEval-2022 Task 8: Multi-lingual News Article Similarity [0.0]
This work is about finding the similarity between a pair of news articles. The dataset provides seven different objective similarity metrics for each pair, and the news articles are in multiple languages. On top of the pre-trained embedding model, we calculated cosine similarity for baseline results, and a feed-forward neural network was then trained on top of it to improve the results.
arXiv Detail & Related papers (2022-08-20T16:06:53Z)
Hierarchical Similarity Learning for Language-based Product Image Retrieval [40.83290730640458]
This paper focuses on cross-modal similarity measurement and proposes a novel Hierarchical Similarity Learning network. Experiments on a large-scale product retrieval dataset demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-18T14:23:16Z)
Do Explicit Alignments Robustly Improve Multilingual Encoders? [22.954688396858085]
Multilingual encoders can effectively learn cross-lingual representations. Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations. We propose a new contrastive alignment objective that can better utilize such signal.
arXiv Detail & Related papers (2020-10-06T07:43:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.