Unsupervised Deep Cross-Language Entity Alignment
- URL: http://arxiv.org/abs/2309.10598v1
- Date: Tue, 19 Sep 2023 13:12:48 GMT
- Title: Unsupervised Deep Cross-Language Entity Alignment
- Authors: Chuanyu Jiang, Yiming Qian, Lijun Chen, Yang Gu, and Xia Xie
- Abstract summary: We propose a simple and novel unsupervised method for cross-language entity alignment.
We utilize a deep learning multi-language encoder combined with a machine translator to encode knowledge graph text.
Compared with the state-of-the-art supervised method, our method scores 2.6% and 0.4% higher on the Ja-En and Fr-En alignment tasks.
- Score: 14.904785474912018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual entity alignment is the task of finding semantically
identical entities across knowledge graphs in different languages. In this
paper, we propose a simple and novel unsupervised method for cross-language
entity alignment. We utilize a deep learning multi-language encoder combined
with a machine translator to encode knowledge graph text, which reduces the
reliance on labeled data. Unlike traditional methods that emphasize only global
or only local alignment, our method considers both alignment strategies
simultaneously. We first view the alignment task as a bipartite matching
problem and then adopt the re-exchanging idea to accomplish alignment. Compared
with the traditional bipartite matching algorithm, which gives only one optimal
solution, our algorithm generates ranked matching results, which enable many
potential downstream tasks. Additionally, our method can adapt to two different
types of optimization (minimal and maximal) in the bipartite matching process,
which provides more flexibility. Our evaluation shows Hits@1 rates of 0.966,
0.990, and 0.996 on the DBP15K dataset for the Chinese-, Japanese-, and
French-to-English alignment tasks, respectively. We outperform the
state-of-the-art methods in the unsupervised and semi-supervised categories.
Compared with the state-of-the-art supervised method, our method scores 2.6%
and 0.4% higher on the Ja-En and Fr-En alignment tasks, while scoring
marginally lower, by 0.2%, on the Zh-En alignment task.
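The abstract's distinction between local and global alignment, its two optimization directions (minimal and maximal), and the Hits@1 metric can be illustrated with a small sketch. This is not the paper's re-exchanging algorithm: the global step below is a brute-force search over one-to-one assignments, used only as a stand-in, and the similarity matrix, the gold labels, and all function names are hypothetical values chosen for illustration.

```python
from itertools import permutations


def local_alignment(sim):
    """Local strategy: each source entity independently takes its most similar target."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in sim]


def global_alignment(scores, maximize=True):
    """Global strategy: one-to-one assignment that optimizes the total score.

    Brute force over all permutations, so only usable on tiny square
    matrices. This is an illustrative stand-in, not the paper's method.
    """
    n = len(scores)
    pick = max if maximize else min
    best = pick(
        permutations(range(n)),
        key=lambda p: sum(scores[i][p[i]] for i in range(n)),
    )
    return list(best)


def hits_at_k(sim, gold, k=1):
    """Fraction of source entities whose gold target is among the k most similar targets."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Hypothetical similarity scores: rows are source-language entities,
# columns are English entities. In practice these would come from
# comparing encoder embeddings of the knowledge graph text.
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.30, 0.20],
    [0.10, 0.20, 0.70],
]
gold = [1, 0, 2]  # assumed correct target index for each source entity

# The same problem posed as a minimization over distances.
dist = [[1 - s for s in row] for row in sim]

print(local_alignment(sim))                    # [0, 0, 2]
print(global_alignment(sim, maximize=True))    # [1, 0, 2]
print(global_alignment(dist, maximize=False))  # [1, 0, 2]
print(hits_at_k(sim, gold, k=1))
print(hits_at_k(sim, gold, k=2))
```

In this toy case the local strategy assigns two source entities to the same target, while the global assignment resolves the conflict and recovers the gold alignment under either optimization direction. For realistic sizes, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` (which accepts a `maximize` flag) would replace the brute-force search.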
Related papers
- Multilingual Text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning [81.43257201833154]
We propose Bi-IRRA: a Bidirectional Implicit Relation Reasoning and Aligning framework to learn alignment across languages and modalities.
Within Bi-IRRA, a bidirectional implicit relation reasoning module enables bidirectional prediction of masked image and text.
The proposed method achieves new state-of-the-art results on all multilingual TIPR datasets.
arXiv Detail & Related papers (2025-10-20T16:01:11Z) - How Transliterations Improve Crosslingual Alignment [48.929677368744606]
Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives can improve crosslingual alignment.
This paper attempts to explicitly evaluate the crosslingual alignment and identify the key elements in transliteration-based approaches that contribute to better performance.
arXiv Detail & Related papers (2024-09-25T20:05:45Z) - BinaryAlign: Word Alignment as Binary Sequence Labeling [2.5575527199248347]
We propose BinaryAlign, a novel word alignment technique based on binary sequence labeling.
We explore the performance of BinaryAlign on non-English language pairs.
arXiv Detail & Related papers (2024-07-16T15:11:06Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Graph Algorithms for Multiparallel Word Alignment [2.5200727733264663]
In this work, we exploit the multiparallelity of corpora by representing an initial set of bilingual alignments as a graph.
We present two graph algorithms for edge prediction: one inspired by recommender systems and one based on network link prediction.
arXiv Detail & Related papers (2021-09-13T19:40:29Z) - Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z) - Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing
arXiv Detail & Related papers (2021-01-20T17:54:47Z) - Do Explicit Alignments Robustly Improve Multilingual Encoders? [22.954688396858085]
Multilingual encoders can effectively learn cross-lingual representations.
Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations.
We propose a new contrastive alignment objective that can better utilize such signal.
arXiv Detail & Related papers (2020-10-06T07:43:17Z) - Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study [2.101267270902429]
We analyse how different forms of cross-lingual supervision and various alignment methods influence the transfer capability of mBERT in a zero-shot setting.
We find that supervision from a parallel corpus is generally superior to dictionary alignments.
arXiv Detail & Related papers (2020-09-29T20:56:57Z) - Cross-lingual Entity Alignment with Incidental Supervision [76.66793175159192]
We propose an incidentally supervised model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme.
Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision.
arXiv Detail & Related papers (2020-05-01T01:53:56Z) - Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance [8.395430195053061]
Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other.
We develop an unsupervised scoring function that leverages cross-lingual sentence embeddings to compute the semantic distance between documents in different languages.
These semantic distances are then used to guide a document alignment algorithm to properly pair cross-lingual web documents across a variety of low, mid, and high-resource language pairs.
arXiv Detail & Related papers (2020-01-31T05:14:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.