GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings
- URL: http://arxiv.org/abs/2310.13068v1
- Date: Thu, 19 Oct 2023 18:08:22 GMT
- Title: GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings
- Authors: Muhammad Asif Ali, Maha Alshmrani, Jianbin Qin, Yan Hu, Di Wang
- Abstract summary: Bilingual Lexical Induction (BLI) is a core challenge in NLP; it relies on the relative isomorphism of individual embedding spaces.
Existing attempts aimed at controlling the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words.
We propose GARI that combines the distributional training objectives with multiple isomorphism losses guided by the graph attention network.
- Score: 10.054788741823627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bilingual Lexical Induction (BLI) is a core challenge in NLP; it relies on
the relative isomorphism of individual embedding spaces. Existing attempts
aimed at controlling the relative isomorphism of different embedding spaces
fail to incorporate the impact of semantically related words in the model
training objective. To address this, we propose GARI that combines the
distributional training objectives with multiple isomorphism losses guided by
the graph attention network. GARI considers the impact of semantic variations
of words in order to define the relative isomorphism of the embedding spaces.
Experimental evaluation using the Arabic language data set shows that GARI
outperforms the existing research by improving the average P@1 by a relative
score of up to 40.95% and 76.80% for in-domain and domain mismatch settings
respectively. We release the codes for GARI at
https://github.com/asif6827/GARI.
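The combination described in the abstract (a distributional objective plus isomorphism losses weighted by graph attention) can be sketched roughly as follows. This is a minimal illustrative simplification, not GARI's actual implementation: the toy single-head attention, the function names, and the trade-off weight `lam` are all assumptions.

```python
import numpy as np

def softmax(x):
    x = x - x.max()          # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

def attention_weights(center, neighbors):
    # Toy single-head graph attention: softmax over dot-product scores
    # between a word vector and its semantic neighbors.
    return softmax(neighbors @ center)

def weighted_iso_loss(src, tgt, weights):
    # Attention-weighted mismatch between the two spaces: large when
    # semantically related words map inconsistently across spaces.
    return float(weights @ np.linalg.norm(src - tgt, axis=1))

def skipgram_loss(center, context):
    # Standard distributional term: -log sigmoid(center . context).
    return float(-np.log(1.0 / (1.0 + np.exp(-center @ context))))

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8))       # source embeddings (5 words, dim 8)
tgt = rng.normal(size=(5, 8))       # target embeddings for the same words
w = attention_weights(src[0], src)  # attention weights for word 0's graph

lam = 0.1  # hypothetical trade-off between the two objectives
loss = skipgram_loss(src[0], src[1]) + lam * weighted_iso_loss(src, tgt, w)
```

In this sketch the attention weights decide which neighbors dominate the isomorphism penalty, which is the rough intuition behind letting semantically related words shape the relative isomorphism of the two spaces.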
Related papers
- Homonym Sense Disambiguation in the Georgian Language [49.1574468325115]
This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language.
It is based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawls corpus.
arXiv Detail & Related papers (2024-04-24T21:48:43Z)
- GRI: Graph-based Relative Isomorphism of Word Embedding Spaces [10.984134369344117]
Automated construction of bilingual dictionaries using monolingual embedding spaces is a core challenge in machine translation.
Existing attempts aimed at controlling the relative isomorphism of different spaces fail to incorporate the impact of semantically related words in the training objective.
We propose GRI that combines the distributional training objectives with attentive graph convolutions to unanimously consider the impact of semantically similar words.
arXiv Detail & Related papers (2023-10-18T22:10:47Z)
- Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation [98.51938442785179]
Incremental few-shot semantic segmentation aims to incrementally extend a semantic segmentation model to novel classes.
This task faces a severe semantic-aliasing issue between base and novel classes due to data imbalance.
We propose the Semantic-guided Relation Alignment and Adaptation (SRAA) method that fully considers the guidance of prior semantic information.
arXiv Detail & Related papers (2023-05-18T10:40:52Z)
- RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction [25.19579637815882]
Bilingual lexicon induction induces word translations by aligning independently trained word embeddings in two languages.
We propose a novel ranking-oriented induction model RAPO to learn personalized mapping function for each word.
RAPO is capable of enjoying the merits from the unique characteristics of a single word and the cross-language isomorphism simultaneously.
arXiv Detail & Related papers (2022-10-18T15:11:45Z)
- IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces [24.256732557154486]
We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic.
We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces.
arXiv Detail & Related papers (2022-10-11T02:29:34Z)
- Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
- A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing [75.36126971685034]
We treat alignment and segmentation as latent variables in our model and induce them as part of end-to-end training.
Our method also approaches the performance of a model that relies on the segmentation rules of Lyu and Titov (2018), which were hand-crafted to handle individual AMR constructions.
arXiv Detail & Related papers (2020-10-23T21:22:50Z)
- Simultaneous Semantic Alignment Network for Heterogeneous Domain Adaptation [67.37606333193357]
We propose a Simultaneous Semantic Alignment Network (SSAN) to simultaneously exploit correlations among categories and align the centroids for each category across domains.
By leveraging target pseudo-labels, a robust triplet-centroid alignment mechanism is explicitly applied to align feature representations for each category.
Experiments on various HDA tasks across text-to-image, image-to-image and text-to-text successfully validate the superiority of our SSAN against state-of-the-art HDA methods.
arXiv Detail & Related papers (2020-08-04T16:20:37Z)
- The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures [83.53361353172261]
We present a large-scale study focused on the correlations between monolingual embedding space similarity and task performance.
We introduce several isomorphism measures between two embedding spaces, based on the relevant statistics of their individual spectra.
We empirically show that language similarity scores derived from such spectral isomorphism measures are strongly associated with performance observed in different cross-lingual tasks.
arXiv Detail & Related papers (2020-01-30T00:09:53Z)
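The spectral idea summarized in the last entry can be illustrated with a toy measure: compare the normalized singular-value spectra of two embedding matrices, since any rotation of a space leaves its spectrum unchanged. This is a hypothetical sketch, not the paper's actual measures; `spectral_similarity` and its normalization are assumptions.

```python
import numpy as np

def spectral_similarity(X, Y, k=5):
    # Toy spectral isomorphism proxy: distance between the top-k
    # singular values of two embedding matrices, normalized so the
    # comparison is invariant to overall scale. 0 = identical spectra.
    sx = np.linalg.svd(X, compute_uv=False)[:k]
    sy = np.linalg.svd(Y, compute_uv=False)[:k]
    sx, sy = sx / sx.sum(), sy / sy.sum()
    return float(np.linalg.norm(sx - sy))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                       # "language A" embeddings
Q = np.linalg.qr(rng.normal(size=(20, 20)))[0]       # random orthogonal matrix
Y = X @ Q                                            # rotated copy of X
Z = rng.normal(size=(100, 20))                       # unrelated space

# Rotation preserves singular values, so X vs Y should score as
# (near-)isomorphic, while X vs an unrelated space Z should not.
d_iso = spectral_similarity(X, Y)
d_rand = spectral_similarity(X, Z)
```

The design point is that a spectrum-based score needs no word-to-word alignment at all, which is what makes it usable as a cheap predictor of cross-lingual task performance.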