Visual Grounding of Inter-lingual Word-Embeddings
- URL: http://arxiv.org/abs/2209.03714v1
- Date: Thu, 8 Sep 2022 11:18:39 GMT
- Title: Visual Grounding of Inter-lingual Word-Embeddings
- Authors: Wafaa Mohammed, Hassan Shahmohammadi, Hendrik P. A. Lensch, R. Harald
Baayen
- Abstract summary: The present study investigates the inter-lingual visual grounding of word embeddings.
We focus on three languages in our experiments, namely, English, Arabic, and German.
Our experiments suggest that inter-lingual knowledge improves the performance of grounded embeddings in similar languages.
- Score: 6.136487946258519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual grounding of language aims at enriching textual representations of
language with multiple sources of visual knowledge such as images and videos.
Although visual grounding is an area of intense research, inter-lingual aspects
of visual grounding have not received much attention. The present study
investigates the inter-lingual visual grounding of word embeddings. We propose
an implicit alignment technique between the two spaces of vision and language
in which inter-lingual textual information interacts in order to enrich
pre-trained textual word embeddings. We focus on three languages in our
experiments, namely, English, Arabic, and German. We obtained visually grounded
vector representations for these languages and studied whether visual grounding
on one or multiple languages improved the performance of embeddings on word
similarity and categorization benchmarks. Our experiments suggest that
inter-lingual knowledge improves the performance of grounded embeddings in
similar languages such as German and English. However, inter-lingual grounding
of German or English with Arabic led to a slight degradation in performance on
word similarity benchmarks. On the other hand, we observed the opposite trend on
categorization benchmarks, where grounding with Arabic yielded the largest
improvement for English. In the discussion section, several reasons for these
findings are laid out. We
hope that our experiments provide a baseline for further research on
inter-lingual visual grounding.
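The abstract describes the alignment only at a high level, so the following is one plausible reading rather than the authors' exact model: a single linear map pulls words that have images toward their image vectors while keeping the full vocabulary close to its original textual space; in the inter-lingual setting, several languages would share this map so that visual evidence from one language can reach the others. All names, shapes, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of an implicit vision-language alignment for
# pre-trained word embeddings. The map W (a) moves words that have images
# toward their image vectors and (b) keeps all words close to the original
# textual space, so ungrounded words are not distorted.
rng = np.random.default_rng(0)
d, n_pairs, n_vocab = 300, 1000, 5000

X = rng.normal(size=(n_vocab, d))               # pre-trained word embeddings
pair_idx = rng.choice(n_vocab, n_pairs, replace=False)
V = rng.normal(size=(n_pairs, d))               # image vectors for those words

W = np.eye(d)                                   # start from the identity map
lam, lr = 0.5, 1e-4                             # anchor weight, step size

for step in range(200):
    Xp = X[pair_idx]
    grad_vis = 2 * Xp.T @ (Xp @ W - V) / n_pairs    # pull toward images
    grad_txt = 2 * X.T @ (X @ W - X) / n_vocab      # stay near the text space
    W -= lr * (grad_vis + lam * grad_txt)

grounded = X @ W   # grounded embeddings for the whole vocabulary
```

Grounded vectors produced this way would be scored like ordinary embeddings, e.g. by the Spearman correlation between their cosine similarities and human ratings on the word-similarity benchmarks mentioned above.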
Related papers
- Computer Vision Datasets and Models Exhibit Cultural and Linguistic
Diversity in Perception [28.716435050743957]
We study how people from different cultural backgrounds observe vastly different concepts even when viewing the same visual stimuli.
By comparing textual descriptions generated across 7 languages for the same images, we find significant differences in the semantic content and linguistic expression.
Our work points towards the need to account for and embrace the diversity of human perception in the computer vision community.
arXiv Detail & Related papers (2023-10-22T16:51:42Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multi-lingual models with more data outperform monolingual ones; however, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Relationship of the language distance to English ability of a country [0.0]
We introduce a novel solution to measure the semantic dissimilarity between languages.
We empirically examine the effectiveness of the proposed semantic language distance.
The experimental results show that the language distance has a negative influence on a country's average English ability.
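The digest does not spell out how the distance is computed, so the sketch below substitutes a generic proxy (mean cosine distance over translation pairs in a shared embedding space) rather than the paper's actual measure, and checks the negative relationship on openly synthetic data.

```python
import numpy as np

# Hypothetical proxy for "language distance": mean cosine distance between
# row-aligned translation-pair embeddings in a shared cross-lingual space.
def mean_cosine_distance(src, tgt):
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(src * tgt, axis=1)))

rng = np.random.default_rng(1)
en = rng.normal(size=(500, 100))                  # pivot-language embeddings
other = en + 0.8 * rng.normal(size=en.shape)      # a noisier, "distant" language
print(mean_cosine_distance(en, other))

# Synthetic illustration of the reported trend: larger distance, lower ability.
dist = rng.uniform(0.2, 0.9, 30)                  # 30 hypothetical countries
ability = 80 - 40 * dist + rng.normal(0, 5, 30)   # made-up English scores
print(np.corrcoef(dist, ability)[0, 1])           # comes out strongly negative
```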
arXiv Detail & Related papers (2022-11-15T02:40:00Z)
- Like a bilingual baby: The advantage of visually grounding a bilingual language model [0.0]
We train an LSTM language model on images and captions in English and Spanish from MS-COCO-ES.
We find that the visual grounding improves the model's understanding of semantic similarity both within and across languages and improves perplexity.
Our results provide additional evidence of the advantages of visually grounded language models and point to the need for more naturalistic language data from multilingual speakers and multilingual datasets with perceptual grounding.
arXiv Detail & Related papers (2022-10-11T14:43:26Z)
- Language with Vision: a Study on Grounded Word and Sentence Embeddings [6.231247903840833]
Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations.
The present study proposes a computational grounding model for pre-trained word embeddings.
Our model effectively balances the interplay between language and vision by aligning textual embeddings with visual information.
arXiv Detail & Related papers (2022-06-17T15:04:05Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo (Adversarial and Multilingual Meaning in Context), an evaluation benchmark.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"vokenization" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
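As a sketch of the mapping step only (the encoders' training is omitted, and all names and shapes below are assumptions), each contextualized token vector can be matched to its nearest image in a shared space, and the retrieved image id then serves as that token's voken:

```python
import numpy as np

# Hypothetical voken retrieval: nearest image embedding per token.
rng = np.random.default_rng(2)
n_tokens, n_images, d = 12, 1000, 64

tok = rng.normal(size=(n_tokens, d))   # contextual token embeddings
img = rng.normal(size=(n_images, d))   # image embeddings from captioning data

tok = tok / np.linalg.norm(tok, axis=1, keepdims=True)
img = img / np.linalg.norm(img, axis=1, keepdims=True)

vokens = (tok @ img.T).argmax(axis=1)  # nearest image id per token
print(vokens)  # these ids supervise an auxiliary voken-prediction loss
```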
arXiv Detail & Related papers (2020-10-14T02:11:51Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
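The digest does not list the proposed measures; a standard probe in this area, shown purely as an illustration on random vectors, is a WEAT-style association score that compares a target word's mean similarity to two attribute sets:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(target, set_a, set_b):
    # > 0 means the target leans toward set_a, < 0 toward set_b.
    a = np.mean([cosine(target, w) for w in set_a])
    b = np.mean([cosine(target, w) for w in set_b])
    return a - b

rng = np.random.default_rng(3)
emb = lambda: rng.normal(size=50)   # stand-in for real word embeddings
male = [emb() for _ in range(4)]    # e.g. "he", "man", "his", "him"
female = [emb() for _ in range(4)]  # e.g. "she", "woman", "her", "hers"
print(association(emb(), male, female))  # per-word bias score
```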
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
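One lightweight operation often used in this line of work (a sketch on synthetic vectors, not necessarily this paper's exact procedure) is to centre each language's representations by subtracting its language-specific mean, which removes the language-identity signal and leaves lexical content to dominate cross-lingual comparisons:

```python
import numpy as np

# Hypothetical language centering on synthetic sentence embeddings.
rng = np.random.default_rng(4)
d = 32
offsets = {"en": rng.normal(size=d), "de": rng.normal(size=d)}
embs = {lang: rng.normal(size=(100, d)) + off   # per-language offset mimics
        for lang, off in offsets.items()}       # a language-identity signal

centered = {lang: e - e.mean(axis=0, keepdims=True)
            for lang, e in embs.items()}
# After centering, cross-lingual nearest-neighbour search reflects meaning
# rather than which language a sentence came from.
```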
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Visual Grounding in Video for Unsupervised Word Translation [91.47607488740647]
We use visual grounding to improve unsupervised word mapping between languages.
We learn embeddings from unpaired instructional videos narrated in the native language.
We apply these methods to translate words from English to French, Korean, and Japanese.
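A minimal sketch of the underlying idea, with made-up vocabularies and vectors: if words in two languages are embedded by their association with a shared visual space, translation reduces to nearest-neighbour search across languages.

```python
import numpy as np

# Hypothetical visual word embeddings for two languages: both are noisy
# views of the same shared visual concepts, as grounding in common videos
# would produce.
rng = np.random.default_rng(5)
d = 48
en_words = ["cut", "pour", "mix"]
fr_words = ["couper", "verser", "mélanger"]

base = rng.normal(size=(3, d))                  # shared visual concepts
en_vecs = base + 0.1 * rng.normal(size=(3, d))  # English visual embeddings
fr_vecs = base + 0.1 * rng.normal(size=(3, d))  # French visual embeddings

def norm(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

sims = norm(en_vecs) @ norm(fr_vecs).T          # cross-lingual similarities
for i, w in enumerate(en_words):
    print(w, "->", fr_words[int(sims[i].argmax())])
```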
arXiv Detail & Related papers (2020-03-11T02:03:37Z)