Media of Langue: The dictionary that visualizes Inter-Lingual Semantic
Network/Space
- URL: http://arxiv.org/abs/2309.08609v3
- Date: Sat, 27 Jan 2024 09:08:08 GMT
- Title: Media of Langue: The dictionary that visualizes Inter-Lingual Semantic
Network/Space
- Authors: Goki Muramoto, Atsuki Sato, Takayoshi Koyama
- Abstract summary: "Media of Langue" is a novel dictionary visualizing Inter-lingual semantic network/space.
By visualizing this network/space for humans, an Inter-lingual dictionary can be realized that points to the semantic place of many words at once.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces "Media of Langue," a novel dictionary visualizing
Inter-lingual semantic network/space. Our proposed Inter-lingual semantic
network/space is formed solely from the accumulation of translation practices
between two or more language systems, in contrast to existing semantic
networks/spaces that explicitly use "intra"-lingual relations. By visualizing
this network/space for humans, an Inter-lingual dictionary can be realized that
points to the semantic place of many words at once with a chain of mutual
translation, which also contains the functions of existing dictionaries such as
bilingual and synonym dictionaries. We implemented and published this interface
as a web application, focusing on seven language pairs. In this paper, we first
describe Inter-lingual semantic network/space with its basic features and the
way to develop it from bilingual corpora, then detail the design of "Media of
Langue," with a quick analysis and illustrative examples of use cases. Our
website is www.media-of-langue.org. A demonstration video is available at
https://youtu.be/98lXuX4yjsU.
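The construction the abstract describes can be sketched in miniature: translation pairs accumulated from a bilingual corpus form a bipartite word network, and a "chain of mutual translation" is read off by repeatedly following the most frequent translation link back and forth between the two languages. The word pairs and the chaining rule below are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical aligned English-French word pairs, as might be
# extracted from a bilingual corpus by word alignment.
aligned_pairs = [
    ("sea", "mer"), ("sea", "mer"), ("sea", "océan"),
    ("ocean", "océan"), ("ocean", "mer"),
    ("lake", "lac"),
]

# Accumulate translation counts into a bipartite network:
# edges connect words of one language to words of the other.
en_to_fr = defaultdict(Counter)
fr_to_en = defaultdict(Counter)
for en, fr in aligned_pairs:
    en_to_fr[en][fr] += 1
    fr_to_en[fr][en] += 1

def translation_chain(word, steps=4):
    """Follow the most frequent unvisited translation link back and
    forth, yielding a chain of mutual translation."""
    chain = [word]
    side = en_to_fr
    for _ in range(steps):
        counts = side.get(chain[-1], Counter())
        nxt = next((w for w, _ in counts.most_common() if w not in chain), None)
        if nxt is None:
            break
        chain.append(nxt)
        side = fr_to_en if side is en_to_fr else en_to_fr
    return chain

print(translation_chain("sea"))  # walks sea -> mer -> ocean -> océan
```

The chain surfaces near-synonyms in both languages at once ("sea" and "ocean" meet through their shared translations), which is the sense in which such a dictionary also subsumes synonym lookup.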
Related papers
- Presence or Absence: Are Unknown Word Usages in Dictionaries? [6.185216877366987]
We evaluate our system in the AXOLOTL-24 shared task for Finnish, Russian and German languages.
We use a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries.
Our system ranks first in Finnish and German, and second in Russian, on the Subtask 2 test-phase leaderboard.
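The graph-based clustering idea above can be sketched as follows: usages of an unknown word are linked when their embeddings are similar enough, connected components become usage clusters, and each cluster is mapped to the dictionary sense nearest its centroid. The vectors, threshold, and names here are toy assumptions, not the system's actual components.

```python
def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Hypothetical usage embeddings for one unknown word.
usages = {
    "u1": [1.0, 0.1], "u2": [0.9, 0.2],  # likely one sense
    "u3": [0.1, 1.0],                     # likely another
}
# Hypothetical dictionary-sense embeddings.
senses = {"sense_a": [1.0, 0.0], "sense_b": [0.0, 1.0]}

# Build a similarity graph; connected components (via union-find)
# are the usage clusters.
THRESHOLD = 0.8
nodes = list(usages)
parent = {n: n for n in nodes}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for i, a in enumerate(nodes):
    for b in nodes[i + 1:]:
        if cosine(usages[a], usages[b]) >= THRESHOLD:
            parent[find(a)] = find(b)

clusters = {}
for n in nodes:
    clusters.setdefault(find(n), []).append(n)

# Map each cluster to the dictionary sense closest to its centroid.
mapping = {}
for members in clusters.values():
    centroid = [sum(usages[m][d] for m in members) / len(members)
                for d in range(2)]
    best = max(senses, key=lambda s: cosine(centroid, senses[s]))
    for m in members:
        mapping[m] = best

print(mapping)
```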
arXiv Detail & Related papers (2024-06-02T07:57:45Z) - Exploring Alignment in Shared Cross-lingual Spaces [15.98134426166435]
We employ clustering to uncover latent concepts within multilingual models.
Our analysis focuses on quantifying the alignment and overlap of these concepts across various languages.
Our study encompasses three multilingual models (mT5, mBERT, and XLM-R) and three downstream tasks (Machine Translation, Named Entity Recognition, and Sentiment Analysis).
arXiv Detail & Related papers (2024-05-23T13:20:24Z) - Beyond Shared Vocabulary: Increasing Representational Word Similarities
across Languages for Multilingual Machine Translation [9.794506112999823]
In this paper, we define word-level information transfer pathways via word equivalence classes and rely on graph networks to fuse word embeddings across languages.
Our experiments demonstrate the advantages of our approach: 1) embeddings of words with similar meanings are better aligned across languages, 2) our method achieves consistent BLEU improvements of up to 2.3 points for high- and low-resource MNMT, and 3) less than 1.0% additional trainable parameters are required with a limited increase in computational costs.
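The equivalence-class idea can be illustrated in a few lines: words linked through a bilingual lexicon form a class, and each member's embedding is pulled toward the class mean. The lexicon, the toy vectors, and the mixing weight are illustrative assumptions; the paper's actual fusion uses graph networks rather than this simple averaging.

```python
# Hypothetical bilingual lexicon defining word equivalence classes.
lexicon = [("dog", "chien"), ("cat", "chat")]

# Toy word embeddings (values chosen to be exact binary fractions).
embeddings = {
    "dog": [1.0, 0.0], "chien": [0.5, 0.5],
    "cat": [0.0, 1.0], "chat": [0.25, 0.75],
}

# Group words connected by the lexicon into shared class lists.
classes = {}
for a, b in lexicon:
    cls = classes.get(a) or classes.get(b) or []
    for w in (a, b):
        if w not in cls:
            cls.append(w)
        classes[w] = cls

ALPHA = 0.5  # how strongly members are pulled toward the class mean
fused = {}
for w, vec in embeddings.items():
    members = classes.get(w, [w])
    mean = [sum(embeddings[m][d] for m in members) / len(members)
            for d in range(len(vec))]
    fused[w] = [(1 - ALPHA) * v + ALPHA * m for v, m in zip(vec, mean)]

print(fused["dog"], fused["chien"])
```

After fusion, "dog" and "chien" sit closer together than their originals, which is the cross-lingual alignment effect the paper reports.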
arXiv Detail & Related papers (2023-05-23T16:11:00Z) - Discovering Language-neutral Sub-networks in Multilingual Language
Models [15.94622051535847]
Language neutrality of multilingual models is a function of the overlap between language-encoding sub-networks of these models.
Using mBERT as a foundation, we employ the lottery ticket hypothesis to discover sub-networks that are individually optimized for various languages and tasks.
We conclude that mBERT comprises a language-neutral sub-network shared among many languages, along with multiple ancillary language-specific sub-networks.
arXiv Detail & Related papers (2022-05-25T11:35:41Z) - Transferring Knowledge Distillation for Multilingual Social Event
Detection [42.663309895263666]
Recently published graph neural networks (GNNs) show promising performance at social event detection tasks.
We present a GNN that incorporates cross-lingual word embeddings for detecting events in multilingual data streams.
Experiments on both synthetic and real-world datasets show the framework to be highly effective at detection both in multilingual data and in languages where training samples are scarce.
arXiv Detail & Related papers (2021-08-06T12:38:42Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degeneration of predicting masked words conditioned only on the context in their own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z) - Vokenization: Improving Language Understanding with Contextualized,
Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
The "vokenization" mapping is trained on relatively small image-captioning datasets, and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
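At its core, generating a voken is a retrieval step: a token's contextual embedding is matched against a bank of image embeddings and the nearest image becomes its visual supervision target. The vectors and file names below are toy assumptions standing in for the learned embedding spaces.

```python
def dot(u, v):
    # Similarity score between a token embedding and an image embedding.
    return sum(a * b for a, b in zip(u, v))

# Hypothetical token and image embeddings in a shared space.
token_vecs = {"cat": [1.0, 0.0], "sat": [0.0, 1.0]}
image_vecs = {"img_cat.jpg": [0.9, 0.1], "img_chair.jpg": [0.1, 0.9]}

def voken(token):
    """Retrieve the image whose embedding best matches the token."""
    vec = token_vecs[token]
    return max(image_vecs, key=lambda img: dot(vec, image_vecs[img]))

print([voken(t) for t in ["cat", "sat"]])
```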
arXiv Detail & Related papers (2020-10-14T02:11:51Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z) - Visual Grounding in Video for Unsupervised Word Translation [91.47607488740647]
We use visual grounding to improve unsupervised word mapping between languages.
We learn embeddings from unpaired instructional videos narrated in the native language.
We apply these methods to translate words from English to French, Korean, and Japanese.
arXiv Detail & Related papers (2020-03-11T02:03:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.