Media of Langue: The Interface for Exploring Word Translation Network/Space
- URL: http://arxiv.org/abs/2309.08609v4
- Date: Sat, 26 Oct 2024 08:57:11 GMT
- Title: Media of Langue: The Interface for Exploring Word Translation Network/Space
- Authors: Goki Muramoto, Atsuki Sato, Takayoshi Koyama
- Abstract summary: We identify the large network formed by chains of mutual word translations as the Word Translation Network.
We propose Media of Langue, a novel interface for exploring this network.
- Score: 0.0
- License:
- Abstract: In the human activity of word translation, two languages face each other, each searching its own language system for the semantic place of words in the other language. We identify the large network formed by chains of these mutual translations as the Word Translation Network, in which words are nodes and translation volume is represented by edges, and we propose Media of Langue, a novel interface for exploring this network. Media of Langue presents the semantic configurations of many words in multiple languages at once and contains the information found in existing dictionaries, such as bilingual and synonym dictionaries. We have also implemented and published this interface as a web application covering seven language pairs. This paper first defines the Word Translation Network, describes how to construct it from bilingual corpora, and analyzes its properties. Next, we explain how to design Media of Langue using the Word Translation Network, and finally, we analyze the features of Media of Langue as a dictionary. Our website is https://www.media-of-langue.org.
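The abstract describes the Word Translation Network only in words (words as nodes, translation volume as edges). The Python sketch below is one minimal way to picture it, assuming that word-aligned translation pairs have already been extracted from a bilingual corpus. The input format, the function names `build_translation_network` and `same_language_neighbors`, the toy English-French pairs, and the pivot-based scoring are all hypothetical illustrations, not the authors' construction method. Under the same assumptions, it also shows how same-language words that share translations surface as synonym-like neighbors, which is one way such a network can carry information found in synonym dictionaries.

```python
from collections import Counter, defaultdict


def build_translation_network(aligned_pairs):
    """Build an undirected, weighted word graph from aligned word pairs.

    Each node is a hypothetical (language code, word) tuple, so identical
    spellings in different languages stay distinct. The edge weight is the
    number of times the two words were aligned, used here as a stand-in
    for "translation volume".
    """
    weights = Counter()
    for src, tgt in aligned_pairs:
        edge = tuple(sorted((src, tgt)))  # order-independent key: undirected edge
        weights[edge] += 1
    adjacency = defaultdict(dict)
    for (a, b), w in weights.items():
        adjacency[a][b] = w
        adjacency[b][a] = w
    return adjacency


def same_language_neighbors(adjacency, word):
    """Rank same-language words that share at least one translation with `word`.

    Assumed heuristic: sum, over each shared pivot translation, the smaller
    of the two edge weights. Results are sorted by descending score.
    """
    lang = word[0]
    scores = Counter()
    for pivot, w1 in adjacency[word].items():
        for other, w2 in adjacency[pivot].items():
            if other != word and other[0] == lang:
                scores[other] += min(w1, w2)
    return scores.most_common()


# Toy, made-up alignments: "bank" translates both as "banque" and as "rive".
pairs = [
    (("en", "bank"), ("fr", "banque")),
    (("en", "bank"), ("fr", "banque")),
    (("en", "bank"), ("fr", "rive")),
    (("en", "bank"), ("fr", "rive")),
    (("en", "shore"), ("fr", "rive")),
    (("en", "riverbank"), ("fr", "rive")),
    (("en", "riverbank"), ("fr", "rive")),
]
net = build_translation_network(pairs)
print(net[("en", "bank")])                          # {('fr', 'banque'): 2, ('fr', 'rive'): 2}
print(same_language_neighbors(net, ("en", "bank")))  # [(('en', 'riverbank'), 2), (('en', 'shore'), 1)]
```

Keying nodes by (language, word) keeps homographs in different languages apart. A raw, undirected alignment count is the simplest reading of "translation volume"; it ignores translation direction and frequency normalization, which a real construction from bilingual corpora may handle differently.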
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences of different languages into a common representation space.
In this work, we test whether this shared representation space generalizes to new languages by zero-shot translating from unseen languages.
We demonstrate that decoupled vocabulary learning enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z)
- Deep Emotions Across Languages: A Novel Approach for Sentiment Propagation in Multilingual WordNets [4.532887563053358]
This paper introduces two new techniques for automatically propagating sentiment annotations from a partially annotated WordNet to its entirety and to a WordNet in a different language.
We evaluated the proposed MSSE+CLDNS method extensively using Princeton WordNet and Polish WordNet, which have many inter-lingual relations.
Our results show that the MSSE+CLDNS method outperforms existing propagation methods, indicating its effectiveness in enriching WordNets with emotional metadata across multiple languages.
arXiv Detail & Related papers (2023-12-07T21:44:14Z)
- Automatically constructing Wordnet synsets [2.363388546004777]
We propose approaches to generate Wordnet synsets for languages both resource-rich and resource-poor.
Our algorithms translate synsets of existing Wordnets into a target language T, then apply a ranking method to the translation candidates to find the best translations in T.
arXiv Detail & Related papers (2022-08-08T02:02:18Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge: structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
- Discovering Language-neutral Sub-networks in Multilingual Language Models [15.94622051535847]
Language neutrality of multilingual models is a function of the overlap between language-encoding sub-networks of these models.
Using mBERT as a foundation, we employ the lottery ticket hypothesis to discover sub-networks that are individually optimized for various languages and tasks.
We conclude that mBERT is composed of a language-neutral sub-network shared among many languages, along with multiple ancillary language-specific sub-networks.
arXiv Detail & Related papers (2022-05-25T11:35:41Z)
- Building the Language Resource for a Cebuano-Filipino Neural Machine Translation System [0.0]
We present the efforts made to build a parallel corpus for Cebuano and Filipino from two different domains: biblical texts and the web.
For the biblical resource, subword-unit translation for verbs and a copy-able approach for nouns were applied to correct inconsistencies in the translation.
For Wikipedia, commonly occurring topic segments were extracted from both the source and the target languages.
arXiv Detail & Related papers (2021-10-05T23:03:09Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- On the Effects of Using word2vec Representations in Neural Networks for Dialogue Act Recognition [0.6767885381740952]
We propose a new deep neural network that explores recurrent models to capture word sequences within sentences.
We validate this model on three languages: English, French and Czech.
arXiv Detail & Related papers (2020-10-22T07:21:17Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.