Detecting Unseen Multiword Expressions in American Sign Language
- URL: http://arxiv.org/abs/2310.00207v1
- Date: Sat, 30 Sep 2023 00:54:59 GMT
- Title: Detecting Unseen Multiword Expressions in American Sign Language
- Authors: Lee Kezar, Aryan Shukla
- Abstract summary: We built and tested two systems that apply GloVe word embeddings to predict whether a group of lexemes composes a multiword expression.
Word embeddings were found to carry information that can detect non-compositionality with decent accuracy.
- Score: 1.2691047660244332
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multiword expressions present unique challenges in many translation tasks. In
an attempt to ultimately apply a multiword expression detection system to the
translation of American Sign Language, we built and tested two systems that
apply GloVe word embeddings to determine whether the embeddings of lexemes
can be used to predict whether those lexemes compose a multiword expression.
It became apparent that word embeddings carry information that can detect
non-compositionality with decent accuracy.
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspection of the grouped subwords shows that they exhibit a wide range of semantic similarities.
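One simple way to realise "merging semantically similar subwords" is greedy clustering by embedding similarity. The greedy scheme, the threshold, and the toy embeddings below are all assumptions for illustration, not the paper's procedure.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def semantic_tokens(embeddings, tau=0.9):
    # Greedily assign each subword to the first group whose centroid is
    # similar enough; otherwise start a new group.
    groups = []  # each group: {"members": [...], "vecs": [...]}
    for sub, vec in embeddings.items():
        for g in groups:
            centroid = [sum(col) / len(g["vecs"]) for col in zip(*g["vecs"])]
            if cosine(vec, centroid) >= tau:
                g["members"].append(sub)
                g["vecs"].append(vec)
                break
        else:
            groups.append({"members": [sub], "vecs": [vec]})
    return [g["members"] for g in groups]

# Toy embeddings: "haus" and "house" are near-synonyms across languages.
emb = {"haus": [1.0, 0.0], "house": [0.99, 0.1], "cat": [0.0, 1.0]}
print(semantic_tokens(emb))   # [['haus', 'house'], ['cat']]
```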
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection [23.576133853110324]
The same multi-word expression may have different meanings in different sentences.
These meanings fall into two categories: literal and idiomatic.
We use a pre-trained language model, which provides context-aware sentence embeddings.
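The two-category setup can be sketched as nearest-prototype classification over context-aware embeddings. The prototype vectors and inputs below are toy stand-ins for a pre-trained model's output, not the system described in the paper.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical prototype embeddings for the two usage categories.
PROTOTYPES = {
    "literal":   [1.0, 0.0],   # e.g. "he kicked the bucket across the yard"
    "idiomatic": [0.0, 1.0],   # e.g. "he kicked the bucket last year"
}

def classify(context_embedding):
    # Assign the usage whose prototype is most similar to the
    # context-aware embedding of the sentence.
    return max(PROTOTYPES, key=lambda k: cosine(context_embedding, PROTOTYPES[k]))

print(classify([0.9, 0.2]))   # literal
print(classify([0.1, 0.8]))   # idiomatic
```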
arXiv Detail & Related papers (2022-04-13T02:45:04Z) - Subword Mapping and Anchoring across Languages [1.9352552677009318]
Subword Mapping and Anchoring across Languages (SMALA) is a method to construct bilingual subword vocabularies.
SMALA extracts subword alignments using an unsupervised state-of-the-art mapping technique.
We show that joint subword vocabularies obtained with SMALA lead to higher BLEU scores on sentences that contain many false positives and false negatives.
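The anchoring step can be pictured as tying each source-language subword to its most similar target-language subword in a shared space. The pairing rule and the toy aligned embeddings below are illustrative assumptions, not SMALA's actual algorithm.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def anchors(src_emb, tgt_emb):
    # Pair each source subword with its most similar target subword,
    # so each pair can share one "anchored" embedding.
    return {s: max(tgt_emb, key=lambda t: cosine(v, tgt_emb[t]))
            for s, v in src_emb.items()}

# Toy aligned embeddings for English and German subwords.
en = {"hous": [1.0, 0.0], "cat": [0.0, 1.0]}
de = {"haus": [0.98, 0.1], "katz": [0.05, 0.99]}
print(anchors(en, de))   # {'hous': 'haus', 'cat': 'katz'}
```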
arXiv Detail & Related papers (2021-09-09T20:46:27Z)
- A Simple and Efficient Probabilistic Language model for Code-Mixed Text [0.0]
We present a simple probabilistic approach for building efficient word embedding for code-mixed text.
We examine its efficacy for the classification task using bidirectional LSTMs and SVMs.
arXiv Detail & Related papers (2021-06-29T05:37:57Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model.
Experiments show that XLP significantly boosts model performance on a wide range of multilingual benchmark datasets.
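The projection step described above can be sketched as a per-language matrix applied to each word embedding before it enters the encoder. The matrices below are toy placeholders; in practice they would be learned parameters.

```python
def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical language-specific projection matrices (learned in practice).
PROJECTIONS = {
    "en": [[1.0, 0.0], [0.0, 1.0]],
    "de": [[0.0, 1.0], [1.0, 0.0]],
}

def project(word_embedding, lang):
    # Map the word embedding into the given language's semantic space.
    return matvec(PROJECTIONS[lang], word_embedding)

print(project([0.3, 0.7], "en"))  # [0.3, 0.7]
print(project([0.3, 0.7], "de"))  # [0.7, 0.3]
```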
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
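A common way to score semantic change, once two epochs' embedding spaces are aligned, is the cosine distance between a word's two vectors. This is a simplified illustration with toy vectors, not the paper's self-supervised method; the alignment is assumed to have been done already.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def change_score(v_old, v_new):
    # Cosine distance between a word's vectors from two time periods
    # (the two embedding spaces are assumed already aligned).
    return 1.0 - cosine(v_old, v_new)

# Toy aligned vectors: "cell" drifts (biology -> phones), "water" does not.
water_1900, water_2000 = [1.0, 0.0], [0.99, 0.05]
cell_1900, cell_2000 = [0.0, 1.0], [0.8, 0.6]

print(change_score(water_1900, water_2000)
      < change_score(cell_1900, cell_2000))   # True: "cell" changed more
```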
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
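Operationalising ambiguity as an entropy can be shown directly. This toy computation assumes a known probability distribution over a word's senses, which the paper instead estimates from data.

```python
from math import log2

def ambiguity(sense_probs):
    # Shannon entropy (in bits) over a word's possible meanings.
    return sum(-p * log2(p) for p in sense_probs if p > 0)

# A word split evenly between two senses is maximally ambiguous over them;
# a word with a single sense has zero entropy.
print(ambiguity([0.5, 0.5]))   # 1.0
print(ambiguity([1.0]))        # 0.0
```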
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Discovering Bilingual Lexicons in Polyglot Word Embeddings [32.53342453685406]
In this work, we utilize a single Skip-gram model trained on a multilingual corpus yielding polyglot word embeddings.
We present a novel finding that a surprisingly simple constrained nearest-neighbor sampling technique can retrieve bilingual lexicons.
Across three European language pairs, we observe that polyglot word embeddings indeed learn a rich semantic representation of words.
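The constrained nearest-neighbour idea can be sketched as follows: in one shared (polyglot) embedding space, translate a word by searching only among the other language's words. The embeddings and language tags below are toy assumptions.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Toy polyglot embeddings: one shared space, each word tagged with a language.
emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0],
       "hund": [0.97, 0.1], "katze": [0.05, 0.98]}
lang = {"dog": "en", "cat": "en", "hund": "de", "katze": "de"}

def translate(word, target_lang):
    # Constrained nearest neighbour: search only the target language's words.
    candidates = [w for w in emb if lang[w] == target_lang]
    return max(candidates, key=lambda w: cosine(emb[word], emb[w]))

print(translate("dog", "de"))   # hund
print(translate("cat", "de"))   # katze
```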
arXiv Detail & Related papers (2020-08-31T03:57:50Z)
- MICE: Mining Idioms with Contextual Embeddings [0.0]
Idiomatic expressions can be problematic for natural language processing applications.
We present an approach that uses contextual embeddings for that purpose.
We show that deep neural networks using both embeddings perform much better than existing approaches.
arXiv Detail & Related papers (2020-08-13T08:56:40Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper we explore different existing methods of this solution at both the graph-construction and search levels.
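The subword idea can be illustrated with a greedy longest-match segmenter: any OOV word decomposes into units from a fixed inventory, with single characters as a fallback. The inventory below is a toy stand-in for a real subword vocabulary.

```python
# Hypothetical subword inventory (real systems learn one, e.g. via BPE).
SUBWORDS = {"un", "believ", "able", "ly", "re", "play", "ing",
            "a", "b", "l", "e"}

def segment(word):
    # Greedy longest-match segmentation against the inventory.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):    # try the longest match first
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])           # fall back to one character
            i += 1
    return pieces

print(segment("unbelievable"))   # ['un', 'believ', 'able']
```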
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language may fail to handle target languages.
We investigate whether making models insensitive to the source language's word order can improve adaptation performance in target languages.
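Order insensitivity can be illustrated with mean pooling, which is permutation-invariant by construction: any reordering of the tokens yields the same sentence vector. The token vectors below are toy values; this is one simple order-insensitive encoder, not necessarily the paper's.

```python
def mean_pool(token_vecs):
    # Elementwise mean over token embeddings; the result does not depend
    # on the order in which the tokens appear.
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

dog, bites, man = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
print(mean_pool([dog, bites, man]) == mean_pool([man, bites, dog]))  # True
```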
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.