UAlberta at SemEval 2022 Task 2: Leveraging Glosses and Translations for
Multilingual Idiomaticity Detection
- URL: http://arxiv.org/abs/2205.14084v1
- Date: Fri, 27 May 2022 16:35:00 GMT
- Title: UAlberta at SemEval 2022 Task 2: Leveraging Glosses and Translations for
Multilingual Idiomaticity Detection
- Authors: Bradley Hauer, Seeratpal Jaura, Talgat Omarov, Grzegorz Kondrak
- Abstract summary: We describe the University of Alberta systems for the SemEval-2022 Task 2 on multilingual idiomaticity detection.
Under the assumption that idiomatic expressions are noncompositional, our first method integrates information on the meanings of the individual words of an expression into a binary classifier.
Our second method translates an expression in context, and uses a lexical knowledge base to determine if the translation is literal.
- Score: 4.66831886752751
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We describe the University of Alberta systems for the SemEval-2022 Task 2 on
multilingual idiomaticity detection. Working under the assumption that
idiomatic expressions are noncompositional, our first method integrates
information on the meanings of the individual words of an expression into a
binary classifier. Further hypothesizing that literal and idiomatic expressions
translate differently, our second method translates an expression in context,
and uses a lexical knowledge base to determine if the translation is literal.
Our approaches are grounded in linguistic phenomena, and leverage existing
sources of lexical knowledge. Our results offer support for both approaches,
particularly the former.
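To make the first method concrete, here is a minimal sketch of a gloss-based classifier, assuming WordNet (via NLTK) as the source of word meanings and a simple lexical-overlap feature; the UAlberta system's actual features and classifier are not reproduced here.

```python
# Sketch of the gloss-based idea: if an expression is used literally
# (compositionally), the glosses of its component words should overlap with
# the surrounding context. The WordNet lookups are real NLTK APIs (requires
# nltk.download("wordnet")); the feature set and classifier are illustrative.
from nltk.corpus import wordnet as wn
from sklearn.linear_model import LogisticRegression

def gloss_overlap_features(expression: str, context: str) -> list[float]:
    """One feature per component word: Jaccard overlap between the word's
    WordNet glosses and the words of the surrounding context."""
    ctx = set(context.lower().split())
    feats = []
    for word in expression.lower().split():
        gloss = set()
        for synset in wn.synsets(word):
            gloss.update(synset.definition().lower().split())
        feats.append(len(gloss & ctx) / (len(gloss | ctx) or 1))
    return feats

# Toy training data: (expression, context, label), 1 = idiomatic.
examples = [
    ("big fish", "he caught a big fish in the lake", 0),
    ("big fish", "she is a big fish in local politics", 1),
]
X = [gloss_overlap_features(e, c) for e, c, _ in examples]
y = [label for _, _, label in examples]
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```

A companion sketch for the second method, with the context-sensitive translation step left outside the snippet and WordNet's multilingual lemmas (Open Multilingual WordNet) standing in for the lexical knowledge base; both substitutions are assumptions for illustration.

```python
# Sketch of the translation-based idea: a usage is taken as literal if its
# translation can be composed from dictionary translations of the individual
# words. Requires the NLTK "wordnet" and "omw-1.4" data; the translation of
# the expression in context would come from an MT system.
from nltk.corpus import wordnet as wn

def word_translations(word: str, lang: str = "fra") -> set[str]:
    """All target-language lemmas of any WordNet sense of `word`."""
    return {
        lemma.lower()
        for synset in wn.synsets(word)
        for lemma in synset.lemma_names(lang)
    }

def looks_literal(expression: str, translation: str, lang: str = "fra") -> bool:
    """True if every word of the translation is a dictionary translation of
    some component word of the source expression."""
    allowed = set()
    for word in expression.lower().split():
        allowed |= word_translations(word, lang)
    return all(tok in allowed for tok in translation.lower().split())

# A word-for-word French rendering of "big fish" should pass this check,
# while an idiomatic paraphrase should not.
print(looks_literal("big fish", "gros poisson"))
```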
Related papers
- Crowdsourcing Lexical Diversity [7.569845058082537]
This paper proposes a novel crowdsourcing methodology for reducing bias in lexicons.
Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food.
We validated our method by applying it to two case studies focused on food-related terminology.
arXiv Detail & Related papers (2024-10-30T15:45:09Z)
- Monolingual alignment of word senses and definitions in lexicographical resources [0.0]
The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries.
The first task aims to find an optimal alignment given the sense definitions of a headword in two different monolingual dictionaries.
The resulting benchmark can be used to evaluate word-sense alignment systems.
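As an illustration of the task setup (not the thesis's method), one can score every pair of sense definitions by lexical similarity and extract an optimal one-to-one alignment with the Hungarian algorithm; the Jaccard measure below is a deliberately simplistic stand-in for a real scoring model.

```python
# Align sense definitions of one headword ("bank") across two dictionaries:
# build a similarity matrix, then solve the assignment problem. Jaccard
# overlap and the Hungarian algorithm are illustrative choices only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

dict_a = ["a financial institution that accepts deposits",
          "the land alongside a river or lake"]
dict_b = ["sloping land beside a body of water",
          "an organization that keeps money for people"]

# Maximizing total similarity = minimizing its negation.
sim = np.array([[jaccard(a, b) for b in dict_b] for a in dict_a])
rows, cols = linear_sum_assignment(-sim)
for i, j in zip(rows, cols):
    print(f"A{i} <-> B{j}  (sim={sim[i, j]:.2f})")
```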
arXiv Detail & Related papers (2022-09-06T13:09:52Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters.
We also devise a novel method to expose this knowledge by additionally fine-tuning the models, and compare the fine-tuned variants against the original multilingual LMs.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection [23.576133853110324]
The same multi-word expression may have different meanings in different sentences.
These meanings fall into two categories: literal and idiomatic.
We use a pre-trained language model, which provides a context-aware sentence embedding.
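A minimal sketch of that idea, assuming mBERT with mean pooling and an untrained linear head; the HIT system's actual model choice and fine-tuning setup are not reproduced here.

```python
# Embed the sentence containing the expression with a pre-trained
# multilingual encoder, then classify literal vs. idiomatic from the
# context-aware embedding. Model, pooling, and head are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # to be trained

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled, context-aware sentence embedding."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1)                         # (1, dim)

logits = classifier(embed("he spilled the beans about the surprise party"))
print(logits.softmax(dim=-1))  # untrained head: scores are meaningless here
```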
arXiv Detail & Related papers (2022-04-13T02:45:04Z)
- Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship [4.970603969125883]
We capture the phenomenon of diversity through the notions of lexical gap and language-specific word.
We publish a lexico-semantic resource consisting of 198 domain concepts, 1,911 words, and 37,370 gaps covering 699 languages.
arXiv Detail & Related papers (2022-04-11T12:36:26Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
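The word-alignment result can be pictured with the generic mutual-argmax recipe over contextual token embeddings (a common baseline, not necessarily the paper's exact procedure):

```python
# Align token i to token j when each is the other's nearest neighbour by
# cosine similarity. Toy vectors stand in for contextual embeddings of two
# parallel sentences; this is the standard similarity-argmax baseline.
import numpy as np

def align(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> list[tuple[int, int]]:
    """Mutual-argmax alignment between two (tokens x dim) matrices."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sim = src @ tgt.T
    return [
        (i, int(sim[i].argmax()))
        for i in range(sim.shape[0])
        if int(sim[:, sim[i].argmax()].argmax()) == i  # mutually best match
    ]

rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))
tgt = src[[1, 0, 3, 2]] + 0.01 * rng.normal(size=(4, 8))  # permuted + noise
print(align(src, tgt))  # expected: [(0, 1), (1, 0), (2, 3), (3, 2)]
```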
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
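A hedged sketch of the projection-and-averaging recipe, using an orthogonal Procrustes map as the linear transformation; the paper's exact transformations and weighting may differ.

```python
# Meta-embedding by projection + averaging: learn an orthogonal map from one
# embedding space to another over a shared vocabulary (Procrustes), project,
# then average the aligned vectors. Synthetic spaces are used for the demo.
import numpy as np

def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||XW - Y||_F, via SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
vocab, dim = 100, 8
space_a = rng.normal(size=(vocab, dim))
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))      # hidden rotation
space_b = space_a @ true_rot + 0.01 * rng.normal(size=(vocab, dim))

W = procrustes(space_a, space_b)       # learned linear map: space_a -> space_b
meta = (space_a @ W + space_b) / 2.0   # average in the common space
print(np.allclose(space_a @ W, space_b, atol=0.1))  # map should be recovered
```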
arXiv Detail & Related papers (2020-01-17T15:42:29Z)