BabelEnconding at SemEval-2020 Task 3: Contextual Similarity as a
Combination of Multilingualism and Language Models
- URL: http://arxiv.org/abs/2008.08439v1
- Date: Wed, 19 Aug 2020 13:46:37 GMT
- Authors: Lucas R. C. Pessutto, Tiago de Melo, Viviane P. Moreira, Altigran da
Silva
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the system submitted by our team (BabelEnconding) to
SemEval-2020 Task 3: Predicting the Graded Effect of Context in Word
Similarity. We propose an approach that relies on translation and multilingual
language models in order to compute the contextual similarity between pairs of
words. Our hypothesis is that evidence from additional languages can improve
the correlation with the human-generated scores. BabelEnconding was applied to
both subtasks and ranked among the top-3 in six out of eight task/language
combinations and was the highest-scoring system three times.
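As a rough illustration of the idea in the abstract, and not the authors' implementation, the sketch below scores a word pair in several languages and combines the per-language scores. Everything in it is an assumption made for illustration: the toy vectors stand in for contextual embeddings from a multilingual language model, the helper names `cosine` and `multilingual_similarity` are hypothetical, and plain averaging is only one possible way to combine evidence across languages.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multilingual_similarity(vectors_by_language):
    """Average the per-language cosine similarities of one word pair.

    vectors_by_language maps a language code to a pair of vectors, one per
    word of the pair in that language. Averaging is an assumed combination
    rule, chosen here only for simplicity.
    """
    if not vectors_by_language:
        raise ValueError("at least one language is required")
    sims = [cosine(u, v) for u, v in vectors_by_language.values()]
    return sum(sims) / len(sims)


# Toy, hand-made vectors (hypothetical): in practice each pair would come from
# encoding the original context and its translations with a language model.
toy_vectors = {
    "en": ([1.0, 0.0], [1.0, 0.0]),  # identical directions
    "pt": ([1.0, 0.0], [0.0, 1.0]),  # orthogonal directions
}

score = multilingual_similarity(toy_vectors)
print(f"combined similarity: {score:.2f}")
```

Under these assumptions, a system-level evaluation would compare such combined scores against the human-generated scores with a correlation measure.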
Related papers
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics
Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Bridging Natural Language Processing and Psycholinguistics:
computationally grounded semantic similarity datasets for Basque and Spanish [0.0]
We present a word similarity dataset based on two well-known Natural Language Processing resources: text corpora and knowledge bases.
The present dataset includes noun pairs' information in Basque and European Spanish, but further work intends to extend it to more languages.
arXiv Detail & Related papers (2023-04-19T12:47:51Z)
- Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level
Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- Cross-lingual Low Resource Speaker Adaptation Using Phonological
Features [2.8080708404213373]
We train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages.
With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature.
arXiv Detail & Related papers (2021-11-17T12:33:42Z)
- Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z)
- Translating Similar Languages: Role of Mutual Intelligibility in
Multilingual Transformers [8.9379057739817]
We investigate approaches to translate between similar languages under low resource conditions.
We submit Transformer-based bilingual and multilingual systems for all language pairs.
Our Spanish-Catalan model has the best performance of all the five language pairs.
arXiv Detail & Related papers (2020-11-10T10:58:38Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Cross-lingual Spoken Language Understanding with Regularized
Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP
Deep Learning Architectures on Commonsense Reasoning Task [3.058685580689605]
We describe our attempt at the SemEval-2020 Task 4 competition: the Commonsense Validation and Explanation (ComVE) challenge.
Our system uses prepared labeled textual datasets that were manually curated for three different natural language inference subtasks.
For the second subtask, which is to select the reason why a statement does not make sense, we stand within the first six teams (93.7%) among 27 participants with very competitive results.
arXiv Detail & Related papers (2020-05-17T13:20:10Z)