SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference
- URL: http://arxiv.org/abs/2103.09635v2
- Date: Thu, 18 Mar 2021 19:28:55 GMT
- Title: SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference
- Authors: Javier Huertas-Tato and Alejandro Martín and David Camacho
- Abstract summary: The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI) or summarisation, has enabled them to rank among the best paradigms for addressing these kinds of tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and to establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
- Score: 71.57324258813674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability of Transformers to perform a variety of tasks with precision, such
as question answering, Natural Language Inference (NLI) or summarisation, has
enabled them to rank among the best paradigms for addressing these kinds of
tasks at present. NLI is one of the best scenarios for testing these
architectures, due to the knowledge required to understand complex sentences
and to establish a relation between a hypothesis and a premise. Nevertheless,
these models suffer from an inability to generalise to other domains and from
difficulties in handling multilingual scenarios. The leading pathway in the
literature to address these issues involves designing and training extremely
large architectures, which leads to unpredictable behaviours and establishes
barriers that impede broad access and fine-tuning. In this paper, we propose a
new architecture, siamese multilingual transformer (SML), to efficiently align
multilingual embeddings for Natural Language Inference. SML leverages siamese
pre-trained multilingual transformers with frozen weights, in which the two
input sentences attend to each other and are later combined through a matrix
alignment method. The experimental results reported in this paper show that
SML drastically reduces the number of trainable parameters while still
achieving state-of-the-art performance.
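The abstract describes the architecture only in words. Below is a minimal, illustrative sketch of the general idea it outlines: a shared, frozen pre-trained multilingual encoder applied to both sentences, a token-level alignment matrix between premise and hypothesis, and a small trainable classification head. The XLM-R backbone, the dot-product alignment and the mean pooling are assumptions made for illustration, not the exact SML method.

```python
# Minimal sketch (assumptions: XLM-R backbone, dot-product alignment, mean pooling);
# this is NOT the exact SML implementation, only an illustration of the general idea.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseNLISketch(nn.Module):
    def __init__(self, backbone="xlm-roberta-base", num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        for p in self.encoder.parameters():      # frozen weights: only the head is trained
            p.requires_grad = False
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_labels))

    def forward(self, premise, hypothesis):
        # Shared (siamese) encoder applied to both sentences.
        p = self.encoder(**premise).last_hidden_state     # (B, Lp, H)
        h = self.encoder(**hypothesis).last_hidden_state  # (B, Lh, H)
        # Illustrative alignment matrix: token-to-token similarities.
        align = torch.softmax(p @ h.transpose(1, 2), dim=-1)  # (B, Lp, Lh)
        h_aligned = align @ h        # premise tokens re-expressed via hypothesis tokens
        pooled = torch.cat([p.mean(dim=1), h_aligned.mean(dim=1)], dim=-1)
        return self.head(pooled)

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = SiameseNLISketch()
prem = tok(["A man is playing a guitar."], return_tensors="pt", padding=True)
hyp = tok(["Someone is making music."], return_tensors="pt", padding=True)
logits = model(prem, hyp)   # (1, 3) -> entailment / neutral / contradiction scores
```

Only the small head receives gradients in this sketch, which is the property the abstract emphasises: a drastic reduction in trainable parameters relative to fine-tuning the full multilingual encoder.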
Related papers
- LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation [43.26446958873554]
Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
LANDeRMT is a framework that selectively finetunes LLMs to Machine Translation with diverse translation training data.
arXiv Detail & Related papers (2024-09-29T02:39:42Z) - Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model [50.339632513018934]
Supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of a foundation large language model (LLM) to specific preferences.
We critically examine this hypothesis within the scope of cross-lingual generation tasks.
We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
arXiv Detail & Related papers (2024-04-25T17:19:36Z) - Evaluating Shortest Edit Script Methods for Contextual Lemmatization [6.0158981171030685]
Modern contextual lemmatizers often rely on automatically induced Shortest Edit Scripts (SES) to transform a word form into its lemma.
Previous work has not investigated the direct impact of SES in the final lemmatization performance.
We show that computing the casing and edit operations separately is beneficial overall, but much more clearly for languages with highly inflected morphology.
arXiv Detail & Related papers (2024-03-25T17:28:24Z) - Deep Natural Language Feature Learning for Interpretable Prediction [1.6114012813668932]
We propose a method to break down a main complex task into a set of intermediary easier sub-tasks.
Our method allows for representing each example by a vector consisting of the answers to these questions.
We have successfully applied this method to two completely different tasks: detecting incoherence in students' answers to open-ended mathematics exam questions, and screening abstracts for a systematic literature review of scientific papers on climate change and agroecology.
arXiv Detail & Related papers (2023-11-09T21:43:27Z) - Exploring Dimensionality Reduction Techniques in Multilingual
Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
A minimal sketch of this kind of embedding dimensionality reduction is given after the related-papers list.
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural
Cross-Lingual Information Retrieval [15.902630454568811]
We propose a novel Mixed Attention Transformer (MAT) that incorporates external word level knowledge, such as a dictionary or translation table.
By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence.
arXiv Detail & Related papers (2021-09-07T00:33:14Z) - Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - Deep Transformers with Latent Depth [42.33955275626127]
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks.
We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.
We propose a novel method to train one shared Transformer network for multilingual machine translation.
arXiv Detail & Related papers (2020-09-28T07:13:23Z) - Is Supervised Syntactic Parsing Beneficial for Language Understanding?
An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)
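As referenced in the dimensionality-reduction entry above, here is a minimal sketch of that general recipe: encode sentences with a multilingual Siamese Transformer and project the pooled embeddings to fewer dimensions before comparing them. The sentence-transformers model name, the PCA projection and the toy component count are illustrative assumptions, not the cited paper's exact setup.

```python
# Minimal sketch (assumptions: a sentence-transformers model, PCA via scikit-learn);
# illustrates the general recipe, not the exact experiments of the cited paper.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

sentences = [
    "A man is playing a guitar.",
    "Un hombre toca la guitarra.",
    "Someone is making music.",
]

# Multilingual Siamese encoder producing fixed-size sentence embeddings.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = model.encode(sentences)        # shape: (3, 384)

# Fit PCA (on a much larger sample in practice) and project to fewer dimensions.
pca = PCA(n_components=2)                   # toy value; real settings keep many more components
reduced = pca.fit_transform(embeddings)     # shape: (3, 2)

# Similarity can then be computed in the reduced space.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(reduced[0], reduced[1]))       # cross-lingual paraphrase pair
```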
This list is automatically generated from the titles and abstracts of the papers on this site.