Exploring Dimensionality Reduction Techniques in Multilingual
Transformers
- URL: http://arxiv.org/abs/2204.08415v1
- Date: Mon, 18 Apr 2022 17:20:55 GMT
- Title: Exploring Dimensionality Reduction Techniques in Multilingual
Transformers
- Authors: Álvaro Huertas-García, Alejandro Martín, Javier Huertas-Tato,
David Camacho
- Abstract summary: This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$ for the pre-trained and fine-tuned baselines, respectively.
- Score: 64.78260098263489
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Both in scientific literature and in industry, semantic and
context-aware Natural Language Processing-based solutions have been gaining
importance in recent years. The possibilities and performance shown by these
models when dealing with complex Language Understanding tasks, from
conversational agents to the fight against disinformation in social networks,
are unquestionable.
In addition, considerable attention is also being paid to developing
multilingual models to tackle the language bottleneck. The growing need for
more complex models implementing all these features has been accompanied by an
increase in their size, often without careful consideration of the number of
dimensions actually required. This paper aims to give a comprehensive account
of the impact of a wide variety of dimensional reduction techniques on the
performance of different state-of-the-art multilingual Siamese Transformers,
including unsupervised dimensional reduction techniques such as linear and
nonlinear feature extraction, feature selection, and manifold techniques. In
order to evaluate the effects of these techniques, we considered the
multilingual extended version of the Semantic Textual Similarity Benchmark (mSTSb)
and two different baseline approaches, one using the pre-trained version of
several models and another using their fine-tuned STS version. The results
evidence that it is possible to achieve an average reduction in the number of
dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
This work has also considered the consequences of dimensionality reduction for
visualization purposes. The results of this study will significantly contribute
to the understanding of how different tuning approaches affect performance on
semantic-aware tasks and how dimensional reduction techniques deal with the
high-dimensional embeddings computed for the STS task and their potential for
highly demanding NLP tasks.
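As an illustration of the evaluation protocol described in the abstract, the following is a minimal sketch in Python, assuming the sentence-transformers, scikit-learn, and scipy libraries: a pre-trained multilingual Siamese Transformer encodes the two sides of each sentence pair, an unsupervised linear feature-extraction technique (PCA, one member of the families studied) compresses the embeddings, and STS performance is scored as the Spearman correlation between the pairwise cosine similarities and the gold labels. The model name, the target dimensionality, and the toy sentence pairs are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: pre-trained multilingual Siamese encoding -> unsupervised
# dimensionality reduction (PCA) -> STS evaluation via Spearman correlation.
# Model name, target dimensionality, and toy data are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import paired_cosine_distances
from scipy.stats import spearmanr

# Hypothetical cross-lingual sentence pairs with gold similarity labels in [0, 5].
sentences_a = ["A man is playing a guitar.", "A woman is slicing onions."]
sentences_b = ["Un hombre toca la guitarra.", "Un hombre monta en bicicleta."]
gold_scores = [4.8, 0.4]

# 1) Pre-trained baseline: encode both sides with a multilingual Siamese model.
model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
emb_a = model.encode(sentences_a)
emb_b = model.encode(sentences_b)

# 2) Unsupervised linear feature extraction: fit PCA on the pooled embeddings
#    and project both sides into a lower-dimensional space.  With real mSTSb
#    data the number of components would be far larger than the 2 components
#    this toy example can support.
pca = PCA(n_components=2)
pca.fit(np.vstack([emb_a, emb_b]))
red_a, red_b = pca.transform(emb_a), pca.transform(emb_b)

# 3) STS evaluation: Spearman correlation between the cosine similarities of
#    the reduced pairs and the gold similarity labels.
cosine_sims = 1.0 - paired_cosine_distances(red_a, red_b)
rho, _ = spearmanr(cosine_sims, gold_scores)
print(f"Spearman correlation after reduction: {rho:.3f}")
```

Running the same three steps with a fine-tuned STS version of the encoder reproduces the second baseline the abstract refers to; swapping PCA for, e.g., scikit-learn's VarianceThreshold (unsupervised feature selection) or Isomap (manifold learning) covers the other technique families mentioned.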
Related papers
- Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance [0.0]
We propose that much of the benefit from pre-training may be captured by geometric characteristics of the latent space representations.
We find that there is a strong linear relationship between a measure of quantized cell density and average GLUE performance.
arXiv Detail & Related papers (2024-06-18T00:17:30Z) - ESE: Espresso Sentence Embeddings [11.682642816354418]
High-quality sentence embeddings are fundamental in many natural language processing (NLP) tasks.
We propose a novel sentence embedding model, $\mathrm{Espresso}$ $\mathrm{Sentence}$ $\mathrm{Embeddings}$ (ESE), with two learning processes.
arXiv Detail & Related papers (2024-02-22T18:35:05Z) - Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification [4.498100922387482]
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient.
Previous results demonstrated that these methods can even improve performance on some classification tasks.
This paper investigates how these techniques influence the classification performance and computation costs compared to full fine-tuning; a minimal LoRA configuration sketch is given after this list.
arXiv Detail & Related papers (2023-08-14T17:12:43Z) - Probing Out-of-Distribution Robustness of Language Models with
Parameter-Efficient Transfer Learning [17.110208720745064]
In this study, we explore how the ability to detect out-of-distribution changes as the size of the PLM grows or the transfer methods are altered.
We evaluate various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks.
arXiv Detail & Related papers (2023-01-27T11:27:40Z) - A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained
Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z) - Examining Scaling and Transfer of Language Model Architectures for
Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI) or summarisation, has enabled them to be ranked as one of the best paradigms to address this kind of task at present.
NLI is one of the best scenarios to test these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z) - Gradient Vaccine: Investigating and Improving Multi-task Optimization in
Massively Multilingual Models [63.92643612630657]
This paper attempts to peek into the black-box of multilingual optimization through the lens of loss function geometry.
We find that gradient similarity measured along the optimization trajectory is an important signal, which correlates well with language proximity.
We derive a simple and scalable optimization procedure, named Gradient Vaccine, which encourages more geometrically aligned parameter updates for close tasks.
arXiv Detail & Related papers (2020-10-12T17:26:34Z)
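For the parameter-efficient fine-tuning entry above (Adapters and LoRA), the following is a minimal sketch of attaching LoRA adapters to a multilingual encoder with the Hugging Face peft library; the model name, rank, scaling factor, and target modules are illustrative assumptions rather than the configuration used in that paper.

```python
# Minimal sketch: wrapping a multilingual encoder with LoRA adapters via the
# Hugging Face peft library.  All hyperparameter values below are illustrative.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Base multilingual encoder with a (hypothetical) 4-class news-topic head.
base = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=4
)

# LoRA: freeze the base weights and learn low-rank updates for the attention
# projections only.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling applied to the LoRA updates
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections to adapt
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically a small fraction of all parameters
```

Only the LoRA matrices and the classification head are updated during training, which is what makes the comparison against full fine-tuning in terms of computation cost meaningful.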