Transformadores: Fundamentos teóricos y Aplicaciones
- URL: http://arxiv.org/abs/2302.09327v1
- Date: Sat, 18 Feb 2023 13:30:32 GMT
- Title: Transformadores: Fundamentos teóricos y Aplicaciones
- Authors: Jordi de la Torre
- Abstract summary: Transformers are a neural network architecture originally designed for natural language processing.
Its distinctive feature is its self-attention mechanism, in which a sequence attends to its own elements.
This article is in Spanish to bring this scientific knowledge to the Spanish-speaking community.
- Score: 0.40611352512781856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are a neural network architecture originally designed for
natural language processing that is now a mainstream tool for solving a wide
variety of problems, including natural language processing, sound and image
processing, reinforcement learning, and other problems with heterogeneous input
data. Its distinctive feature is its self-attention mechanism, in which a
sequence attends to its own elements, and which derives from the previously
introduced attention mechanism.
This article provides the reader with the necessary context to understand the
most recent research articles and presents the mathematical and algorithmic
foundations of the elements that make up this type of network. The different
components that make up this architecture and the variations that may exist are
also studied, as well as some applications of the transformer models. This
article is in Spanish to bring this scientific knowledge to the
Spanish-speaking community.
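As a complement to the abstract's description of self-attention, the sketch below shows the scaled dot-product computation at its core: each position of a single sequence derives queries, keys, and values from itself and attends to every position of that same sequence. This is a minimal illustration; the variable names and dimensions are our own assumptions, not taken from the paper.

```python
# Minimal sketch of single-head scaled dot-product self-attention.
# Dimensions and names are illustrative assumptions, not from the paper.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Apply one self-attention head to a sequence X of shape (n, d_model)."""
    Q = X @ Wq                                  # queries, (n, d_k)
    K = X @ Wk                                  # keys,    (n, d_k)
    V = X @ Wv                                  # values,  (n, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # attention logits, (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                          # each position mixes the whole sequence

# Toy usage: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (4, 8)
```

Stacking several such heads and interleaving them with feed-forward layers yields the encoder blocks whose components the article studies one by one.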
Related papers
- A Review on the Applications of Transformer-based language models for Nucleotide Sequence Analysis [0.8049701904919515]
This paper introduces the major recent developments of Transformer-based models in the context of nucleotide sequences.
We believe this review will help the scientific community in understanding the various applications of Transformer-based language models to nucleotide sequences.
arXiv Detail & Related papers (2024-12-10T05:33:09Z)
- Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land [5.540111184767844]
Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more.
This primer is an introduction to this fascinating field imagined for someone, like Alice, who has just ventured into this strange differentiable wonderland.
arXiv Detail & Related papers (2024-04-26T15:19:58Z)
- A Survey on Large Language Models from Concept to Implementation [4.219910716090213]
Recent advancements in Large Language Models (LLMs) have broadened the scope of natural language processing (NLP) applications.
This paper investigates the multifaceted applications of these models, with an emphasis on the GPT series.
This exploration focuses on the transformative impact of artificial intelligence (AI)-driven tools on traditional tasks like coding and problem-solving.
arXiv Detail & Related papers (2024-03-27T19:35:41Z)
- Language Evolution with Deep Learning [49.879239655532324]
Computational modeling plays an essential role in the study of language emergence.
It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language.
This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models.
arXiv Detail & Related papers (2024-03-18T16:52:54Z)
- Engineering A Large Language Model From Scratch [0.0]
Atinuke is a Transformer-based neural network that optimises performance across various language tasks.
It can emulate human-like language by extracting features and learning complex mappings.
The system achieves state-of-the-art results on natural language tasks whilst remaining interpretable and robust.
arXiv Detail & Related papers (2024-01-30T04:29:48Z)
- Knowledge-Infused Self Attention Transformers [11.008412414253662]
Transformer-based language models have achieved impressive success in various natural language processing tasks.
This paper introduces a systematic method for infusing knowledge into different components of a transformer-based model.
arXiv Detail & Related papers (2023-06-23T13:55:01Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- AttentionViz: A Global View of Transformer Attention [60.82904477362676]
We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings.
arXiv Detail & Related papers (2023-05-04T23:46:49Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- On the validity of pre-trained transformers for natural language processing in the software engineering domain [78.32146765053318]
We compare BERT transformer models trained with software engineering data with transformers based on general domain data.
Our results show that for tasks that require understanding of the software engineering context, pre-training with software engineering data is valuable.
arXiv Detail & Related papers (2021-09-10T08:46:31Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages; a sketch of the histogram idea follows this list.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- Multi-channel Transformers for Multi-articulatory Sign Language Translation [59.38247587308604]
We tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture.
The proposed architecture allows both the inter- and intra-contextual relationships between different sign articulators to be modelled within the transformer network itself.
arXiv Detail & Related papers (2020-09-01T09:10:55Z)
- Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation [2.2344764434954256]
This study proposes enhancing the current state-of-the-art neural machine translation architecture, the Transformer.
In particular, our proposed modification, the Factored Transformer, uses linguistic factors that insert additional knowledge into the machine translation system.
We show improvements of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task.
arXiv Detail & Related papers (2020-04-17T03:40:13Z)
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation [73.11214377092121]
We propose to replace all but one attention head of each encoder layer with simple fixed -- non-learnable -- attentive patterns.
Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality.
arXiv Detail & Related papers (2020-02-24T13:53:06Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends can give the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
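As noted in the "Thinking Like Transformers" entry above, RASP phrases transformer computations as select/aggregate programs; its histogram task (counting, for each token, how many times it occurs in the sequence) reduces to a token-equality selector followed by a width aggregation. The NumPy sketch below mirrors that structure and is our illustration, not code from the paper.

```python
# Hedged sketch of the RASP histogram idea: a token-equality selector
# whose per-row width equals each token's count. Names are our own.
import numpy as np

def histogram(tokens):
    t = np.array(tokens)
    # select(tokens, tokens, ==): position i attends to every j with t[j] == t[i]
    selector = t[:, None] == t[None, :]
    # selector_width: how many positions each token attends to, i.e. its count
    return selector.sum(axis=1)

print(histogram(list("hello")))  # -> [1 1 2 2 1]
```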