What comes after transformers? -- A selective survey connecting ideas in deep learning
- URL: http://arxiv.org/abs/2408.00386v1
- Date: Thu, 1 Aug 2024 08:50:25 GMT
- Title: What comes after transformers? -- A selective survey connecting ideas in deep learning
- Authors: Johannes Schneider
- Abstract summary: Transformers have become the de facto standard model in artificial intelligence since 2017.
It is difficult for researchers to keep track of such developments at a broader level.
We provide a comprehensive overview of the many important, recent works in these areas to those who already have a basic understanding of deep learning.
- Score: 1.8592384822257952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have become the de facto standard model in artificial intelligence since 2017, despite numerous shortcomings ranging from energy inefficiency to hallucinations. Research has made considerable progress in improving elements of transformers and, more generally, of deep learning, manifesting in many proposals for architectures, layers, optimization objectives, and optimization techniques. It is difficult for researchers to keep track of such developments at a broader level. We provide a comprehensive overview of the many important, recent works in these areas for those who already have a basic understanding of deep learning. Our focus differs from that of other works, as we specifically target novel, alternative, potentially disruptive approaches to transformers as well as successful ideas of recent deep learning. We hope that such a holistic and unified treatment of influential, recent works and novel ideas helps researchers to form new connections between diverse areas of deep learning. We identify and discuss multiple patterns that summarize the key strategies for successful innovations over the last decade, as well as works that can be seen as rising stars. In particular, we discuss attempts to improve on transformers, covering (partially) proven methods such as state space models but also including far-out ideas in deep learning that seem promising despite not achieving state-of-the-art results. We also discuss recent state-of-the-art models such as OpenAI's GPT series, Meta's LLaMA models, and Google's Gemini model family.
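As a rough illustration of the state space models mentioned in the abstract, the following sketch (not code from the paper; all names, shapes, and values are toy assumptions) shows the discrete linear recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k that architectures such as S4 build on.

```python
# Minimal, illustrative sketch of a discrete linear state space model (SSM).
# All parameters below are toy assumptions, not values from any real model.
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a single-input, single-output linear SSM over a 1-D input sequence."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                 # sequential scan over the input sequence
        x = A @ x + B * u_k       # state update: x_k = A x_{k-1} + B u_k
        ys.append(C @ x)          # scalar readout: y_k = C x_k
    return np.array(ys)

rng = np.random.default_rng(0)
n = 4                             # toy state dimension
A = 0.9 * np.eye(n)               # stable, toy transition matrix
B, C = rng.standard_normal(n), rng.standard_normal(n)
print(ssm_scan(A, B, C, rng.standard_normal(16)).shape)  # (16,)
```

Because the recurrence is linear, the whole input-to-output map can equivalently be evaluated as a convolution, which is what lets models in this family train efficiently in parallel rather than step by step.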
Related papers
- A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z)
- From CNNs to Transformers in Multimodal Human Action Recognition: A Survey [23.674123304219822]
Human action recognition is one of the most widely studied research problems in Computer Vision.
Recent studies have shown that addressing it using multimodal data leads to superior performance.
The recent rise of Transformers in visual modelling is now also causing a paradigm shift for the action recognition task.
arXiv Detail & Related papers (2024-05-22T02:11:18Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
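To make the self-attention mechanism described above concrete, here is a minimal sketch of single-head scaled dot-product attention; it is not code from the surveyed paper, and all shapes and weight matrices are toy assumptions.

```python
# Illustrative single-head scaled dot-product self-attention (toy sketch).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Attend over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # similarity of every token pair
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # each output mixes all positions

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((5, d))
out = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```

Unlike a recurrent update, the matrix products above touch every position at once, which is the parallelism over long dependencies that the summary refers to.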
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- AttentionViz: A Global View of Transformer Attention [60.82904477362676]
We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings.
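A rough sketch of the joint query-key embedding idea: queries and keys are stacked into one point cloud and projected into a shared low-dimensional space, so that nearby query/key points suggest strong attention. The paper uses nonlinear projections such as t-SNE/UMAP; plain PCA is substituted here, as an assumption, to keep the sketch dependency-free.

```python
# Toy sketch of a joint query-key embedding (PCA stands in for t-SNE/UMAP).
import numpy as np

def joint_qk_embedding(Q, K):
    """Project queries and keys into the same 2-D space via PCA."""
    joint = np.vstack([Q, K])                    # one point cloud for both vector types
    joint = joint - joint.mean(axis=0)           # center before PCA
    _, _, Vt = np.linalg.svd(joint, full_matrices=False)
    coords = joint @ Vt[:2].T                    # top-2 principal components
    return coords[: len(Q)], coords[len(Q):]     # split back into query/key coordinates

rng = np.random.default_rng(0)
Q, K = rng.standard_normal((10, 64)), rng.standard_normal((10, 64))
q2d, k2d = joint_qk_embedding(Q, K)
print(q2d.shape, k2d.shape)  # (10, 2) (10, 2)
```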
arXiv Detail & Related papers (2023-05-04T23:46:49Z)
- Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
- A Survey of Deep Learning: From Activations to Transformers [3.175481425273993]
We provide a comprehensive overview of the most important, recent works in deep learning.
We identify and discuss patterns that summarize the key strategies for many of the successful innovations over the last decade.
We also include a discussion on recent commercially built, closed-source models such as OpenAI's GPT-4 and Google's PaLM 2.
arXiv Detail & Related papers (2023-02-01T19:34:55Z)
- Attention mechanisms and deep learning for machine vision: A survey of the state of the art [0.0]
Vision transformers (ViTs) pose a serious challenge to established deep learning based machine vision techniques.
Some recent works suggest that combining these two fields can yield systems that have the advantages of both.
arXiv Detail & Related papers (2021-06-03T10:23:32Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey [0.07366405857677225]
We cover the background behind sim-to-real transfer in deep reinforcement learning.
We overview the main methods currently in use: domain randomization, domain adaptation, imitation learning, meta-learning, and knowledge distillation.
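As a toy illustration of the first of these methods, domain randomization, the following sketch resamples simulator physics parameters for each training episode; the parameter names, ranges, and structure are purely illustrative assumptions, not code from the survey.

```python
# Illustrative domain randomization: vary simulator physics per episode so a
# policy trained in simulation sees enough variation to transfer to reality.
import random
from dataclasses import dataclass

@dataclass
class SimParams:
    friction: float
    mass: float
    sensor_noise: float

def randomize() -> SimParams:
    """Sample a new simulator configuration for each training episode."""
    return SimParams(
        friction=random.uniform(0.5, 1.5),   # illustrative ranges
        mass=random.uniform(0.8, 1.2),
        sensor_noise=random.uniform(0.0, 0.05),
    )

for episode in range(3):
    params = randomize()   # a real setup would pass these to the simulator
    print(f"episode {episode}: {params}")
```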
arXiv Detail & Related papers (2020-09-24T21:05:46Z)
- Efficient Transformers: A Survey [98.23264445730645]
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.
This paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models.
arXiv Detail & Related papers (2020-09-14T20:38:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.