A Survey of Deep Learning: From Activations to Transformers
- URL: http://arxiv.org/abs/2302.00722v3
- Date: Sat, 10 Feb 2024 17:48:25 GMT
- Title: A Survey of Deep Learning: From Activations to Transformers
- Authors: Johannes Schneider and Michalis Vlachos
- Abstract summary: We provide a comprehensive overview of the most important, recent works in deep learning.
We identify and discuss patterns that summarize the key strategies for many of the successful innovations over the last decade.
We also include a discussion on recent commercially built, closed-source models such as OpenAI's GPT-4 and Google's PaLM 2.
- Score: 3.175481425273993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has made tremendous progress in the last decade. A key
success factor is the wide variety of architectures, layers, objectives, and
optimization techniques. These include a myriad of variants related to
attention, normalization, skip connections, transformers, and self-supervised
learning schemes, to name a few. We provide a comprehensive overview of the
most important, recent works in these areas to those who already have a basic
understanding of deep learning. We hope that a holistic and unified treatment
of influential, recent works helps researchers to form new connections between
diverse areas of deep learning. We identify and discuss multiple patterns that
summarize the key strategies for many of the successful innovations over the
last decade as well as works that can be seen as rising stars. We also include
a discussion on recent commercially built, closed-source models such as
OpenAI's GPT-4 and Google's PaLM 2.
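To make the recurring pattern concrete, below is a minimal, self-contained NumPy sketch (not taken from the paper) of a pre-norm residual block that combines three of the surveyed ingredients: layer normalization, scaled dot-product self-attention, and a skip connection. All names, weights, and dimensions are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def pre_norm_block(x, Wq, Wk, Wv):
    # Pre-norm residual pattern: x + Attention(LayerNorm(x)).
    return x + self_attention(layer_norm(x), Wq, Wk, Wv)

rng = np.random.default_rng(0)
d = 16                                # model width (illustrative)
x = rng.normal(size=(8, d))           # 8 tokens of width d
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
y = pre_norm_block(x, Wq, Wk, Wv)
print(y.shape)                        # (8, 16)
```

The skip connection keeps the block's output close to its input, which is one of the patterns the survey credits for making very deep stacks trainable.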
Related papers
- O1 Replication Journey: A Strategic Progress Report -- Part 1 [52.062216849476776]
This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey.
Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects.
We propose the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process.
arXiv Detail & Related papers (2024-10-08T15:13:01Z)
- Towards a Unified View of Preference Learning for Large Language Models: A Survey [88.66719962576005]
Large Language Models (LLMs) exhibit remarkably powerful capabilities.
A crucial factor in this success is aligning the LLM's output with human preferences.
We decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm.
arXiv Detail & Related papers (2024-09-04T15:11:55Z)
- What comes after transformers? -- A selective survey connecting ideas in deep learning [1.8592384822257952]
Transformers have become the de facto standard model in artificial intelligence since 2017.
For researchers, it is difficult to keep track of such developments at a broader level.
We provide a comprehensive overview of the many important, recent works in these areas to those who already have a basic understanding of deep learning.
arXiv Detail & Related papers (2024-08-01T08:50:25Z) - A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
We show theoretically that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - A Review on Methods and Applications in Multimodal Deep Learning [8.152125331009389]
Multimodal deep learning improves understanding and analysis when multiple sensory modalities are engaged in processing information.
This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals.
A fine-grained taxonomy of various multimodal deep learning methods is proposed, elaborating on different applications in more depth.
arXiv Detail & Related papers (2022-02-18T13:50:44Z) - Deep Learning for Face Anti-Spoofing: A Survey [74.42603610773931]
Face anti-spoofing (FAS) has lately attracted increasing attention due to its vital role in securing face recognition systems from presentation attacks (PAs).
arXiv Detail & Related papers (2021-06-28T19:12:00Z) - A Deep Learning Framework for Lifelong Machine Learning [6.662800021628275]
We propose a simple yet powerful unified deep learning framework.
Our framework supports almost all of the desired lifelong-learning properties and approaches through one central mechanism.
We hope that this unified lifelong learning framework inspires new work towards large-scale experiments and understanding human learning in general.
arXiv Detail & Related papers (2021-05-01T03:43:25Z) - Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a fixed depth for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model, along with the stopping policy, improves performance on a diverse set of tasks.
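As an illustration of the idea (not this paper's actual architecture), the following NumPy sketch runs a stack of layers and exits early once a learned halting score passes a threshold. The halting network, threshold, and dimensions are all assumptions made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_depth_forward(x, layers, halt_weights, threshold=0.5):
    """Refine the state layer by layer; stop early per input.

    `layers` holds one weight matrix per step and `halt_weights`
    maps the state to a scalar halting score; both are illustrative.
    Returns the final state and the number of steps actually used.
    """
    h = x
    for t, W in enumerate(layers):
        h = np.tanh(h @ W)                  # one refinement step
        p_halt = sigmoid(h @ halt_weights)  # stopping policy's score
        if p_halt > threshold:              # input-dependent early exit
            return h, t + 1
    return h, len(layers)

rng = np.random.default_rng(1)
d = 8
layers = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(10)]
halt_w = rng.normal(size=d) / np.sqrt(d)
h, depth = adaptive_depth_forward(rng.normal(size=d), layers, halt_w)
print(depth)  # steps used for this particular input
```

In a real system the stopping policy would be trained jointly with the predictor so that easy inputs exit early and hard inputs use more depth.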
arXiv Detail & Related papers (2020-06-09T07:22:01Z) - Meta-Learning in Neural Networks: A Survey [4.588028371034406]
This survey describes the contemporary meta-learning landscape.
We first discuss definitions of meta-learning and position it with respect to related fields.
We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods.
arXiv Detail & Related papers (2020-04-11T16:34:24Z) - A Survey of Deep Learning for Scientific Discovery [13.372738220280317]
We have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks.
Data collected in a wide array of scientific domains is dramatically increasing in both size and complexity.
This suggests many exciting opportunities for deep learning applications in scientific settings.
arXiv Detail & Related papers (2020-03-26T06:16:08Z)