Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land
- URL: http://arxiv.org/abs/2404.17625v2
- Date: Thu, 4 Jul 2024 14:52:11 GMT
- Title: Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land
- Authors: Simone Scardapane
- Abstract summary: Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more.
This primer is an introduction to this fascinating field imagined for someone, like Alice, who has just ventured into this strange differentiable wonderland.
- Score: 5.540111184767844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more. Stripped of anything else, neural networks are compositions of differentiable primitives, and studying them means learning how to program and how to interact with these models, a particular example of what is called differentiable programming. This primer is an introduction to this fascinating field imagined for someone, like Alice, who has just ventured into this strange differentiable wonderland. I overview the basics of optimizing a function via automatic differentiation, and a selection of the most common designs for handling sequences, graphs, text, and audio. The focus is on an intuitive, self-contained introduction to the most important design techniques, including convolutional, attentional, and recurrent blocks, hoping to bridge the gap between theory and code (PyTorch and JAX) and leaving the reader capable of understanding some of the most advanced models out there, such as large language models (LLMs) and multimodal architectures.
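The abstract describes a neural network as a composition of differentiable primitives that is optimized through automatic differentiation. A minimal sketch of that idea follows, written in plain Python rather than the PyTorch and JAX used in the primer: each primitive records its local derivative, `backward` applies the chain rule in reverse topological order, and gradient descent fits the two parameters of `tanh(w * x + b)` to a few toy points. The `Scalar` class, `loss_fn`, the data, and the learning rate are hypothetical illustrations, not code from the primer.

```python
import math


class Scalar:
    """Hypothetical helper: a scalar that remembers which primitive produced it."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents          # inputs of the primitive that produced this node
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        return Scalar(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        return Scalar(self.value * other.value, (self, other), (other.value, self.value))

    def tanh(self):
        t = math.tanh(self.value)
        return Scalar(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically sort the computation graph, then apply the chain rule
        # from the output back to the inputs (reverse-mode differentiation).
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def loss_fn(w, b, data):
    # Sum of squared errors of the composition tanh(w * x + b) (hypothetical model).
    total = Scalar(0.0)
    for x, y in data:
        pred = (w * x + b).tanh()
        diff = pred + (-y)
        total = total + diff * diff
    return total


# Hypothetical toy data and hyperparameters.
data = [(-1.0, -0.5), (0.0, 0.1), (1.0, 0.6), (2.0, 0.8)]
w, b = Scalar(0.0), Scalar(0.0)
lr = 0.1
history = []
for step in range(200):
    w.grad, b.grad = 0.0, 0.0  # reset gradients on the parameters
    loss = loss_fn(w, b, data)
    loss.backward()
    history.append(loss.value)
    w.value -= lr * w.grad     # plain gradient descent update
    b.value -= lr * b.grad
```

Libraries such as JAX (`jax.grad`) and PyTorch (`loss.backward()`) perform the same reverse-mode bookkeeping, but over tensors instead of individual scalars.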
Related papers
- Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment [6.614005142754584]
Universal Sparse Autoencoders (USAEs) are a framework for uncovering and aligning interpretable concepts spanning multiple deep neural networks.
USAEs learn a universal concept space that can reconstruct and interpret the internal activations of multiple models at once.
(2025-02-06T02:06:16Z)
- Engineering A Large Language Model From Scratch [0.0]
Atinuke is a Transformer-based neural network that optimises performance across various language tasks.
It can emulate human-like language by extracting features and learning complex mappings.
The system achieves state-of-the-art results on natural language tasks while remaining interpretable and robust.
(2024-01-30T04:29:48Z)
- TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild [102.93338424976959]
We introduce TextBind, an almost annotation-free framework for empowering large language models with multi-turn interleaved instruction-following capabilities.
Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model.
To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models.
(2023-09-14T15:34:01Z)
- MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks [59.09343552273045]
We propose a decoder-only model for multimodal tasks, which is surprisingly effective at jointly learning these disparate vision-language tasks.
We demonstrate that joint learning of these diverse objectives is simple, effective, and maximizes the weight-sharing of the model across these tasks.
Our model achieves the state of the art on image-text and text-image retrieval, video question answering and open-vocabulary detection tasks, outperforming much larger and more extensively trained foundational models.
(2023-03-29T16:42:30Z)
- PaLM-E: An Embodied Multimodal Language Model [101.29116156731762]
We propose embodied language models to incorporate real-world continuous sensor modalities into language models.
We train these encodings end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks.
Our largest model, PaLM-E-562B with 562B parameters, is a visual-language generalist with state-of-the-art performance on OK-VQA.
(2023-03-06T18:58:06Z)
- Transformadores: Fundamentos teoricos y Aplicaciones (Transformers: Theoretical Foundations and Applications) [0.40611352512781856]
Transformers are a neural network architecture originally designed for natural language processing.
Their distinctive feature is self-attention, in which each element of a sequence attends to the other elements of that same sequence.
This article is in Spanish to bring this scientific knowledge to the Spanish-speaking community.
(2023-02-18T13:30:32Z)
- Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer [59.73454783958702]
We propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions.
In particular, we demonstrate that such an ensemble of join-chains can express a broad subset of "tree-structured" first-order logical expressions, named FOET.
We find that the widely used multi-head self-attention module in transformer can be understood as a special neural operator that implements the union bound of the join operator in probabilistic predicate space.
(2022-10-06T07:39:58Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
(2022-06-13T17:34:22Z)
- A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation [63.1610540170754]
We focus on the problem of visual non-prehensile planar manipulation.
We propose a novel architecture that combines video decoding neural models with priors from contact mechanics.
We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions.
(2021-11-09T18:39:45Z)
- Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches [0.0]
This paper compares different deep learning architectures to create and use language models based on programming code.
We discuss the strengths and weaknesses of each approach, and the gaps we find in evaluating these language models or applying them in a real programming context.
(2020-09-16T15:17:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.