Learning Randomized Algorithms with Transformers
- URL: http://arxiv.org/abs/2408.10818v1
- Date: Tue, 20 Aug 2024 13:13:36 GMT
- Title: Learning Randomized Algorithms with Transformers
- Authors: Johannes von Oswald, Seijin Kobayashi, Yassir Akram, Angelika Steger,
- Abstract summary: In this paper, we enhance deep neural networks, in particular transformer models, with randomization.
We demonstrate for the first time that randomized algorithms can be instilled in transformers through learning, in a purely data- and objective-driven manner.
- Score: 8.556706939126146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomization is a powerful tool that endows algorithms with remarkable properties. For instance, randomized algorithms excel in adversarial settings, often surpassing the worst-case performance of deterministic algorithms with large margins. Furthermore, their success probability can be amplified by simple strategies such as repetition and majority voting. In this paper, we enhance deep neural networks, in particular transformer models, with randomization. We demonstrate for the first time that randomized algorithms can be instilled in transformers through learning, in a purely data- and objective-driven manner. First, we analyze known adversarial objectives for which randomized algorithms offer a distinct advantage over deterministic ones. We then show that common optimization techniques, such as gradient descent or evolutionary strategies, can effectively learn transformer parameters that make use of the randomness provided to the model. To illustrate the broad applicability of randomization in empowering neural networks, we study three conceptual tasks: associative recall, graph coloring, and agents that explore grid worlds. In addition to demonstrating increased robustness against oblivious adversaries through learned randomization, our experiments reveal remarkable performance improvements due to the inherently random nature of the neural networks' computation and predictions.
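The abstract's remark that success probability can be amplified by repetition and majority voting can be made concrete with a small calculation. The following is a minimal sketch, not taken from the paper: it assumes a binary-answer task, independent runs, and a per-run success probability `p` above 0.5. The function name `majority_success_probability` and the value `p = 0.6` are illustrative assumptions.

```python
from math import comb


def majority_success_probability(p: float, k: int) -> float:
    """Probability that a strict majority of k independent runs is correct.

    Assumptions (illustrative, not from the paper): each run is correct
    independently with probability p, the answer is binary, and k is odd
    so that a majority vote can never tie.
    """
    if k % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    need = k // 2 + 1  # smallest number of correct runs forming a majority
    # Sum the binomial probabilities of seeing `need` or more correct runs.
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


if __name__ == "__main__":
    # Hypothetical per-run success probability of 0.6.
    for k in (1, 3, 11, 51):
        print(k, majority_success_probability(0.6, k))
```

Under these assumptions the majority vote becomes more reliable as the number of runs grows. For tasks with more than two possible answers, a strict majority of correct runs is only a sufficient condition for the vote to be correct, so the value computed here would be a lower bound.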
Related papers
- Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks [4.643954670642798]
This paper investigates how various randomization techniques are used in Deep Neural Networks (DNNs).
It categorizes techniques into four types: adding noise to the loss function, masking random gradient updates, data augmentation and weight generalization.
The complete implementation and dataset are available on GitHub.
arXiv Detail & Related papers (2024-04-05T10:02:32Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - The Clock and the Pizza: Two Stories in Mechanistic Explanation of
Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z) - Randomized Adversarial Style Perturbations for Domain Generalization [49.888364462991234]
We propose a novel domain generalization technique, referred to as Randomized Adversarial Style Perturbation (RASP).
The proposed algorithm perturbs the style of a feature in an adversarial direction towards a randomly selected class, and makes the model learn against being misled by the unexpected styles observed in unseen target domains.
We evaluate the proposed algorithm via extensive experiments on various benchmarks and show that our approach improves domain generalization performance, especially in large-scale benchmarks.
arXiv Detail & Related papers (2023-04-04T17:07:06Z) - Quantifying Inherent Randomness in Machine Learning Algorithms [7.591218883378448]
This paper uses an empirical study to examine the effects of randomness in model training and randomness in the partitioning of a dataset into training and test subsets.
We quantify and compare the magnitude of the variation in predictive performance for the following ML algorithms: Random Forests (RFs), Gradient Boosting Machines (GBMs), and Feedforward Neural Networks (FFNNs).
arXiv Detail & Related papers (2022-06-24T15:49:52Z) - Robust Binary Models by Pruning Randomly-initialized Networks [57.03100916030444]
We propose ways to obtain robust models against adversarial attacks from randomly-initialized binary networks.
We learn the structure of the robust model by pruning a randomly-initialized binary network.
Our method confirms the strong lottery ticket hypothesis in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T00:05:08Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Neural Network Adversarial Attack Method Based on Improved Genetic Algorithm [0.0]
We propose a neural network adversarial attack method based on an improved genetic algorithm.
The method does not need the internal structure and parameter information of the neural network model.
arXiv Detail & Related papers (2021-10-05T04:46:16Z) - Adaptive Random Quantum Eigensolver [0.0]
We introduce a general method to parametrize and optimize the probability density function of a random number generator.
Our optimization is based on two figures of merit: learning speed and learning accuracy.
arXiv Detail & Related papers (2021-06-28T12:01:05Z) - Neural Random Forest Imitation [24.02961053662835]
We introduce an imitation learning approach by generating training data from a random forest and learning a neural network that imitates its behavior.
This implicit transformation creates very efficient neural networks that learn the decision boundaries of a random forest.
Experiments on several real-world benchmark datasets demonstrate superior performance.
arXiv Detail & Related papers (2019-11-25T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.