Demystifying BERT: Implications for Accelerator Design
- URL: http://arxiv.org/abs/2104.08335v1
- Date: Wed, 14 Apr 2021 01:06:49 GMT
- Title: Demystifying BERT: Implications for Accelerator Design
- Authors: Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair
- Abstract summary: We focus on BERT, one of the most popular NLP transfer learning algorithms, to identify how its algorithmic behavior can guide future accelerator design.
We characterize compute-intensive BERT computations and discuss software and possible hardware mechanisms to further optimize these computations.
Overall, our analysis identifies holistic solutions to optimize systems for BERT-like models.
- Score: 4.80595971865854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning in natural language processing (NLP), as realized using
models like BERT (Bidirectional Encoder Representations from Transformers), has
significantly improved language representation with models that can tackle
challenging language problems. Consequently, these applications are driving the
requirements of future systems. Thus, we focus on BERT, one of the most popular
NLP transfer learning algorithms, to identify how its algorithmic behavior can
guide future accelerator design. To this end, we carefully profile BERT
training and identify key algorithmic behaviors which are worthy of attention
in accelerator design.
We observe that while computations which manifest as matrix multiplication
dominate BERT's overall runtime, as in many convolutional neural networks,
memory-intensive computations also feature prominently. We characterize these
computations, which have received little attention so far. Further, we also
identify heterogeneity in compute-intensive BERT computations and discuss
software and possible hardware mechanisms to further optimize these
computations. Finally, we discuss implications of these behaviors as networks
get larger and use distributed training environments, and how techniques such
as micro-batching and mixed-precision training scale. Overall, our analysis
identifies holistic solutions to optimize systems for BERT-like models.
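To make the contrast the abstract describes concrete, the following is a minimal PyTorch sketch that times a GEMM-heavy feed-forward projection against two memory-intensive operations (softmax over attention scores and layer normalization) at assumed BERT-Base-like sizes. It illustrates the kind of characterization the paper performs, not the authors' actual profiling setup.
```python
# Minimal sketch (assumed sizes, not the paper's methodology): contrast a
# GEMM-heavy projection with memory-intensive softmax/LayerNorm operations
# at BERT-Base-like dimensions.
import time
import torch
import torch.nn.functional as F

batch, seq, hidden = 32, 512, 768          # assumed BERT-Base-like shapes
x = torch.randn(batch, seq, hidden)
w = torch.randn(hidden, 4 * hidden)        # feed-forward expansion weight
scores = torch.randn(batch, 12, seq, seq)  # attention logits for 12 heads
layer_norm = torch.nn.LayerNorm(hidden)

def timeit(fn, iters=10):
    fn()                                   # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

print("GEMM (FFN projection):   ", timeit(lambda: x @ w))
print("Softmax (memory-bound):  ", timeit(lambda: F.softmax(scores, dim=-1)))
print("LayerNorm (memory-bound):", timeit(lambda: layer_norm(x)))
```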
Related papers
- Dynamic Range Reduction via Branch-and-Bound [1.533133219129073]
A key strategy for enhancing hardware accelerators is reducing the precision of arithmetic operations.
This paper introduces a fully principled Branch-and-Bound algorithm for reducing precision needs in QUBO problems.
Experiments validate our algorithm's effectiveness on an actual quantum annealer.
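For context, a QUBO instance is defined by a symmetric coefficient matrix Q with energy E(x) = x^T Q x over binary x, and annealing hardware stores Q with limited precision. The sketch below is only a hypothetical illustration of rounding coefficients to a coarser scale and checking the energy impact; it is not the paper's branch-and-bound algorithm.
```python
# Hypothetical illustration (not the paper's branch-and-bound method):
# round QUBO coefficients to a small number of levels and compare energies.
import numpy as np

rng = np.random.default_rng(0)
n = 8
Q = rng.normal(size=(n, n))
Q = (Q + Q.T) / 2                        # symmetric QUBO matrix

def energy(Q, x):
    return x @ Q @ x                     # E(x) = x^T Q x, with x in {0,1}^n

levels = 15                              # assumed coefficient range
scale = np.abs(Q).max() / levels
Q_low = np.round(Q / scale) * scale      # reduced-precision coefficients

x = rng.integers(0, 2, size=n)           # a random binary assignment
print("full precision:   ", energy(Q, x))
print("reduced precision:", energy(Q_low, x))
```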
arXiv Detail & Related papers (2024-09-17T03:07:56Z)
- BOLD: Boolean Logic Deep Learning [1.4272256806865107]
We introduce the notion of Boolean variation such that neurons with Boolean weights and inputs can be trained efficiently in the Boolean domain using Boolean logic instead of gradient descent and real arithmetic.
Our approach achieves baseline full-precision accuracy in ImageNet classification and surpasses state-of-the-art results in semantic segmentation.
It significantly reduces energy consumption during both training and inference.
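As a rough illustration of inference with Boolean weights and inputs, the sketch below implements a generic XNOR-and-popcount neuron; the majority-vote activation is an assumption for illustration and this is not BOLD's Boolean variation training rule.
```python
# Generic XNOR/popcount neuron with Boolean weights and inputs; illustrative
# binary-logic forward pass only, not BOLD's Boolean training rule.
import numpy as np

rng = np.random.default_rng(1)
inputs = rng.integers(0, 2, size=64, dtype=np.uint8)    # Boolean inputs
weights = rng.integers(0, 2, size=64, dtype=np.uint8)   # Boolean weights

agreement = ~(inputs ^ weights) & 1    # XNOR: 1 where input and weight agree
popcount = int(agreement.sum())        # count agreements instead of MACs
output = int(popcount > inputs.size // 2)   # Boolean (majority) activation
print(popcount, output)
```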
arXiv Detail & Related papers (2024-05-25T19:50:23Z)
- Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory [66.88278207591294]
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
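The sketch below is a toy slot memory addressed by integer pointers, intended only to convey what pointer manipulation over an external memory with physical addresses means; the class and its update rules are illustrative assumptions, not PANM's architecture.
```python
# Toy external memory with integer "physical addresses" and pointer
# arithmetic; illustrative only, not PANM's actual addressing scheme.
import numpy as np

class SlotMemory:
    def __init__(self, slots, width):
        self.data = np.zeros((slots, width))
        self.pointer = 0                                     # current address

    def write(self, vector):
        self.data[self.pointer] = vector
        self.pointer = (self.pointer + 1) % len(self.data)   # advance pointer

    def read(self, offset=0):
        addr = (self.pointer + offset) % len(self.data)      # pointer arithmetic
        return self.data[addr]

memory = SlotMemory(slots=16, width=8)
for step in range(4):
    memory.write(np.full(8, step))
print(memory.read(offset=-1))   # dereference the most recently written slot
```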
arXiv Detail & Related papers (2024-04-18T03:03:46Z)
- Reinforced In-Context Black-Box Optimization [64.25546325063272]
RIBBO is a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion.
RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks.
Central to our method is augmenting the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm via the cumulative regret over the future part of the histories.
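A minimal sketch of the bookkeeping this implies: given a history of objective values and a known optimum, the regret-to-go at step t is the regret accumulated over the remaining (future) part of the history. The function name and toy history below are assumptions, not RIBBO's data format.
```python
# Sketch of regret-to-go bookkeeping over an optimization history.
import numpy as np

def regret_to_go(values, optimum):
    """Cumulative regret over the future part of the history, per step."""
    regrets = optimum - np.asarray(values, dtype=float)  # instantaneous regret
    return regrets[::-1].cumsum()[::-1]                  # suffix sums

history = [0.2, 0.5, 0.7, 0.9]       # observed objective values (maximization)
print(regret_to_go(history, optimum=1.0))   # -> [1.7 0.9 0.4 0.1]
```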
arXiv Detail & Related papers (2024-02-27T11:32:14Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics based on minimizing the population loss that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
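A minimal sketch of inducing points that live in a learned feature space rather than the input space, assuming a linear feature map and an RBF kernel; it shows the covariance shapes involved, not the IGN training objective.
```python
# Inducing points placed directly in a learned feature space (assumed linear
# feature map and RBF kernel); illustrative shapes only, not the IGN model.
import torch

d_in, d_feat, m = 5, 16, 10
feature_map = torch.nn.Linear(d_in, d_feat)              # learned feature space
inducing = torch.randn(m, d_feat, requires_grad=True)    # inducing points in feature space

def rbf(a, b, lengthscale=1.0):
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * lengthscale ** 2))

x = torch.randn(32, d_in)
k_xz = rbf(feature_map(x), inducing)   # cross-covariance to inducing points
k_zz = rbf(inducing, inducing)         # inducing-point covariance
print(k_xz.shape, k_zz.shape)
```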
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Bioinspired Cortex-based Fast Codebook Generation [0.09449650062296822]
We introduce a feature extraction method inspired by sensory cortical networks in the brain.
Dubbed the bioinspired cortex, the algorithm converges to features from streaming signals with superior computational efficiency.
We show herein the superior performance of the cortex model in clustering and vector quantization.
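For reference, the sketch below shows what codebook generation and vector quantization amount to in their simplest form (nearest-codeword assignment plus a k-means-style refinement step); it is a generic baseline, not the bioinspired cortex algorithm.
```python
# Generic vector-quantization step (nearest-codeword assignment); a baseline
# illustration of codebook use, not the bioinspired cortex algorithm itself.
import numpy as np

rng = np.random.default_rng(2)
signals = rng.normal(size=(1000, 4))      # streaming feature vectors
codebook = rng.normal(size=(16, 4))       # 16 codewords ("features")

# Assign each vector to its nearest codeword (squared Euclidean distance).
distances = ((signals[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
assignments = distances.argmin(axis=1)

# One refinement step: move each codeword to the mean of its assigned vectors.
for k in range(len(codebook)):
    members = signals[assignments == k]
    if len(members):
        codebook[k] = members.mean(axis=0)
print(np.bincount(assignments, minlength=16))
```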
arXiv Detail & Related papers (2022-01-28T18:37:43Z)
- A text autoencoder from transformer for fast encoding language representation [0.0]
We propose a deep bidirectional language model that uses a window masking mechanism at the attention layer.
This work computes contextual language representations without the random masking used in BERT.
Our method has O(n) complexity, compared with O(n^2) for other transformer-based models.
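To make the complexity claim concrete, the sketch below builds a sliding-window (banded) attention mask so each token attends to a fixed number of neighbors; the window size and shapes are assumptions, not the paper's exact masking scheme, and a real O(n) implementation would materialize only the band rather than the full score matrix.
```python
# Sliding-window attention mask: each token attends to at most `window`
# neighbors on each side, so the useful work grows as O(n * window) rather
# than O(n^2). Window size and shapes are assumed values.
import torch

def window_mask(seq_len, window):
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window   # True = may attend

seq_len, window, d = 128, 8, 32
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
scores = q @ k.T / d ** 0.5                 # this demo still builds n x n scores
scores = scores.masked_fill(~window_mask(seq_len, window), float("-inf"))
attn = torch.softmax(scores, dim=-1)
print(attn.shape, int(window_mask(seq_len, window)[0].sum()))
```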
arXiv Detail & Related papers (2021-11-04T13:09:10Z)
- Towards Structured Dynamic Sparse Pre-Training of BERT [4.567122178196833]
We develop and study a straightforward, dynamic always-sparse pre-training approach for the BERT language modeling task.
We demonstrate that training remains FLOP-efficient when using coarse-grained block sparsity, making it particularly promising for efficient execution on modern hardware accelerators.
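A minimal sketch of coarse-grained block sparsity: the weight matrix is partitioned into fixed-size blocks and whole blocks are zeroed by a per-block mask, which keeps the nonzero structure regular enough for accelerator-friendly execution. The block size and density below are assumed values, not the paper's configuration.
```python
# Coarse-grained block sparsity: zero out whole blocks of a weight matrix via
# a per-block binary mask. Block size and density are assumed values.
import torch

rows, cols, block, density = 768, 768, 64, 0.25
weight = torch.randn(rows, cols)

block_mask = (torch.rand(rows // block, cols // block) < density).float()
mask = block_mask.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)

sparse_weight = weight * mask          # dense storage, block-sparse values
x = torch.randn(16, cols)
y = x @ sparse_weight.T                # a block-sparse kernel could skip the
print(y.shape, mask.mean().item())     # all-zero blocks entirely on hardware
```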
arXiv Detail & Related papers (2021-08-13T14:54:26Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.