End to End Binarized Neural Networks for Text Classification
- URL: http://arxiv.org/abs/2010.05223v1
- Date: Sun, 11 Oct 2020 11:21:53 GMT
- Title: End to End Binarized Neural Networks for Text Classification
- Authors: Harshil Jain, Akshat Agarwal, Kumar Shridhar, Denis Kleyko
- Abstract summary: We propose an end-to-end binarized neural network architecture for the intent classification task.
The proposed architecture achieves results comparable to the state of the art on standard intent classification datasets.
- Score: 4.046236197219608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have demonstrated superior performance in almost
every Natural Language Processing task, but their increasing complexity
raises concerns. In particular, these networks are expensive to run on
computational hardware, and the training budget is a concern for many. Even for
a trained network, the inference phase can be too demanding for
resource-constrained devices, thus limiting its applicability. The
state-of-the-art transformer models are a vivid example. Simplifying the
computations performed by a network is one way of relaxing the complexity
requirements. In this paper, we propose an end-to-end binarized neural network
architecture for the intent classification task. In order to fully utilize the
potential of end-to-end binarization, both the input representations (vector
embeddings of token statistics) and the classifier are binarized. We
demonstrate the efficiency of such an architecture on the intent classification
of short texts over three datasets and on text classification with a larger
dataset. The proposed architecture achieves results comparable to the state of
the art on standard intent classification datasets while using roughly 20-40%
less memory and training time. Furthermore, the individual components of the
architecture, such as binarized vector embeddings of documents or binarized
classifiers, can be used separately in architectures that are not necessarily
fully binary.
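The core design in the abstract, binarizing both the vector embeddings of token statistics and the classifier that consumes them, can be illustrated with a minimal NumPy sketch. The sign-based binarization, the random projection, and all dimensions below are assumptions made for illustration; they are not the paper's exact embedding or training scheme.

```python
import numpy as np

# Minimal sketch of the two binarized components described in the abstract.
# All names, dimensions, and the random projection are illustrative
# assumptions, not the authors' actual construction.

def binarize(x):
    """Map real values to {-1, +1} via the sign function."""
    return np.where(x >= 0, 1, -1).astype(np.int32)

def binary_embedding(token_counts, projection):
    """Binarized vector embedding of token statistics: project a
    bag-of-tokens count vector, then binarize the result."""
    return binarize(token_counts @ projection)

def binary_classify(x_bin, w_bin):
    """Binarized linear classifier: with inputs and weights in {-1, +1},
    each dot product is integer-only (XNOR + popcount on real hardware)."""
    scores = x_bin @ w_bin.T
    return int(np.argmax(scores))

# Toy usage with assumed sizes: vocabulary 1000, embedding 256, 7 intents.
rng = np.random.default_rng(0)
projection = rng.standard_normal((1000, 256))    # hypothetical fixed projection
w_bin = binarize(rng.standard_normal((7, 256)))  # binarized classifier weights
counts = rng.integers(0, 3, size=1000)           # token counts for one document
print(binary_classify(binary_embedding(counts, projection), w_bin))
```

Because every stored value is +1 or -1, the embeddings and weights can be packed one bit per element and the dot products realized as XNOR plus popcount, which is where the memory and training-time savings of a fully binarized architecture come from.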
Related papers
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z)
- Convolution, aggregation and attention based deep neural networks for accelerating simulations in mechanics [1.0154623955833253]
We demonstrate three types of neural network architectures for efficient learning of deformations of solid bodies.
The first two are based on the recently proposed CNN U-NET and MAgNET frameworks which have shown promising performance for learning on mesh-based data.
The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks.
arXiv Detail & Related papers (2022-12-01T13:10:56Z)
- Investigating Neural Architectures by Synthetic Dataset Design [14.317837518705302]
Recent years have seen the emergence of many new neural network structures (architectures and layers).
We sketch a methodology to measure the effect of each structure on a network's ability, by designing ad hoc synthetic datasets.
We illustrate our methodology by building three datasets to evaluate each of the three following network properties.
arXiv Detail & Related papers (2022-04-23T10:50:52Z)
- An Efficient End-to-End 3D Model Reconstruction based on Neural Architecture Search [5.913946292597174]
We propose an efficient model reconstruction method utilizing neural architecture search (NAS) and binary classification.
Our method achieves significantly higher reconstruction accuracy using fewer network parameters.
arXiv Detail & Related papers (2022-02-27T08:53:43Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Neural Network Layer Algebra: A Framework to Measure Capacity and Compression in Deep Learning [0.0]
We present a new framework to measure the intrinsic properties of (deep) neural networks.
While we focus on convolutional networks, our framework can be extended to any network architecture.
arXiv Detail & Related papers (2021-07-02T13:43:53Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we adapt the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)