Explainable Natural Language Processing with Matrix Product States
- URL: http://arxiv.org/abs/2112.08628v1
- Date: Thu, 16 Dec 2021 05:10:32 GMT
- Title: Explainable Natural Language Processing with Matrix Product States
- Authors: Jirawat Tangpanitanon, Chanatip Mangkang, Pradeep Bhadola, Yuichiro
Minato, Dimitris Angelakis, Thiparat Chotibut
- Abstract summary: We perform a systematic analysis of RNNs' behaviors in a ubiquitous NLP task, the sentiment analysis of movie reviews.
We show that single-layer RACs possess a maximum information propagation capacity.
Our work sheds light on the phenomenology of learning in RACs and more generally on the explainability aspects of RNNs for NLP.
- Score: 2.3243389656894595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite empirical successes of recurrent neural networks (RNNs) in natural
language processing (NLP), theoretical understanding of RNNs is still limited
due to intrinsically complex computations in RNNs. We perform a systematic
analysis of RNNs' behaviors in a ubiquitous NLP task, the sentiment analysis of
movie reviews, via the mapping between a class of RNNs called recurrent
arithmetic circuits (RACs) and a matrix product state (MPS). Using the
von Neumann entanglement entropy (EE) as a proxy for information propagation,
we show that single-layer RACs possess a maximum information propagation
capacity, reflected by the saturation of the EE. Enlarging the bond dimension
of an MPS beyond the EE saturation threshold does not increase the prediction
accuracies, so a minimal model that best estimates the data statistics can be
constructed. Although the saturated EE is smaller than the maximum EE
achievable by the area law of an MPS, our model achieves ~99% training
accuracies in realistic sentiment analysis data sets. Thus, low EE alone is not
a warrant against the adoption of single-layer RACs for NLP. Contrary to a
common belief that long-range information propagation is the main source of
RNNs' expressiveness, we show that single-layer RACs also harness high
expressiveness from meaningful word vector embeddings. Our work sheds light on
the phenomenology of learning in RACs and more generally on the explainability
aspects of RNNs for NLP, using tools from many-body quantum physics.
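As a concrete illustration of the abstract's key diagnostic, the sketch below computes the von Neumann EE across a bipartition of a small pure state by reshaping the state vector at a chosen cut and reading the Schmidt spectrum off an SVD. It is a minimal hypothetical example, not the authors' code; the function name and the toy random state are assumptions for illustration. For an MPS whose bond dimension at the cut is chi, the EE is bounded above by log2(chi), the area-law ceiling against which the abstract compares its saturated EE.

```python
import numpy as np

def entanglement_entropy(psi, n_left, d=2):
    """Von Neumann entanglement entropy (in bits) of a pure state `psi`
    across the cut between the first `n_left` sites and the rest.
    `psi` is a flat vector of length d**n_sites."""
    n_sites = int(round(np.log(psi.size) / np.log(d)))
    # Reshape into a (left block) x (right block) matrix; the singular
    # values of this matrix are the Schmidt coefficients of the cut.
    mat = psi.reshape(d ** n_left, d ** (n_sites - n_left))
    s = np.linalg.svd(mat, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)   # Schmidt spectrum as probabilities
    p = p[p > 1e-12]              # discard numerical zeros
    return float(-np.sum(p * np.log2(p)))

# Toy example: a random (hence highly entangled) 8-site state, local dimension 2.
rng = np.random.default_rng(0)
psi = rng.normal(size=2 ** 8)
psi /= np.linalg.norm(psi)

for cut in range(1, 8):
    # An MPS truncated to bond dimension chi at this cut can carry at most
    # log2(chi) bits of EE -- the area-law bound mentioned in the abstract.
    print(f"cut after site {cut}: EE = {entanglement_entropy(psi, cut):.3f} bits")
```

In the paper's setting, the analogous quantity is evaluated on the MPS obtained from the trained RAC; the observation is that prediction accuracy stops improving once this EE saturates, well below the log2(chi) area-law bound.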
Related papers
- Explicit Context Integrated Recurrent Neural Network for Sensor Data Applications [0.0]
Context Integrated RNN (CiRNN) enables the integration of explicit context represented in the form of contextual features.
Experiments show improvements of 39% and 87%, respectively, over state-of-the-art models.
arXiv Detail & Related papers (2023-01-12T13:58:56Z)
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper traces the major source of GNNs' performance gain to their intrinsic capability by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with, or even exceed, their GNN counterparts, while being much more efficient to train.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Signal Processing for Implicit Neural Representations [80.38097216996164]
Implicit Neural Representations (INRs) encode continuous multimedia data via multi-layer perceptrons.
Existing works manipulate such continuous representations by processing their discretized instances.
We propose an implicit neural signal processing network, dubbed INSP-Net, built from differential operators acting directly on INRs.
arXiv Detail & Related papers (2022-10-17T06:29:07Z)
- CARE: Certifiably Robust Learning with Reasoning via Variational Inference [26.210129662748862]
We propose a certifiably robust learning with reasoning pipeline (CARE).
CARE achieves significantly higher certified robustness than state-of-the-art baselines.
We additionally conduct ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different forms of knowledge integration.
arXiv Detail & Related papers (2022-09-12T07:15:52Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Relational Weight Priors in Neural Networks for Abstract Pattern Learning and Language Modelling [6.980076213134383]
Abstract patterns are among the best-known examples of problems on which neural networks struggle to generalise to unseen data.
It has been argued that these low-level problems demonstrate the inability of neural networks to learn systematically.
We propose Embedded Relation Based Patterns (ERBP) as a novel way to create a relational inductive bias that encourages learning equality and distance-based relations for abstract patterns.
arXiv Detail & Related papers (2021-03-10T17:21:16Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Distance and Equivalence between Finite State Machines and Recurrent Neural Networks: Computational results [0.348097307252416]
We present results on the problem of extracting Finite State Machine-based models from trained RNN language models.
Our reduction technique from 3-SAT makes these results easily generalizable to other RNN architectures.
arXiv Detail & Related papers (2020-04-01T14:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.