Explainable Natural Language Processing with Matrix Product States
- URL: http://arxiv.org/abs/2112.08628v1
- Date: Thu, 16 Dec 2021 05:10:32 GMT
- Title: Explainable Natural Language Processing with Matrix Product States
- Authors: Jirawat Tangpanitanon, Chanatip Mangkang, Pradeep Bhadola, Yuichiro
Minato, Dimitris Angelakis, Thiparat Chotibut
- Abstract summary: We perform a systematic analysis of RNNs' behaviors in a ubiquitous NLP task, the sentiment analysis of movie reviews.
We show that single-layer RACs possess a maximum information propagation capacity.
Our work sheds light on the phenomenology of learning in RACs and more generally on the explainability aspects of RNNs for NLP.
- Score: 2.3243389656894595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite empirical successes of recurrent neural networks (RNNs) in natural
language processing (NLP), theoretical understanding of RNNs is still limited
due to intrinsically complex computations in RNNs. We perform a systematic
analysis of RNNs' behaviors in a ubiquitous NLP task, the sentiment analysis of
movie reviews, via the mapping between a class of RNNs called recurrent
arithmetic circuits (RACs) and a matrix product state (MPS). Using the
von Neumann entanglement entropy (EE) as a proxy for information propagation,
we show that single-layer RACs possess a maximum information propagation
capacity, reflected by the saturation of the EE. Enlarging the bond dimension
of an MPS beyond the EE saturation threshold does not increase the prediction
accuracies, so a minimal model that best estimates the data statistics can be
constructed. Although the saturated EE is smaller than the maximum EE
achievable by the area law of an MPS, our model achieves ~99% training
accuracies in realistic sentiment analysis data sets. Thus, low EE alone is not
a warrant against the adoption of single-layer RACs for NLP. Contrary to a
common belief that long-range information propagation is the main source of
RNNs' expressiveness, we show that single-layer RACs also harness high
expressiveness from meaningful word vector embeddings. Our work sheds light on
the phenomenology of learning in RACs and more generally on the explainability
aspects of RNNs for NLP, using tools from many-body quantum physics.
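As a concrete illustration of the abstract's key diagnostic, the sketch below computes the von Neumann EE across a bipartition of a small pure state by reshaping the state vector at a chosen cut and reading the Schmidt spectrum off an SVD. It is a minimal hypothetical example, not the authors' code; the function name and the toy random state are assumptions for illustration. For an MPS whose bond dimension at the cut is chi, the EE is bounded above by log2(chi), the area-law ceiling against which the abstract compares its saturated EE.

```python
import numpy as np

def entanglement_entropy(psi, n_left, d=2):
    """Von Neumann entanglement entropy (in bits) of a pure state `psi`
    across the cut between the first `n_left` sites and the rest.
    `psi` is a flat vector of length d**n_sites."""
    n_sites = int(round(np.log(psi.size) / np.log(d)))
    # Reshape into a (left block) x (right block) matrix; the singular
    # values of this matrix are the Schmidt coefficients of the cut.
    mat = psi.reshape(d ** n_left, d ** (n_sites - n_left))
    s = np.linalg.svd(mat, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)   # Schmidt spectrum as probabilities
    p = p[p > 1e-12]              # discard numerical zeros
    return float(-np.sum(p * np.log2(p)))

# Toy example: a random (hence highly entangled) 8-site state, local dimension 2.
rng = np.random.default_rng(0)
psi = rng.normal(size=2 ** 8)
psi /= np.linalg.norm(psi)

for cut in range(1, 8):
    # An MPS truncated to bond dimension chi at this cut can carry at most
    # log2(chi) bits of EE -- the area-law bound mentioned in the abstract.
    print(f"cut after site {cut}: EE = {entanglement_entropy(psi, cut):.3f} bits")
```

In the paper's setting, the analogous quantity is evaluated on the MPS obtained from the trained RAC; the observation is that prediction accuracy stops improving once this EE saturates, well below the log2(chi) area-law bound.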
Related papers
- Explicit Context Integrated Recurrent Neural Network for Sensor Data Applications [0.0]
Context Integrated RNN (CiRNN) enables the integration of explicit context represented in the form of contextual features.
Experiments show improvements of 39% and 87%, respectively, over state-of-the-art models.
arXiv Detail & Related papers (2023-01-12T13:58:56Z)
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper traces the major source of GNNs' performance gain to their intrinsic capability by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with, or even exceed, their GNN counterparts, while being much more efficient to train.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Signal Processing for Implicit Neural Representations [80.38097216996164]
Implicit Neural Representations (INRs) encode continuous multimedia data via multi-layer perceptrons.
Existing works manipulate such continuous representations by processing their discretized instances.
We propose an implicit neural signal processing network, dubbed INSP-Net, built from differential operators acting directly on INRs.
arXiv Detail & Related papers (2022-10-17T06:29:07Z)
- CARE: Certifiably Robust Learning with Reasoning via Variational Inference [26.210129662748862]
We propose a certifiably robust learning with reasoning pipeline (CARE).
CARE achieves significantly higher certified robustness than state-of-the-art baselines.
We additionally conduct ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different forms of knowledge integration.
arXiv Detail & Related papers (2022-09-12T07:15:52Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Relational Weight Priors in Neural Networks for Abstract Pattern Learning and Language Modelling [6.980076213134383]
Abstract patterns are among the best-known examples of problems on which neural networks struggle to generalise to unseen data.
It has been argued that these low-level problems demonstrate the inability of neural networks to learn systematically.
We propose Embedded Relation Based Patterns (ERBP) as a novel way to create a relational inductive bias that encourages learning equality and distance-based relations for abstract patterns.
arXiv Detail & Related papers (2021-03-10T17:21:16Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Distance and Equivalence between Finite State Machines and Recurrent Neural Networks: Computational results [0.348097307252416]
We present results on the problem of extracting Finite State Machine-based models from trained RNN language models.
Our reduction technique from 3-SAT makes these results easily generalizable to other RNN architectures.
arXiv Detail & Related papers (2020-04-01T14:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.