How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
- URL: http://arxiv.org/abs/2010.00363v1
- Date: Thu, 1 Oct 2020 12:49:01 GMT
- Title: How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
- Authors: Chihiro Shibata, Kei Uchiumi, Daichi Mochihashi
- Abstract summary: We learn a language model where syntactic structures are implicitly given.
We show that the context update vectors, i.e. outputs of internal gates, are approximately quantized to binary or ternary values.
For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures.
We also show that natural clusters of the functional words and the parts of speech that trigger phrases are represented in a small but principal subspace of the LSTM's context-update vector.
- Score: 2.881185491084005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long Short-Term Memory recurrent neural networks (LSTMs) are widely used
and known to capture informative long-term syntactic dependencies. However, how such
information is reflected in their internal vectors for natural text has not yet been
sufficiently investigated. We analyze these vectors by learning a language model in
which syntactic structures are implicitly given. We empirically show that the
context-update vectors, i.e. the outputs of the internal gates, are approximately
quantized to binary or ternary values, which helps the language model count the depth
of nesting accurately, as Suzgun et al. (2019) recently showed for synthetic Dyck
languages. For some dimensions of the context vector, we show that their activations
are highly correlated with the depth of phrase structures such as VP and NP. Moreover,
with $L_1$ regularization, we find that whether a word lies inside a phrase structure
can be accurately predicted from a small number of components of the context vector.
Even when learning from raw text, the context vectors are shown to correlate well with
the phrase structures. Finally, we show that natural clusters of the functional words
and the parts of speech that trigger phrases are represented in a small but principal
subspace of the LSTM's context-update vector.
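The central quantity throughout the abstract is the context-update vector $i_t \odot \tanh(\tilde{c}_t)$ that the input gate writes into the cell state at each step. As a rough illustration of how that vector and the two reported measurements (near-ternary quantization and the sparse $L_1$ probe) could be computed, here is a minimal sketch assuming PyTorch and scikit-learn; the randomly initialized LSTMCell, random token embeddings, and random phrase labels are stand-ins for the trained language model, natural text, and treebank annotations the paper actually uses.

```python
# Sketch (not the paper's released code): extract LSTM context-update vectors,
# check how close their components are to the ternary values {-1, 0, +1}, and
# fit an L1-regularized probe on the cell state. The randomly initialized cell,
# random "token" embeddings, and random phrase labels are placeholders.
import torch
import numpy as np
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)
emb_dim, hidden, steps = 32, 64, 500
cell = torch.nn.LSTMCell(emb_dim, hidden)        # PyTorch stacks gates as [i, f, g, o]

xs = torch.randn(steps, emb_dim)                 # placeholder token embeddings
h = torch.zeros(1, hidden)
c = torch.zeros(1, hidden)
updates, states = [], []

with torch.no_grad():
    for x in xs:
        x = x.unsqueeze(0)
        gates = (x @ cell.weight_ih.T + cell.bias_ih
                 + h @ cell.weight_hh.T + cell.bias_hh)
        i, f, g, o = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        u = i * g                                # context-update vector, bounded in (-1, 1)
        c = f * c + u                            # standard LSTM cell-state update
        h = o * torch.tanh(c)
        updates.append(u.squeeze(0))
        states.append(c.squeeze(0))

U = torch.stack(updates)                         # shape (steps, hidden)
near_pm1 = (U.abs() > 0.9).float().mean().item()  # mass near -1 or +1
near_0 = (U.abs() < 0.1).float().mean().item()    # mass near 0
print(f"update components near ±1: {near_pm1:.3f}, near 0: {near_0:.3f}")

# L1-regularized probe: predict "is this word inside a phrase (e.g. NP)?" from
# the cell state. Random labels stand in for gold phrase-structure annotations.
X = torch.stack(states).numpy()
y = np.random.RandomState(0).randint(0, 2, size=steps)
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("nonzero probe weights:", int((probe.coef_ != 0).sum()), "of", hidden)
```

On a trained model with real annotations, the first printout would reflect the near-ternary quantization claim, and the count of nonzero probe weights makes the "small number of components" claim concrete.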
Related papers
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data.
We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders.
In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
- Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations [24.211603400355756]
Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models.
We look at how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations.
We validate our findings on synthetic and small-scale real language datasets.
arXiv Detail & Related papers (2024-08-27T21:46:47Z)
- Function Vectors in Large Language Models [45.267194267587435]
We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs).
Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV).
arXiv Detail & Related papers (2023-10-23T17:55:24Z)
- Advancing Regular Language Reasoning in Linear Recurrent Neural Networks [56.11830645258106]
We study whether linear recurrent neural networks (LRNNs) can learn the hidden rules in training sequences.
We propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix.
Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks.
arXiv Detail & Related papers (2023-09-14T03:36:01Z)
- Backpack Language Models [108.65930795825416]
We present Backpacks, a new neural architecture that marries strong modeling performance with an interface for interpretability and control.
We find that, after training, sense vectors specialize, each encoding a different aspect of a word.
We present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing.
arXiv Detail & Related papers (2023-05-26T09:26:23Z)
- Adapting Language Models to Compress Contexts [71.98287002918941]
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window.
We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors.
We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
arXiv Detail & Related papers (2023-05-24T06:42:44Z)
- Why do Nearest Neighbor Language Models Work? [93.71050438413121]
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context.
Retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore.
arXiv Detail & Related papers (2023-01-07T11:12:36Z)
- Tsetlin Machine Embedding: Representing Words Using Logical Expressions [10.825099126920028]
We introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner.
The clauses consist of contextual words like "black," "cup," and "hot" that define other words like "coffee."
We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks.
arXiv Detail & Related papers (2023-01-02T15:02:45Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)
- Assessing the Unitary RNN as an End-to-End Compositional Model of Syntax [0.0]
We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns.
arXiv Detail & Related papers (2022-08-11T09:30:49Z)
- Context based Text-generation using LSTM networks [0.5330240017302621]
The proposed model is trained to generate text for a given set of input words along with a context vector.
The results are evaluated based on the semantic closeness of the generated text to the given context.
arXiv Detail & Related papers (2020-04-30T18:39:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.