On the Linguistic Capacity of Real-Time Counter Automata
- URL: http://arxiv.org/abs/2004.06866v2
- Date: Thu, 9 Sep 2021 05:24:50 GMT
- Title: On the Linguistic Capacity of Real-Time Counter Automata
- Authors: William Merrill
- Abstract summary: We study the abilities of real-time counter machines as formal grammars.
We show that counter languages are closed under complement, union, intersection, and many other common set operations.
This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
- Score: 1.8072051868187933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counter machines have achieved a newfound relevance to the field of natural
language processing (NLP): recent work suggests some strong-performing
recurrent neural networks utilize their memory as counters. Thus, one potential
way to understand the success of these networks is to revisit the theory of
counter computation. Therefore, we study the abilities of real-time counter
machines as formal grammars, focusing on formal properties that are relevant
for NLP models. We first show that several variants of the counter machine
converge to express the same class of formal languages. We also prove that
counter languages are closed under complement, union, intersection, and many
other common set operations. Next, we show that counter machines cannot
evaluate boolean expressions, even though they can weakly validate their
syntax. This has implications for the interpretability and evaluation of neural
network systems: successfully matching syntactic patterns does not guarantee
that counter memory accurately encodes compositional semantics. Finally, we
consider whether counter languages are semilinear. This work makes general
contributions to the theory of formal languages that are of potential interest
for understanding recurrent neural networks.
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning [87.73401758641089]
Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs)
We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
arXiv Detail & Related papers (2024-06-20T10:59:02Z) - Lower Bounds on the Expressivity of Recurrent Neural Language Models [47.537525313391164]
Investigation into the representational capacity of neural LMs has predominantly focused on their ability to emphrecognize formal languages.
We take a fresh look at the representational capacity of RNN LMs by connecting them to emphprobabilistic FSAs and demonstrate that RNN LMs with linearly bounded precision can express arbitrary regular LMs.
arXiv Detail & Related papers (2024-05-29T16:02:09Z) - On the Computational Complexity and Formal Hierarchy of Second Order
Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2nd$-order recurrent network ($2nd$ RNN)
We prove there exists a class of a $2nd$ RNN that is Turing-complete with bounded time.
We also demonstrate that $2$nd order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Nondeterministic Stacks in Neural Networks [0.456877715768796]
We develop a differentiable data structure that efficiently simulates a nondeterministic pushdown automaton.
We show that this raises their formal recognition power to arbitrary context-free languages.
We also show that an RNN augmented with a nondeterministic stack is capable of surprisingly powerful behavior.
arXiv Detail & Related papers (2023-04-25T16:00:40Z) - Join-Chain Network: A Logical Reasoning View of the Multi-head Attention
in Transformer [59.73454783958702]
We propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions.
In particular, we demonstrate that such an ensemble of join-chains can express a broad subset of ''tree-structured'' first-order logical expressions, named FOET.
We find that the widely used multi-head self-attention module in transformer can be understood as a special neural operator that implements the union bound of the join operator in probabilistic predicate space.
arXiv Detail & Related papers (2022-10-06T07:39:58Z) - Emergence of Machine Language: Towards Symbolic Intelligence with Neural
Networks [73.94290462239061]
We propose to combine symbolism and connectionism principles by using neural networks to derive a discrete representation.
By designing an interactive environment and task, we demonstrated that machines could generate a spontaneous, flexible, and semantic language.
arXiv Detail & Related papers (2022-01-14T14:54:58Z) - SyGNS: A Systematic Generalization Testbed Based on Natural Language
Semantics [39.845425535943534]
We propose a Systematic Generalization testbed based on Natural language Semantics (SyGNS)
We test whether neural networks can systematically parse sentences involving novel combinations of logical expressions such as quantifiers and negation.
Experiments show that Transformer and GRU models can generalize to unseen combinations of quantifiers, negations, and modifier that are similar to given training instances in form, but not to the others.
arXiv Detail & Related papers (2021-06-02T11:24:41Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks
Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.