Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
- URL: http://arxiv.org/abs/2309.00857v2
- Date: Thu, 19 Oct 2023 11:41:58 GMT
- Title: Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
- Authors: Shunjie Wang, Shane Steinert-Threlkeld
- Abstract summary: Recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages.
We test the Transformer's ability to learn mildly context-sensitive languages of varying complexities.
Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior.
- Score: 6.227678387562755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the fact that Transformers perform well in NLP tasks, recent studies
suggest that self-attention is theoretically limited in learning even some
regular and context-free languages. These findings motivated us to think about
their implications in modeling natural language, which is hypothesized to be
mildly context-sensitive. We test the Transformer's ability to learn mildly
context-sensitive languages of varying complexities, and find that they
generalize well to unseen in-distribution data, but their ability to
extrapolate to longer strings is worse than that of LSTMs. Our analyses show
that the learned self-attention patterns and representations modeled dependency
relations and demonstrated counting behavior, which may have helped the models
solve the languages.
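The "counting behavior" the abstract refers to can be made concrete with a classic example: the language a^n b^n c^n, which is context-sensitive and is recognizable by keeping counts rather than by a context-free stack. The sketch below is purely illustrative (the function name and the choice of this particular language as an example are ours, not necessarily the paper's test set):

```python
# Illustrative sketch: a^n b^n c^n is a standard context-sensitive language
# that can be decided by counting symbols, mirroring the counting strategy
# the paper's analysis attributes to the learned models.
def is_anbncn(s: str) -> bool:
    """Accept strings of the form a^n b^n c^n (n >= 1) by counting."""
    n = len(s)
    if n == 0 or n % 3 != 0:
        return False
    k = n // 3
    # All three blocks must have equal length and appear in order.
    return s == "a" * k + "b" * k + "c" * k

# Generalization vs. extrapolation, as studied in the paper, amounts to
# testing on unseen n within the training range vs. on larger n.
print(is_anbncn("aabbcc"))   # in-distribution-style example
print(is_anbncn("aabbbcc"))  # malformed string
```

A model that learns this language must track three counts and compare them, which is why counting-like attention patterns are evidence the models found an appropriate solution.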
Related papers
- From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z)
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the rank of the RLM is a strong and significant predictor of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Evaluating Neural Language Models as Cognitive Models of Language Acquisition [4.779196219827507]
We argue that some of the most prominent benchmarks for evaluating the syntactic capacities of neural language models may not be sufficiently rigorous.
When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models.
We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
arXiv Detail & Related papers (2023-10-31T00:16:17Z)
- Linearity of Relation Decoding in Transformer Language Models [82.47019600662874]
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations.
We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation.
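The claim above is that, for some relations, mapping a subject representation to the related object representation is well-approximated by a single affine map. The sketch below illustrates only what such an approximation means operationally; it fits the map by ordinary least squares on synthetic vectors, which is not the paper's procedure (the paper derives its linear approximation from the LM itself), and all names here are ours:

```python
import numpy as np

# Illustrative only: approximate a "relation" as an affine map
#   object_rep ≈ W @ subject_rep + b
# using synthetic stand-in representations and a least-squares fit.
rng = np.random.default_rng(0)
d, n_pairs = 8, 100
W_true = rng.normal(size=(d, d))
b_true = rng.normal(size=d)
subjects = rng.normal(size=(n_pairs, d))      # stand-in subject reps
objects = subjects @ W_true.T + b_true        # stand-in object reps

# Fit [W | b] jointly by augmenting inputs with a constant column.
X = np.hstack([subjects, np.ones((n_pairs, 1))])
coef, *_ = np.linalg.lstsq(X, objects, rcond=None)
W_hat, b_hat = coef[:d].T, coef[d]
```

Because the synthetic data are exactly affine, the fit recovers the map; the interesting empirical finding in the paper is that real LM computations for some relations are close to this regime.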
arXiv Detail & Related papers (2023-08-17T17:59:19Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) only in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.