Language Models are not Models of Language
- URL: http://arxiv.org/abs/2112.07055v1
- Date: Mon, 13 Dec 2021 22:39:46 GMT
- Title: Language Models are not Models of Language
- Authors: Csaba Veres
- Abstract summary: Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance on almost all language tasks.
We argue that the term language model is misleading because deep learning models are not theoretical models of language.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing (NLP) has become one of the leading application
areas in the current Artificial Intelligence boom. Transfer learning has
enabled large deep learning neural networks trained on the language modeling
task to vastly improve performance in almost all language tasks. Interestingly,
when the models are trained with data that includes software code, they
demonstrate remarkable abilities in generating functioning computer code from
natural language specifications. We argue that this creates a conundrum for
claims that neural models provide an alternative theory to generative phrase
structure grammars in explaining how language works. Since the syntax of
programming languages is determined by phrase structure grammars, successful
neural models are apparently uninformative about the theoretical foundations of
programming languages, and by extension, natural languages. We argue that the
term language model is misleading because deep learning models are not
theoretical models of language and propose the adoption of corpus model
instead, which better reflects the genesis and contents of the model.
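The abstract's key premise is that programming-language syntax is determined by a phrase structure grammar. As a purely illustrative sketch (not taken from the paper), the Python snippet below defines a tiny context-free grammar for arithmetic expressions together with a recursive-descent recognizer, so that membership in the language is fixed entirely by the grammar rules rather than by any learned statistics.

```python
# Purely illustrative: a tiny phrase structure (context-free) grammar for
# arithmetic expressions, with a recursive-descent recognizer.
#
#   Expr   -> Term (('+' | '-') Term)*
#   Term   -> Factor (('*' | '/') Factor)*
#   Factor -> NUMBER | '(' Expr ')'
#
# Whether a string is syntactically valid is fixed by these rules alone.
import re


def tokenize(src):
    """Split the input into integer literals and single-character symbols."""
    return re.findall(r"\d+|\S", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected token {tok!r}")
        self.pos += 1
        return tok

    def expr(self):    # Expr -> Term (('+' | '-') Term)*
        self.term()
        while self.peek() in ("+", "-"):
            self.eat()
            self.term()

    def term(self):    # Term -> Factor (('*' | '/') Factor)*
        self.factor()
        while self.peek() in ("*", "/"):
            self.eat()
            self.factor()

    def factor(self):  # Factor -> NUMBER | '(' Expr ')'
        if self.peek() == "(":
            self.eat("(")
            self.expr()
            self.eat(")")
        elif self.peek() is not None and self.peek().isdigit():
            self.eat()
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")


def is_valid(src):
    """Return True iff src has a derivation under the grammar above."""
    parser = Parser(tokenize(src))
    try:
        parser.expr()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)


print(is_valid("(1 + 2) * 3"))  # True: derivable from the grammar
print(is_valid("1 + * 2"))      # False: no derivation exists
```

Here is_valid accepts or rejects a string purely by whether a derivation exists under the three rules in the comment, which is the sense in which a grammar, rather than learned statistics, determines the syntax of a programming language.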
Related papers
- Meta predictive learning model of languages in neural circuits [2.5690340428649328]
We propose a mean-field learning model within the predictive coding framework.
Our model reveals that most of the connections become deterministic after learning.
Our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
arXiv Detail & Related papers (2023-09-08T03:58:05Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Why can neural language models solve next-word prediction? A mathematical perspective [53.807657273043446]
We study a class of formal languages that can be used to model real-world examples of English sentences.
Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.
arXiv Detail & Related papers (2023-06-20T10:41:23Z)
- Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic [14.618731441943847]
We develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess.
Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
arXiv Detail & Related papers (2022-11-03T18:53:30Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- A Precis of Language Models are not Models of Language [0.0]
We show that despite their many successes at performing linguistic tasks, Large Neural Language Models are ill-suited as comprehensive models of natural language.
In spite of the often overbearing optimism about AI, modern neural models do not represent a revolution in our understanding of cognition.
arXiv Detail & Related papers (2022-05-16T12:50:58Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this inductive bias, expressed as a distribution, from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z)
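As a side note on the last entry above: the summarized approach separates a language-model step (paraphrasing free-form input into a controlled, English-like sublanguage) from a deterministic step that maps canonical forms to meaning representations. The Python sketch below illustrates only that second, deterministic step under invented assumptions; the canonical patterns, the logical-form templates, and the paraphrase_with_lm stub are hypothetical stand-ins, not the paper's actual grammar or model.

```python
# Hypothetical sketch of mapping a controlled sublanguage to meaning
# representations; the patterns and logical forms below are invented examples.
import re

# Each canonical pattern maps deterministically to a logical-form template.
CANONICAL_RULES = [
    (re.compile(r"^show me flights from (\w+) to (\w+)$"),
     "list(flight(from={0}, to={1}))"),
    (re.compile(r"^show me the cheapest flight from (\w+) to (\w+)$"),
     "argmin(flight(from={0}, to={1}), fare)"),
]


def paraphrase_with_lm(utterance: str) -> str:
    """Stand-in for the constrained language-model step.

    In the approach summarized above, a pretrained LM rewrites a free-form
    utterance into the controlled sublanguage; here we only lowercase and
    strip punctuation so the example stays self-contained.
    """
    return re.sub(r"[^\w\s]", "", utterance).lower().strip()


def to_meaning_representation(utterance: str) -> str:
    """Map an utterance to a logical form via the canonical sublanguage."""
    canonical = paraphrase_with_lm(utterance)
    for pattern, template in CANONICAL_RULES:
        match = pattern.match(canonical)
        if match:
            return template.format(*match.groups())
    raise ValueError(f"no canonical parse for: {canonical!r}")


print(to_meaning_representation("Show me flights from Boston to Denver!"))
# -> list(flight(from=boston, to=denver))
```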
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and accepts no responsibility for any consequences of its use.