LLMs as a synthesis between symbolic and continuous approaches to language
- URL: http://arxiv.org/abs/2502.11856v1
- Date: Mon, 17 Feb 2025 14:48:18 GMT
- Title: LLMs as a synthesis between symbolic and continuous approaches to language
- Authors: Gemma Boleda
- Abstract summary: I argue that deep learning models for language represent a synthesis between the two traditions. I review recent research in mechanistic interpretability that showcases how a substantial part of morphosyntactic knowledge is encoded in a near-discrete fashion in LLMs.
- Score: 5.333866030919832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the middle of the 20th century, a fierce battle has been fought between symbolic and continuous approaches to language and cognition. The success of deep learning models, and LLMs in particular, has alternately been taken as showing that the continuous camp has won, or dismissed as an irrelevant engineering development. However, in this position paper I argue that deep learning models for language actually represent a synthesis between the two traditions. This is because 1) deep learning architectures allow for both continuous/distributed and symbolic/discrete-like representations and computations; 2) models trained on language make use of this flexibility. In particular, I review recent research in mechanistic interpretability that showcases how a substantial part of morphosyntactic knowledge is encoded in a near-discrete fashion in LLMs. This line of research suggests that different behaviors arise in an emergent fashion, and that models flexibly alternate between the two modes (and everything in between) as needed. This is possibly one of the main reasons for their wild success; and it is also what makes them particularly interesting for the study of language and cognition. Is it time for peace?
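The near-discrete encoding claim is testable with standard probing tools. Below is a minimal sketch, assuming GPT-2 as a stand-in model, a toy singular/plural word list, and an arbitrary middle layer; it illustrates the probing methodology, not the paper's own experiments.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

# Toy singular/plural nouns; a real probe would use a large, balanced set.
singular = ["cat", "dog", "house", "idea", "car", "book"]
plural = ["cats", "dogs", "houses", "ideas", "cars", "books"]

def word_vec(word, layer=6):
    """Hidden state of the word's last token at an (assumed) middle layer."""
    ids = tok(" " + word, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].numpy()

X = [word_vec(w) for w in singular + plural]
y = [0] * len(singular) + [1] * len(plural)

# A *linear* probe succeeding is consistent with grammatical number living
# along a near-discrete direction in representation space.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy:", probe.score(X, y))
```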
Related papers
- Language Models Are Implicitly Continuous [5.445513969959226]
We show that Transformer-based language models implicitly learn to represent sentences as continuous-time functions.
This phenomenon occurs in most state-of-the-art Large Language Models (LLMs), including Llama2, Llama3, Phi3, Gemma, Gemma2, and Mistral.
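As a rough illustration of what "sentences as continuous-time functions" can mean operationally, the sketch below (my construction, not the paper's method) treats per-token hidden states as samples of a trajectory h(t) and interpolates between token positions; GPT-2 is an assumed stand-in model.

```python
import numpy as np
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

ids = tok("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    states = model(**ids).last_hidden_state[0].numpy()  # (seq_len, dim)

t = np.arange(len(states), dtype=float)  # "time" = token position
t_query = 2.5                            # a point between tokens 2 and 3

# View the discrete sequence as samples of a function h(t) into
# representation space and interpolate linearly per dimension.
h = np.array([np.interp(t_query, t, states[:, d])
              for d in range(states.shape[1])])
print(h.shape)  # (768,) -- a representation "between" two tokens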
arXiv Detail & Related papers (2025-04-04T21:01:20Z) - Intermediate Languages Matter: Formal Choice Drives Neurosymbolic LLM Reasoning [50.99811144731619]
We show that the choice of formal language affects both the syntactic and the semantic reasoning capability. We conclude that on average, context-aware encodings help LLMs to reason, while there is no apparent effect of using comments or markdown syntax.
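A hedged sketch of the kind of setup this describes: the same problem rendered in two intermediate formal languages before being handed to an LLM. The problem and both encodings are illustrative assumptions, not the paper's materials.

```python
# The same reasoning problem in two candidate intermediate languages;
# one would compare LLM accuracy across these encodings.
problem = "All birds fly. Tweety is a bird. Does Tweety fly?"

encodings = {
    "first-order logic": (
        "forall x. Bird(x) -> Flies(x)\n"
        "Bird(tweety)\n"
        "Query: Flies(tweety)?"
    ),
    "Prolog": (
        "flies(X) :- bird(X).\n"
        "bird(tweety).\n"
        "?- flies(tweety)."
    ),
}

for name, formal in encodings.items():
    prompt = (
        f"Natural language problem: {problem}\n"
        f"Formal encoding ({name}):\n{formal}\n"
        "Answer the query step by step."
    )
    print(prompt, end="\n---\n")
```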
arXiv Detail & Related papers (2025-02-24T14:49:52Z) - Large Concept Models: Language Modeling in a Sentence Representation Space [62.73366944266477]
We present an attempt at an architecture that operates on an explicit higher-level semantic representation, which we name a concept.
Concepts are language- and modality-agnostic and represent a higher-level idea or action in a flow.
We show that our model exhibits impressive zero-shot generalization performance to many languages.
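A minimal sketch of the general idea (not the LCM architecture): language modeling in sentence-embedding space, predicting the next sentence's embedding from the current one. The MiniLM encoder stands in for a SONAR-like encoder, and the one-layer regressor is a toy assumption.

```python
import torch
from sentence_transformers import SentenceTransformer

enc = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in sentence encoder

doc = [
    "The sky darkened over the harbor.",
    "Fishermen hurried to tie down their boats.",
    "By nightfall the storm had arrived.",
]
emb = torch.tensor(enc.encode(doc))  # (3, 384): one vector per sentence

# A toy "concept model": current sentence embedding -> next one.
predictor = torch.nn.Linear(emb.size(1), emb.size(1))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
for _ in range(200):
    pred = predictor(emb[:-1])
    loss = torch.nn.functional.mse_loss(pred, emb[1:])
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final loss:", loss.item())
# Decoding predicted embeddings back to text needs a dedicated decoder,
# which is omitted here.
```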
arXiv Detail & Related papers (2024-12-11T23:36:20Z) - Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
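A hedged sketch of this kind of layer-wise analysis: measure, per layer, how well representations separate related from unrelated word pairs. GPT-2 stands in for Llama2, and the tiny word lists are illustrative assumptions.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

def layer_vecs(word):
    """One vector per layer for the word's last token."""
    ids = tok(" " + word, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    return [h[0, -1] for h in hs]

related = [("car", "automobile"), ("sofa", "couch")]
unrelated = [("car", "banana"), ("sofa", "theorem")]
cache = {w: layer_vecs(w) for pair in related + unrelated for w in pair}

cos = torch.nn.functional.cosine_similarity
for layer in range(len(cache["car"])):
    def mean_sim(pairs):
        return sum(cos(cache[a][layer], cache[b][layer], dim=0)
                   for a, b in pairs) / len(pairs)
    gap = (mean_sim(related) - mean_sim(unrelated)).item()
    print(f"layer {layer:2d}: related-minus-unrelated gap = {gap:.3f}")
# The abstract's claim predicts stronger lexical-semantic separation in
# lower layers than in the topmost, prediction-oriented layers.
```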
arXiv Detail & Related papers (2024-03-03T13:14:47Z) - Kiki or Bouba? Sound Symbolism in Vision-and-Language Models [13.300199242824934]
We show that sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion.
Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools.
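A text-only approximation of the kiki/bouba test (the paper's actual protocol pairs text with images): compare pseudowords against shape descriptions in CLIP's shared embedding space. The shape phrases are my illustrative assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

words = ["kiki", "bouba"]
shapes = ["a spiky angular shape", "a round blobby shape"]

inputs = proc(text=words + shapes, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)

sims = emb[:2] @ emb[2:].T  # words x shapes
for i, w in enumerate(words):
    for j, s in enumerate(shapes):
        print(f"{w} ~ {s}: {sims[i, j].item():.3f}")
# The sound-symbolism prediction: "kiki" aligns with the spiky
# description and "bouba" with the round one.
```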
arXiv Detail & Related papers (2023-10-25T17:15:55Z) - Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z) - The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs [26.118193748582197]
We evaluate four categories of widely used state-of-the-art models.
We find that, despite only evaluating on utterances that require a binary inference, models in three of these categories perform close to random.
These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models.
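One plausible way to run such a binary implicature test (my assumption about the protocol, not the paper's exact setup) is to score "yes" versus "no" continuations by token probability; the example item is illustrative.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Resolving the implicature here requires answering "no".
prompt = (
    "Question: Did you leave fingerprints?\n"
    "Answer: I wore gloves.\n"
    "Does the answer imply yes or no? The implied answer is"
)
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]

probs = torch.softmax(logits, dim=-1)
yes_id = tok(" yes").input_ids[0]
no_id = tok(" no").input_ids[0]
print(f"P(yes)={probs[yes_id].item():.4f}  P(no)={probs[no_id].item():.4f}")
# Near-equal probabilities across many such items would match the
# close-to-random performance the abstract reports for some model classes.
```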
arXiv Detail & Related papers (2022-10-26T19:04:23Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models [29.993190226231793]
We use chain-of-thought prompts to introduce structures from probabilistic models into large language models.
Our prompts lead language models to infer latent variables and reason about their relationships in order to choose appropriate paraphrases for metaphors.
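A minimal sketch of the prompt style this describes: a chain-of-thought template that walks through latent variables (topic, transferred feature) from a probabilistic model of metaphor before choosing a paraphrase. The wording and example are my illustrative assumptions.

```python
# Build a latent-variable-structured chain-of-thought prompt for an LLM.
metaphor = "My lawyer is a shark."
paraphrases = [
    "My lawyer is aggressive.",
    "My lawyer can swim.",
    "My lawyer is grey.",
]

prompt = (
    f"Metaphor: {metaphor}\n"
    "Step 1 (latent topic): The statement is about the lawyer, not a fish.\n"
    "Step 2 (latent feature): Sharks are stereotypically aggressive.\n"
    "Step 3 (paraphrase): Pick the option expressing that feature.\n"
    "Options:\n" + "\n".join(f"- {p}" for p in paraphrases) + "\n"
    "Answer:"
)
print(prompt)
# The reasoning steps mirror the latent variables of probabilistic
# models of metaphor understanding.
```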
arXiv Detail & Related papers (2022-09-16T19:23:13Z) - Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging? [94.68962249604749]
We propose a Masked Part-of-Speech Model (MPoSM) to facilitate flexible dependency modeling.
MPoSM can model arbitrary tag dependency and perform POS induction through the objective of masked POS reconstruction.
We achieve competitive results on both the English Penn WSJ dataset and the universal treebank containing 10 diverse languages.
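A hedged sketch of the masked-POS-reconstruction objective the abstract names: mask some tags in a sequence and train a model to reconstruct them, analogously to masked language modeling. Dimensions and the toy data are assumptions; in MPoSM the tags are induced rather than observed.

```python
import torch
import torch.nn as nn

num_tags, dim, seq_len = 10, 32, 8
tag_emb = nn.Embedding(num_tags + 1, dim)  # +1 for the [MASK] tag id
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(dim, num_tags)

tags = torch.randint(0, num_tags, (1, seq_len))  # stand-in tag sequence
mask = torch.zeros(1, seq_len, dtype=torch.bool)
mask[0, [1, 4, 6]] = True                        # positions to reconstruct
inputs = tags.masked_fill(mask, num_tags)        # replace with [MASK] id

logits = head(encoder(tag_emb(inputs)))
loss = nn.functional.cross_entropy(
    logits[mask], tags[mask]                     # loss on masked tags only
)
loss.backward()
print("masked reconstruction loss:", loss.item())
```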
arXiv Detail & Related papers (2022-06-30T01:43:05Z) - Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)