Related papers: Propositional Logic for Probing Generalization in Neural Networks

Propositional Logic for Probing Generalization in Neural Networks

URL: http://arxiv.org/abs/2506.08978v1
Date: Tue, 10 Jun 2025 16:46:05 GMT
Title: Propositional Logic for Probing Generalization in Neural Networks
Authors: Anna Langedijk, Jaap Jumelet, Willem Zuidema,
Abstract summary: We investigate the generalization behavior of three key neural architectures (Transformers, Graph Convolution Networks and LSTMs) in a controlled task rooted in propositional logic.<n>We find thatTransformers fail to apply negation compositionally, unless structural biases are introduced.<n>Our findings highlight persistent limitations in the ability of standard architectures to learn systematic representations of logical operators.
Score: 3.6037930269014633
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The extent to which neural networks are able to acquire and represent symbolic rules remains a key topic of research and debate. Much current work focuses on the impressive capabilities of large language models, as well as their often ill-understood failures on a wide range of reasoning tasks. In this paper, in contrast, we investigate the generalization behavior of three key neural architectures (Transformers, Graph Convolution Networks and LSTMs) in a controlled task rooted in propositional logic. The task requires models to generate satisfying assignments for logical formulas, making it a structured and interpretable setting for studying compositionality. We introduce a balanced extension of an existing dataset to eliminate superficial patterns and enable testing on unseen operator combinations. Using this dataset, we evaluate the ability of the three architectures to generalize beyond the training distribution. While all models perform well in-distribution, we find that generalization to unseen patterns, particularly those involving negation, remains a significant challenge. Transformers fail to apply negation compositionally, unless structural biases are introduced. Our findings highlight persistent limitations in the ability of standard architectures to learn systematic representations of logical operators, suggesting the need for stronger inductive biases to support robust rule-based reasoning.

Related papers

Understanding Generalization, Robustness, and Interpretability in Low-Capacity Neural Networks [0.0]
We introduce a framework to investigate capacity, sparsity, and robustness in low-capacity networks.<n>We show that trained networks are robust to extreme magnitude pruning (up to 95% sparsity)<n>This work provides a clear, empirical demonstration of the trade-offs governing simple neural networks.
arXiv Detail & Related papers (2025-07-22T06:43:03Z)
The Coverage Principle: A Framework for Understanding Compositional Generalization [31.762330857169914]
We show that models relying primarily on pattern matching for compositional tasks cannot reliably generalize beyond substituting fragments that yield identical results when used in the same contexts.<n>We demonstrate that this framework has a strong predictive power for the generalization capabilities of Transformers.
arXiv Detail & Related papers (2025-05-26T17:55:15Z)
When does compositional structure yield compositional generalization? A kernel theory [0.0]
We present a theory of compositional generalization in kernel models with fixed, compositionally structured representations.<n>We identify novel failure modes in compositional generalization that arise from biases in the training data.<n>This work examines how statistical structure in the training data can affect compositional generalization.
arXiv Detail & Related papers (2024-05-26T00:50:11Z)
Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z)
LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning [73.98142349171552]
LOGICSEG is a holistic visual semantic that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge. During fuzzy logic-based continuous relaxation, logical formulae are grounded onto data and neural computational graphs, hence enabling logic-induced network training. These designs together make LOGICSEG a general and compact neural-logic machine that is readily integrated into existing segmentation models.
arXiv Detail & Related papers (2023-09-24T05:43:19Z)
Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs. It encodes linearized query structures and entities using pre-trained language models to find answers. We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks. We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain. In this work, we build models that rely on the message-passing rule of predictive coding. We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
arXiv Detail & Related papers (2022-12-09T03:58:22Z)
Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the theory of Chomsky can predict the limits of neural network generalization in practice. We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference. We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules. inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts. In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images. We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.