Structural generalization in COGS: Supertagging is (almost) all you need
- URL: http://arxiv.org/abs/2310.14124v1
- Date: Sat, 21 Oct 2023 21:51:25 GMT
- Title: Structural generalization in COGS: Supertagging is (almost) all you need
- Authors: Alban Petit, Caio Corro, François Yvon
- Abstract summary: Several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required.
We extend a neural graph-based semantic parsing framework in several ways to alleviate this issue.
- Score: 12.991247861348048
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In many Natural Language Processing applications, neural networks have been found to fail to generalize on out-of-distribution examples. In particular, several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required. In this work, we extend a neural graph-based semantic parsing framework in several ways to alleviate this issue. Notably, we propose: (1) the introduction of a supertagging step with valency constraints, expressed as an integer linear program; (2) a reduction of the graph prediction problem to the maximum matching problem; (3) the design of an incremental early-stopping training strategy to prevent overfitting. Experimentally, our approach significantly improves results on examples that require structural generalization in the COGS dataset, a known challenging benchmark for compositional generalization. Overall, our results confirm that structural constraints are important for generalization in semantic parsing.
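To make points (1) and (2) of the abstract more concrete, here is a minimal, hypothetical sketch of how supertag selection under a global valency constraint can be phrased as an integer linear program, and how argument-slot filling can then be cast as a maximum bipartite matching. This is not the authors' implementation: the supertag inventory, valencies, scores, and the single global constraint below are invented for illustration, and PuLP/SciPy are used only as generic ILP and assignment solvers.

```python
# Toy sketch (not the paper's code): supertagging as an ILP with a valency
# constraint, then edge prediction as a maximum bipartite matching.
# All tags, valencies, and scores are made up for illustration.
import numpy as np
import pulp
from scipy.optimize import linear_sum_assignment

words = ["the", "cat", "ate", "a", "cookie"]
# Hypothetical supertags: (name, valency = number of argument slots it opens).
tags = [("DET", 1), ("NOUN", 0), ("VERB-2", 2)]
# Hypothetical tagger scores score[i][t]; "ate" slightly prefers NOUN here,
# so the valency constraint is what forces the verbal reading.
score = [
    [2.0, 0.1, 0.0],   # the
    [0.1, 2.0, 0.2],   # cat
    [0.0, 2.0, 1.9],   # ate
    [2.0, 0.1, 0.0],   # a
    [0.1, 2.0, 0.2],   # cookie
]

# --- (1) Supertag selection with a valency constraint, as an ILP ----------
prob = pulp.LpProblem("supertagging", pulp.LpMaximize)
x = {(i, t): pulp.LpVariable(f"x_{i}_{t}", cat="Binary")
     for i in range(len(words)) for t in range(len(tags))}
# Objective: total score of the selected supertags.
prob += pulp.lpSum(score[i][t] * x[i, t] for (i, t) in x)
# Each word gets exactly one supertag.
for i in range(len(words)):
    prob += pulp.lpSum(x[i, t] for t in range(len(tags))) == 1
# Toy global valency constraint: the argument slots opened by the chosen
# supertags must add up to n - 1, so every word but the root fills one slot.
prob += pulp.lpSum(tags[t][1] * x[i, t] for (i, t) in x) == len(words) - 1
prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = {i: next(t for t in range(len(tags)) if x[i, t].value() > 0.5)
          for i in range(len(words))}
print({words[i]: tags[t][0] for i, t in chosen.items()})
# -> {'the': 'DET', 'cat': 'NOUN', 'ate': 'VERB-2', 'a': 'DET', 'cookie': 'NOUN'}

# --- (2) Edge prediction as maximum matching -------------------------------
# Every argument slot of every selected supertag must be filled by a distinct
# word; with (made-up) edge scores this becomes an assignment problem.
slots = [(i, k) for i, t in chosen.items() for k in range(tags[t][1])]
edge = np.random.default_rng(0).random((len(words), len(words)))
gain = np.array([[edge[h, d] if d != h else -1e9   # forbid self-loops
                  for d in range(len(words))]
                 for h, _ in slots])
rows, cols = linear_sum_assignment(gain, maximize=True)
for r, d in zip(rows, cols):
    h, k = slots[r]
    print(f"{words[h]} --arg{k}--> {words[d]}")
```

The paper's actual constraints and matching formulation are richer than this toy version; the sketch only mirrors the general shape of the two steps named in the abstract.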
Related papers
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
"pre-train, prompt-tuning" has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs).
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- Towards Bridging Generalization and Expressivity of Graph Neural Networks [11.560730203511111]
We study the relationship between expressivity and generalization in graph neural networks (GNNs).
We introduce a novel framework that connects GNN generalization to the variance in graph structures they can capture.
We uncover a trade-off between intra-class concentration and inter-class separation, both of which are crucial for effective generalization.
arXiv Detail & Related papers (2024-10-14T00:31:16Z)
- On the Expressiveness and Generalization of Hypergraph Neural Networks [77.65788763444877]
This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs).
Specifically, we focus on how HyperGNNs can learn from finite datasets and generalize structurally to graph reasoning problems of arbitrary input sizes.
arXiv Detail & Related papers (2023-03-09T18:42:18Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning neurons with the lowest magnitudes can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Neuro-Symbolic Constraint Programming for Structured Prediction [32.427665902031436]
We propose Nester, a method for injecting neural networks into constrained structured predictors.
Nester takes advantage of the features of its two components: the neural network learns complex representations from low-level data.
An empirical evaluation on handwritten equation recognition shows that Nester achieves better performance than both the neural network and the constrained structured predictor.
arXiv Detail & Related papers (2021-03-31T17:31:33Z)
- Neuro-algorithmic Policies enable Fast Combinatorial Generalization [16.74322664734553]
Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data.
We show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures.
We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver.
arXiv Detail & Related papers (2021-02-15T11:07:59Z)