Structural generalization in COGS: Supertagging is (almost) all you need
- URL: http://arxiv.org/abs/2310.14124v1
- Date: Sat, 21 Oct 2023 21:51:25 GMT
- Title: Structural generalization in COGS: Supertagging is (almost) all you need
- Authors: Alban Petit, Caio Corro, François Yvon
- Abstract summary: Several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required.
We extend a neural graph-based semantic parsing framework in several ways to alleviate this issue.
- Score: 12.991247861348048
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In many Natural Language Processing applications, neural networks have been found to fail to generalize on out-of-distribution examples. In particular, several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required. In this work, we extend a neural graph-based semantic parsing framework in several ways to alleviate this issue. Notably, we propose: (1) the introduction of a supertagging step with valency constraints, expressed as an integer linear program; (2) a reduction of the graph prediction problem to the maximum matching problem; (3) the design of an incremental early-stopping training strategy to prevent overfitting. Experimentally, our approach significantly improves results on examples that require structural generalization in the COGS dataset, a known challenging benchmark for compositional generalization. Overall, our results confirm that structural constraints are important for generalization in semantic parsing.
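To make points (1) and (2) of the abstract more concrete, here is a minimal, hypothetical sketch of how supertag selection under a global valency constraint can be phrased as an integer linear program, and how argument-slot filling can then be cast as a maximum bipartite matching. This is not the authors' implementation: the supertag inventory, valencies, scores, and the single global constraint below are invented for illustration, and PuLP/SciPy are used only as generic ILP and assignment solvers.

```python
# Toy sketch (not the paper's code): supertagging as an ILP with a valency
# constraint, then edge prediction as a maximum bipartite matching.
# All tags, valencies, and scores are made up for illustration.
import numpy as np
import pulp
from scipy.optimize import linear_sum_assignment

words = ["the", "cat", "ate", "a", "cookie"]
# Hypothetical supertags: (name, valency = number of argument slots it opens).
tags = [("DET", 1), ("NOUN", 0), ("VERB-2", 2)]
# Hypothetical tagger scores score[i][t]; "ate" slightly prefers NOUN here,
# so the valency constraint is what forces the verbal reading.
score = [
    [2.0, 0.1, 0.0],   # the
    [0.1, 2.0, 0.2],   # cat
    [0.0, 2.0, 1.9],   # ate
    [2.0, 0.1, 0.0],   # a
    [0.1, 2.0, 0.2],   # cookie
]

# --- (1) Supertag selection with a valency constraint, as an ILP ----------
prob = pulp.LpProblem("supertagging", pulp.LpMaximize)
x = {(i, t): pulp.LpVariable(f"x_{i}_{t}", cat="Binary")
     for i in range(len(words)) for t in range(len(tags))}
# Objective: total score of the selected supertags.
prob += pulp.lpSum(score[i][t] * x[i, t] for (i, t) in x)
# Each word gets exactly one supertag.
for i in range(len(words)):
    prob += pulp.lpSum(x[i, t] for t in range(len(tags))) == 1
# Toy global valency constraint: the argument slots opened by the chosen
# supertags must add up to n - 1, so every word but the root fills one slot.
prob += pulp.lpSum(tags[t][1] * x[i, t] for (i, t) in x) == len(words) - 1
prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = {i: next(t for t in range(len(tags)) if x[i, t].value() > 0.5)
          for i in range(len(words))}
print({words[i]: tags[t][0] for i, t in chosen.items()})
# -> {'the': 'DET', 'cat': 'NOUN', 'ate': 'VERB-2', 'a': 'DET', 'cookie': 'NOUN'}

# --- (2) Edge prediction as maximum matching -------------------------------
# Every argument slot of every selected supertag must be filled by a distinct
# word; with (made-up) edge scores this becomes an assignment problem.
slots = [(i, k) for i, t in chosen.items() for k in range(tags[t][1])]
edge = np.random.default_rng(0).random((len(words), len(words)))
gain = np.array([[edge[h, d] if d != h else -1e9   # forbid self-loops
                  for d in range(len(words))]
                 for h, _ in slots])
rows, cols = linear_sum_assignment(gain, maximize=True)
for r, d in zip(rows, cols):
    h, k = slots[r]
    print(f"{words[h]} --arg{k}--> {words[d]}")
```

The paper's actual constraints and matching formulation are richer than this toy version; the sketch only mirrors the general shape of the two steps named in the abstract.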
Related papers
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
"pre-train, prompt-tuning" has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs).
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- Towards Bridging Generalization and Expressivity of Graph Neural Networks [11.560730203511111]
We study the relationship between expressivity and generalization in graph neural networks (GNNs).
We introduce a novel framework that connects GNN generalization to the variance in graph structures they can capture.
We uncover a trade-off between intra-class concentration and inter-class separation, both of which are crucial for effective generalization.
arXiv Detail & Related papers (2024-10-14T00:31:16Z)
- On the Expressiveness and Generalization of Hypergraph Neural Networks [77.65788763444877]
This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs).
Specifically, we focus on how HyperGNNs can learn from finite datasets and generalize structurally to graph reasoning problems of arbitrary input sizes.
arXiv Detail & Related papers (2023-03-09T18:42:18Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning neurons with the lowest magnitudes can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Neuro-Symbolic Constraint Programming for Structured Prediction [32.427665902031436]
We propose Nester, a method for injecting neural networks into constrained structured predictors.
Nester takes advantage of the features of its two components: the neural network learns complex representations from low-level data.
An empirical evaluation on handwritten equation recognition shows that Nester achieves better performance than both the neural network and the constrained structured predictor.
arXiv Detail & Related papers (2021-03-31T17:31:33Z)
- Neuro-algorithmic Policies enable Fast Combinatorial Generalization [16.74322664734553]
Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data.
We show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures.
We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver.
arXiv Detail & Related papers (2021-02-15T11:07:59Z)