Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints
- URL: http://arxiv.org/abs/2306.02671v1
- Date: Mon, 5 Jun 2023 08:05:05 GMT
- Title: Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints
- Authors: Chao Lou, Kewei Tu
- Abstract summary: We study two low-rank variants of Neural QCFG for faster inference.
We introduce two soft constraints over tree hierarchy and source coverage.
We find that our models outperform vanilla Neural QCFG in most settings.
- Score: 30.219318352970948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural QCFG is a grammar-based sequence-to-sequence (seq2seq) model with
strong inductive biases on hierarchical structures. It excels in
interpretability and generalization but suffers from expensive inference. In
this paper, we study two low-rank variants of Neural QCFG for faster inference
with different trade-offs between efficiency and expressiveness. Furthermore,
utilizing the symbolic interface provided by the grammar, we introduce two soft
constraints over tree hierarchy and source coverage. We experiment with various
datasets and find that our models outperform vanilla Neural QCFG in most
settings.
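To make the efficiency claim concrete, here is a minimal numerical sketch of the generic low-rank trick, assuming a hypothetical nonterminal count N and rank r (the real model factorizes neural rule scores, not random tensors): when the scores of binary rules A -> B C factor through rank-r matrices, one inside-algorithm update contracts in O(N*r) instead of O(N^3), without materializing the full rule tensor.
```python
# Hedged sketch only: sizes, names, and random scores are illustrative.
import numpy as np

N, r = 32, 4                        # hypothetical nonterminal count and rank
rng = np.random.default_rng(0)
U, V, W = (rng.standard_normal((N, r)) for _ in range(3))

def inside_step_full(beta_left, beta_right, scores):
    # Vanilla update: beta[A] = sum_{B,C} scores[A,B,C] * beta_left[B] * beta_right[C]
    return np.einsum("abc,b,c->a", scores, beta_left, beta_right)

def inside_step_lowrank(beta_left, beta_right):
    # Low-rank update: contract with the factors; the N^3 tensor never exists.
    return U @ ((beta_left @ V) * (beta_right @ W))

scores = np.einsum("ak,bk,ck->abc", U, V, W)    # built only to verify equivalence
bl, br = rng.random(N), rng.random(N)
assert np.allclose(inside_step_full(bl, br, scores), inside_step_lowrank(bl, br))
```
The rank r is the efficiency/expressiveness dial the abstract alludes to: smaller r means cheaper inference but a more constrained rule space.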
Related papers
- Best of Both Worlds: Advantages of Hybrid Graph Sequence Models [20.564009321626198]
We present a unifying framework for adopting graph sequence models for learning on graphs.
We evaluate the representation power of Transformers and modern recurrent models through the lens of global and local graph tasks.
We present GSM++, a fast hybrid model that uses the Hierarchical Affinity Clustering (HAC) algorithm to tokenize the graph into hierarchical sequences.
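As a rough picture of that tokenization step, the sketch below uses a greedy agglomerative merge as a stand-in for HAC; the affinity matrix, merge rule, and integer token ids are my own assumptions rather than GSM++ internals. Each node's hierarchical sequence is the chain of clusters it joins on the way to the root.
```python
# Hedged illustration of hierarchical tokenization, not the GSM++ code.
import itertools
import numpy as np

def hierarchical_tokens(A):
    """Greedily merge the highest-affinity cluster pair; return per-node token chains."""
    n = A.shape[0]
    clusters = {i: [i] for i in range(n)}       # cluster id -> member nodes
    chains = {i: [i] for i in range(n)}         # node -> hierarchical token sequence
    next_id = n
    while len(clusters) > 1:
        a, b = max(itertools.combinations(clusters, 2),
                   key=lambda p: A[np.ix_(clusters[p[0]], clusters[p[1]])].sum())
        merged = clusters.pop(a) + clusters.pop(b)
        for node in merged:
            chains[node].append(next_id)        # record the newly joined cluster
        clusters[next_id] = merged
        next_id += 1
    return chains

A = np.array([[0, 5, 1, 0], [5, 0, 1, 0], [1, 1, 0, 4], [0, 0, 4, 0]], float)
print(hierarchical_tokens(A))                   # e.g. node 0 -> [0, 4, 6]
```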
arXiv Detail & Related papers (2024-11-23T23:24:42Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
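The context-sampling step is simple enough to sketch. The following is an illustrative fixed-length random-walk sampler; the walk length, restart-on-dead-end rule, and all names are assumptions, not GSPT's actual procedure or hyperparameters.
```python
# Hedged sketch of random-walk context sampling, not the GSPT implementation.
import random

def sample_context(adj, start, walk_len=8, num_walks=4, seed=0):
    """Collect context nodes reached by `num_walks` random walks from `start`."""
    rng = random.Random(seed)
    context = []
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            neighbors = adj.get(node, [])
            if not neighbors:        # dead end: restart the walk at the source
                node = start
                continue
            node = rng.choice(neighbors)
            context.append(node)
    return context

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # toy undirected graph
print(sample_context(adj, start=0))
```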
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning [8.004425059996963]
We show that Transformer and LSTM models surprisingly fail at systematic generalization.
We also show that with increased references between hierarchies, Transformer performs no better than random.
arXiv Detail & Related papers (2021-11-28T03:11:37Z)
- Equivariant Subgraph Aggregation Networks [23.26140936226352]
While two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs.
This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN), which represents a graph as a bag of its subgraphs.
We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN.
We provide theoretical results that describe how design choices such as the subgraph selection policy and equivariant neural architecture affect our architecture's expressive power.
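A hedged sketch of the bag-of-subgraphs idea, as I read it from the abstract: apply a subgraph selection policy (node deletion below), encode each subgraph with a shared permutation-invariant encoder, and aggregate with a symmetric sum. The toy statistics-based encoder stands in for an actual GNN.
```python
# Illustration of an ESAN-style pipeline; the encoder is a toy stand-in.
import numpy as np

def node_deleted_subgraphs(A):
    """Subgraph selection policy: all node-deleted copies of adjacency matrix A."""
    n = A.shape[0]
    return [np.delete(np.delete(A, i, 0), i, 1) for i in range(n)]

def encode(S):
    # Stand-in for a shared GNN: a permutation-invariant statistics vector.
    degrees = S.sum(axis=1)
    return np.array([S.sum(), degrees.mean(), (degrees ** 2).mean()])

def bag_of_subgraphs_embedding(A):
    # DeepSets-style symmetric aggregation keeps the whole map equivariant.
    return np.sum([encode(S) for S in node_deleted_subgraphs(A)], axis=0)

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
print(bag_of_subgraphs_embedding(A))
```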
arXiv Detail & Related papers (2021-10-06T16:45:07Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
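To make "separable permutations" concrete: they are exactly the permutations built from singletons by direct sums and skew sums, and it is this recursive block structure that exact dynamic programming can exploit. A small self-contained enumeration (an illustration of the combinatorial class only, not the paper's inference algorithm):
```python
# Hedged illustration: enumerating separable permutations recursively.
def direct_sum(p, q):
    return p + [x + len(p) for x in q]      # q shifted above and after p

def skew_sum(p, q):
    return [x + len(q) for x in p] + q      # q placed below and after p

def separable_perms(n):
    """All separable permutations of {0..n-1}; counts follow the large Schroeder numbers."""
    if n == 1:
        return [[0]]
    out = set()
    for k in range(1, n):
        for p in separable_perms(k):
            for q in separable_perms(n - k):
                out.add(tuple(direct_sum(p, q)))
                out.add(tuple(skew_sum(p, q)))
    return [list(t) for t in out]

print([len(separable_perms(n)) for n in range(1, 6)])   # [1, 2, 6, 22, 90]
```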
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that high-order structural learning techniques are beneficial to strong-performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z)
- Building powerful and equivariant graph neural networks with structural message-passing [74.93169425144755]
We propose a powerful and equivariant message-passing framework based on two ideas.
First, we propagate a one-hot encoding of the nodes, in addition to the features, in order to learn a local context matrix around each node.
Second, we propose methods for the parametrization of the message and update functions that ensure permutation equivariance.
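The first idea can be sketched compactly. In this hedged illustration (the mean-aggregation rule and all shapes are my assumptions, not the paper's exact parametrization), every node carries an n x d context matrix whose own row starts as a one-hot identifier, and one propagation step mixes neighbor matrices in a permutation-equivariant way:
```python
# Hedged sketch of propagating one-hot node identifiers as structured features.
import numpy as np

def smp_step(A, X):
    """One step: X[i] <- mean over neighbors j of X[j], with X of shape (n, n, d)."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    return np.einsum("ij,jkd->ikd", A / deg, X)

n, d = 4, 3
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
X = np.zeros((n, n, d))
X[np.arange(n), np.arange(n), :] = 1.0      # one-hot local identifiers
X = smp_step(A, X)
print(X[0])     # node 0's local context matrix after one round of propagation
```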
arXiv Detail & Related papers (2020-06-26T17:15:16Z)