Structure-aware Fine-tuning of Sequence-to-sequence Transformers for
Transition-based AMR Parsing
- URL: http://arxiv.org/abs/2110.15534v1
- Date: Fri, 29 Oct 2021 04:36:31 GMT
- Title: Structure-aware Fine-tuning of Sequence-to-sequence Transformers for
Transition-based AMR Parsing
- Authors: Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Young-Suk
Lee, Radu Florian, Salim Roukos
- Abstract summary: We explore the integration of general pre-trained sequence-to-sequence language models and a structure-aware transition-based approach.
We propose a simplified transition set, designed to better exploit pre-trained language models for structured fine-tuning.
We show that the proposed parsing architecture retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new state of the art for AMR 2.0, without the need for graph re-categorization.
- Score: 20.67024416678313
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Predicting linearized Abstract Meaning Representation (AMR) graphs using
pre-trained sequence-to-sequence Transformer models has recently led to large
improvements on AMR parsing benchmarks. These parsers are simple and avoid
explicit modeling of structure but lack desirable properties such as graph
well-formedness guarantees or built-in graph-sentence alignments. In this work
we explore the integration of general pre-trained sequence-to-sequence language
models and a structure-aware transition-based approach. We depart from a
pointer-based transition system and propose a simplified transition set,
designed to better exploit pre-trained language models for structured
fine-tuning. We also explore modeling the parser state within the pre-trained
encoder-decoder architecture and different vocabulary strategies for the same
purpose. We provide a detailed comparison with recent progress in AMR parsing
and show that the proposed parser retains the desirable properties of previous
transition-based approaches, while being simpler and reaching the new parsing
state of the art for AMR 2.0, without the need for graph re-categorization.
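As a rough illustration of the transition-based setup described in the abstract, the sketch below decodes a toy sentence with a hypothetical pointer-based action set (SHIFT, NODE, and LA/RA arcs with pointer arguments). The action names, their semantics, and the hand-written oracle sequence are illustrative assumptions rather than the paper's exact transition set; the point is only to show how graph well-formedness and graph-sentence alignments hold by construction.

```python
# Minimal sketch of a pointer-based transition system for AMR parsing.
# Action names and semantics are illustrative assumptions, not the exact
# transition set proposed in the paper.
from dataclasses import dataclass, field

@dataclass
class ParserState:
    tokens: list              # source sentence tokens
    cursor: int = 0           # index of the token currently in focus
    nodes: list = field(default_factory=list)   # (node_id, concept, aligned_token_idx)
    edges: list = field(default_factory=list)   # (head_node_id, label, dep_node_id)

def apply(state: ParserState, action: str) -> None:
    """Apply one transition; well-formedness and alignments hold by construction."""
    if action == "SHIFT":
        state.cursor += 1
    elif action.startswith("NODE("):             # e.g. NODE(want-01): concept aligned to cursor
        concept = action[5:-1]
        state.nodes.append((len(state.nodes), concept, state.cursor))
    elif action.startswith(("LA(", "RA(")):      # e.g. LA(0,:ARG0): arc between last node and pointed node
        ptr, label = action[3:-1].split(",")
        last = len(state.nodes) - 1
        head, dep = (last, int(ptr)) if action.startswith("LA") else (int(ptr), last)
        state.edges.append((head, label, dep))
    else:
        raise ValueError(f"unknown action: {action}")

# Toy oracle decode for "the boy wants to go".
state = ParserState(tokens="the boy wants to go".split())
for a in ["SHIFT", "NODE(boy)", "SHIFT", "NODE(want-01)", "LA(0,:ARG0)",
          "SHIFT", "SHIFT", "NODE(go-02)", "RA(1,:ARG1)", "LA(0,:ARG0)"]:
    apply(state, a)
print(state.nodes)   # concepts with their aligned token positions
print(state.edges)   # labeled AMR edges
```

In the paper's setting, such action sequences would be predicted by a fine-tuned sequence-to-sequence Transformer whose decoder also models the evolving parser state.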
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
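A minimal sketch of the random-walk context sampling mentioned in the GSPT summary above; the walk length, undirected adjacency format, and function names are assumptions for illustration, not GSPT's exact procedure.

```python
# Generic random-walk context sampling for graph nodes (illustrative assumptions).
import random
from collections import defaultdict

def build_adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)          # treat the graph as undirected
    return adj

def sample_context(adj, start, walk_length=8, rng=random):
    """Return a node sequence from an unbiased random walk starting at `start`;
    such sequences can serve as Transformer input contexts."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbours = adj[walk[-1]]
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(sample_context(build_adjacency(edges), start=0))
```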
- Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders [5.037881619912574]
In this paper, we investigate latent space separation methods for structural syntactic injection in Transformer-based VAEs.
Specifically, we explore how syntactic structures can be leveraged in the encoding stage through the integration of graph-based and sequential models.
Our empirical evaluation, carried out on natural language sentences and mathematical expressions, reveals that the proposed end-to-end VAE architecture can result in a better overall organisation of the latent space.
arXiv Detail & Related papers (2023-11-14T22:47:23Z)
- AMR Parsing with Causal Hierarchical Attention and Pointers [54.382865897298046]
We introduce new target forms of AMR parsing and a novel model, CHAP, which is equipped with causal hierarchical attention and the pointer mechanism.
Experiments show that our model outperforms baseline models on four out of five benchmarks in the setting of no additional data.
arXiv Detail & Related papers (2023-10-18T13:44:26Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Structure-Aware Transformer for Graph Representation Learning [7.4124458942877105]
We show that node representations generated by the Transformer with positional encoding do not necessarily capture structural similarity between them.
We propose the Structure-Aware Transformer, a class of simple and flexible graph transformers built upon a new self-attention mechanism.
Our framework can leverage any existing GNN to extract the subgraph representation, and we show that it systematically improves performance relative to the base GNN model.
arXiv Detail & Related papers (2022-02-07T09:53:39Z)
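A toy sketch of the idea in the Structure-Aware Transformer summary above: attention queries and keys are computed from k-hop subgraphs around each node rather than from node features alone. The mean-pooling stand-in for "any existing GNN", the hop count, and the single-head attention are assumptions to keep the example self-contained.

```python
# Structure-aware self-attention sketch: subgraph-derived queries/keys (illustrative only).
import numpy as np

def k_hop_neighbourhood(adj, node, k=2):
    frontier, reached = {node}, {node}
    for _ in range(k):
        frontier = {v for u in frontier for v in np.nonzero(adj[u])[0]} - reached
        reached |= frontier
    return sorted(reached)

def structure_aware_attention(x, adj, k=2):
    """x: (n, d) node features; adj: (n, n) 0/1 adjacency matrix."""
    n, d = x.shape
    # Stand-in for "any existing GNN": mean-pool features over each node's k-hop subgraph.
    sub = np.stack([x[k_hop_neighbourhood(adj, i, k)].mean(axis=0) for i in range(n)])
    scores = sub @ sub.T / np.sqrt(d)                      # structure-aware Q/K
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                                     # values stay node-level

adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
x = np.random.default_rng(0).normal(size=(4, 8))
print(structure_aware_attention(x, adj).shape)   # (4, 8)
```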
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Syntax-Aware Graph-to-Graph Transformer for Semantic Role Labelling [18.028902306143102]
We propose a Syntax-aware Graph-to-Graph Transformer (SynG2G-Tr) model, which encodes the syntactic structure using a novel way to input graph relations as embeddings.
This approach adds a soft bias towards attention patterns that follow the syntactic structure but also allows the model to use this information to learn alternative patterns.
We evaluate our model on both span-based and dependency-based SRL datasets, and outperform previous alternative methods in both in-domain and out-of-domain settings.
arXiv Detail & Related papers (2021-04-15T18:14:18Z)
- Transition-based Parsing with Stack-Transformers [32.029528327212795]
Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state.
We show that modifications of the cross-attention mechanism of the Transformer considerably strengthen performance on both dependency parsing and Abstract Meaning Representation (AMR) parsing.
arXiv Detail & Related papers (2020-10-20T23:20:31Z)
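A minimal sketch of parser-state-conditioned cross-attention in the spirit of the Stack-Transformers entry above: dedicated heads are masked so they attend only to source positions currently on the stack or in the buffer. The head layout, mask values, and the tiny single-query attention routine are illustrative assumptions, not the paper's implementation.

```python
# Cross-attention masked by transition-system state (stack / buffer); illustrative sketch.
import numpy as np

def cross_attention_masks(n_src, stack_positions, buffer_positions):
    """Boolean masks (True = may attend) for a stack head and a buffer head."""
    stack_mask = np.zeros(n_src, dtype=bool)
    buffer_mask = np.zeros(n_src, dtype=bool)
    stack_mask[stack_positions] = True
    buffer_mask[buffer_positions] = True
    return stack_mask, buffer_mask

def masked_cross_attention(query, keys, values, mask):
    scores = keys @ query / np.sqrt(query.shape[-1])
    scores = np.where(mask, scores, -1e9)        # block positions outside the mask
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

n_src, d = 6, 4
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(n_src, d))      # encoder states of 6 source tokens
query = rng.normal(size=d)                       # current decoder (action) state
stack_mask, buffer_mask = cross_attention_masks(n_src, stack_positions=[1, 2],
                                                buffer_positions=[3, 4, 5])
print(masked_cross_attention(query, keys, values, stack_mask))   # stack head
print(masked_cross_attention(query, keys, values, buffer_mask))  # buffer head
```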
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.