Transition-based Parsing with Stack-Transformers
- URL: http://arxiv.org/abs/2010.10669v1
- Date: Tue, 20 Oct 2020 23:20:31 GMT
- Title: Transition-based Parsing with Stack-Transformers
- Authors: Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin
Blodgett, Radu Florian
- Abstract summary: Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state.
We show that modifications of the cross attention mechanism of the Transformer considerably strengthen performance both on dependency and Meaning.
- Score: 32.029528327212795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling the parser state is key to good performance in transition-based
parsing. Recurrent Neural Networks considerably improved the performance of
transition-based systems by modelling the global state, e.g. stack-LSTM
parsers, or local state modeling of contextualized features, e.g. Bi-LSTM
parsers. Given the success of Transformer architectures in recent parsing
systems, this work explores modifications of the sequence-to-sequence
Transformer architecture to model either global or local parser states in
transition-based parsing. We show that modifications of the cross attention
mechanism of the Transformer considerably strengthen performance both on
dependency and Abstract Meaning Representation (AMR) parsing tasks,
particularly for smaller models or limited training data.
Related papers
- Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models [42.46104516313823]
Dependency Transformer Grammars (DTGs) are a new class of Transformer language model with explicit dependency-based inductive bias.
DTGs simulate dependency transition systems with constrained attention patterns.
They achieve better generalization while maintaining comparable perplexity with Transformer language model baselines.
arXiv Detail & Related papers (2024-07-24T16:38:38Z) - Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers [56.264673865476986]
This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models.
SLA improves the model's ability to capture dependencies between high-level abstract features and low-level details.
Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer.
arXiv Detail & Related papers (2024-06-17T07:24:38Z) - Repeat After Me: Transformers are Better than State Space Models at Copying [53.47717661441142]
We show that while generalized state space models are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context.
arXiv Detail & Related papers (2024-02-01T21:44:11Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (S GST)
CST learns the global-shared contextual information within image frames with a lightweight computation; S GST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increase the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - Shift-Reduce Task-Oriented Semantic Parsing with Stack-Transformers [6.744385328015561]
Task-oriented dialogue systems, such as Apple Siri and Amazon Alexa, require a semantic parsing module in order to process user utterances and understand the action to be performed.
This semantic parsing component was initially implemented by rule-based or statistical slot-filling approaches for processing simple queries.
In this article, we advance the research on neural-reduce semantic parsing for task-oriented dialogue.
arXiv Detail & Related papers (2022-10-21T14:19:47Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - Learning Bounded Context-Free-Grammar via LSTM and the
Transformer:Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTM.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z) - Structure-aware Fine-tuning of Sequence-to-sequence Transformers for
Transition-based AMR Parsing [20.67024416678313]
We explore the integration of general pre-trained sequence-to-sequence language models and a structure-aware transition-based approach.
We propose a simplified transition set, designed to better exploit pre-trained language models for structured fine-tuning.
We show that the proposed parsing architecture retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new state of the art for AMR 2.0, without the need for graph re-categorization.
arXiv Detail & Related papers (2021-10-29T04:36:31Z) - GroupBERT: Enhanced Transformer Architecture with Efficient Grouped
Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z) - Optimizing Inference Performance of Transformers on CPUs [0.0]
Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc.
This paper presents an empirical analysis of scalability and performance of inferencing a Transformer-based model on CPUs.
arXiv Detail & Related papers (2021-02-12T17:01:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.