Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
- URL: http://arxiv.org/abs/2010.09517v1
- Date: Mon, 19 Oct 2020 13:51:40 GMT
- Title: Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
- Authors: Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller
- Abstract summary: We propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads.
We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree.
Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly.
- Score: 27.578115452635625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pre-trained language models (PLMs) have dramatically
improved the state of the art in NLP across many tasks. This has led to
substantial interest in analyzing the syntactic knowledge PLMs learn. Previous
approaches to this question have been limited, mostly using test suites or
probes. Here, we propose a novel fully unsupervised parsing approach that
extracts constituency trees from PLM attention heads. We rank transformer
attention heads based on their inherent properties, and create an ensemble of
high-ranking heads to produce the final tree. Our method is adaptable to
low-resource languages, as it does not rely on development sets, which can be
expensive to annotate. Our experiments show that the proposed method often
outperforms existing approaches if there is no development set present. Our
unsupervised parser can also be used as a tool to analyze the grammars PLMs
learn implicitly. For this, we use the parse trees induced by our method to
train a neural PCFG and compare it to a grammar derived from a human-annotated
treebank.
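To make the described pipeline concrete, the sketch below is a minimal illustration of the general idea rather than the paper's actual algorithm: it pulls per-head self-attention matrices from an off-the-shelf PLM (a HuggingFace `bert-base-cased` checkpoint, chosen only for illustration), ranks heads by a toy "inherent property" (how much attention mass lands on adjacent tokens), averages the top-ranked heads into an ensemble, and splits the sentence recursively at the weakest adjacent link. The scoring criterion, ensemble size (`top_k`), and tree-building rule are all simplified stand-ins.

```python
# Minimal sketch: rank self-attention heads by a toy "inherent property"
# and induce a binary constituency tree from an ensemble of the top heads.
# The scoring criterion, ensemble size, and splitting rule are simplified
# stand-ins, not the procedure from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"  # any PLM that exposes attentions would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True).eval()


def attention_heads(sentence):
    """Return (att, tokens) with att of shape [layers*heads, n, n]."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    att = torch.cat(out.attentions, dim=1).squeeze(0)    # stack all layers
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return att[:, 1:-1, 1:-1], tokens[1:-1]               # drop [CLS]/[SEP]


def neighbour_mass(att):
    """Toy head ranking: attention mass placed on adjacent tokens."""
    i = torch.arange(att.shape[-1] - 1)
    return (att[:, i, i + 1] + att[:, i + 1, i]).mean(dim=-1)


def induce_tree(sentence, top_k=20):
    att, tokens = attention_heads(sentence)
    best = neighbour_mass(att).topk(min(top_k, att.shape[0])).indices
    ensemble = att[best].mean(dim=0)                      # average top heads
    i = torch.arange(len(tokens) - 1)
    link = (ensemble[i, i + 1] + ensemble[i + 1, i]) / 2  # adjacent strength

    def split(lo, hi):                                    # span [lo, hi)
        if hi - lo <= 1:
            return tokens[lo]
        k = lo + int(torch.argmin(link[lo:hi - 1]))       # weakest link
        return [split(lo, k + 1), split(k + 1, hi)]

    return split(0, len(tokens))


print(induce_tree("The quick brown fox jumps over the lazy dog ."))
```

The output is a nested-list bracketing of the (wordpiece) tokens; a real evaluation would compare such trees against gold treebank constituents with unlabeled F1.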
Related papers
- Deep Natural Language Feature Learning for Interpretable Prediction [1.6114012813668932]
We propose a method to break down a main complex task into a set of intermediary easier sub-tasks.
Our method allows for representing each example by a vector consisting of the answers to these questions.
We have successfully applied this method to two completely different tasks: detecting incoherence in students' answers to open-ended mathematics exam questions, and screening abstracts for a systematic literature review of scientific papers on climate change and agroecology.
arXiv Detail & Related papers (2023-11-09T21:43:27Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Revisiting the Practical Effectiveness of Constituency Parse Extraction from Pre-trained Language Models [6.850627526999892]
Constituency Parse Extraction from Pre-trained Language Models (CPE-PLM) is a recent paradigm that attempts to induce constituency parse trees relying only on the internal knowledge of pre-trained language models.
We show that CPE-PLM is more effective than typical supervised parsing in few-shot settings.
arXiv Detail & Related papers (2022-09-15T09:41:19Z)
- Unsupervised and Few-shot Parsing from Pretrained Language Models [56.33247845224995]
We propose an Unsupervised constituent Parsing model that calculates an Out Association score solely based on the self-attention weight matrix learned in a pretrained language model.
We extend the unsupervised models to few-shot parsing models that use a few annotated trees to learn better linear projection matrices for parsing.
Our few-shot parsing model FPIO trained with only 20 annotated trees outperforms a previous few-shot parsing method trained with 50 annotated trees.
arXiv Detail & Related papers (2022-06-10T10:29:15Z)
- Sort by Structure: Language Model Ranking as Dependency Probing [25.723591566201343]
Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.
We propose probing to rank LMs, specifically for parsing dependencies in a given language, by measuring the degree to which labeled trees are recoverable from an LM's contextualized embeddings.
Across 46 typologically and architecturally diverse LM-language pairs, our approach predicts the best LM choice in 79% of settings, using orders of magnitude less compute than training a full parser.
arXiv Detail & Related papers (2022-06-10T08:10:29Z)
- Compositional Task-Oriented Parsing as Abstractive Question Answering [25.682923914685063]
Task-oriented parsing (TOP) aims to convert natural language into machine-readable representations of specific tasks, such as setting an alarm.
A popular approach to TOP is to apply seq2seq models to generate linearized parse trees.
A more recent line of work argues that pretrained seq2seq models are better at generating outputs that are themselves natural language, so they replace linearized parse trees with canonical natural-language paraphrases.
arXiv Detail & Related papers (2022-05-04T14:01:08Z)
- Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads [114.77890059625162]
We propose a method, called Single-Shot Meta-Pruning, to compress deep pre-trained Transformers before fine-tuning.
We focus on pruning unnecessary attention heads adaptively for different downstream tasks.
Compared with existing compression methods for pre-trained models, our method can reduce the overhead of both fine-tuning and inference.
arXiv Detail & Related papers (2020-11-07T12:58:37Z)
- Strongly Incremental Constituency Parsing with Graph Neural Networks [70.16880251349093]
Parsing sentences into syntax trees can benefit downstream applications in NLP.
Transition-based parsers build trees by executing actions in a state transition system.
Existing transition-based parsers are predominantly based on the shift-reduce transition system (a toy sketch of this transition system appears after the list).
arXiv Detail & Related papers (2020-10-27T19:19:38Z)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)
- Towards an evolutionary-based approach for natural language processing [14.760703384346984]
We propose a first proof-of-concept that combines genetic programming (GP) with the well-established NLP tool word2vec for the next-word prediction task.
The main idea is that, once words have been moved into a vector space, traditional GP operators can successfully work on vectors, thus producing meaningful words as the output.
arXiv Detail & Related papers (2020-04-23T18:44:12Z)
- Byte Pair Encoding is Suboptimal for Language Model Pretraining [49.30780227162387]
We analyze differences between unigram LM tokenization and byte-pair encoding (BPE).
We find that the unigram LM tokenization method matches or outperforms BPE across downstream tasks and two languages.
We hope that developers of future pretrained LMs will consider adopting the unigram LM method over the more prevalent BPE.
arXiv Detail & Related papers (2020-04-07T21:21:06Z)
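As background for the shift-reduce transition system mentioned in the "Strongly Incremental Constituency Parsing" entry above, the toy sketch below shows how a fixed sequence of SHIFT and REDUCE actions assembles a bracketed constituency tree from a token buffer. It illustrates only the generic transition system; the action sequence, labels, and data structures are illustrative choices and have nothing to do with that paper's graph neural network model.

```python
# Toy illustration of the shift-reduce transition system: a fixed action
# sequence builds a bracketed constituency tree over a token buffer.
# This sketches the generic transition system only, not the GNN-based
# parser described in the related paper above.

def shift_reduce(tokens, actions):
    stack, buffer = [], list(tokens)
    for act in actions:
        if act == "SHIFT":                    # move the next token onto the stack
            stack.append(buffer.pop(0))
        else:                                 # act == ("REDUCE", label, arity)
            _, label, arity = act
            children = stack[-arity:]         # pop `arity` subtrees ...
            del stack[-arity:]
            stack.append((label, children))   # ... and combine them under `label`
    assert len(stack) == 1 and not buffer, "derivation must consume all input"
    return stack[0]


tokens = ["the", "cat", "sat"]
actions = [
    "SHIFT", "SHIFT", ("REDUCE", "NP", 2),    # (NP the cat)
    "SHIFT", ("REDUCE", "VP", 1),             # (VP sat)
    ("REDUCE", "S", 2),                       # (S (NP the cat) (VP sat))
]
print(shift_reduce(tokens, actions))
# ('S', [('NP', ['the', 'cat']), ('VP', ['sat'])])
```

In an actual transition-based parser the action at each step is predicted by a trained model from the current stack and buffer state rather than supplied by hand.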