Probing for Labeled Dependency Trees
- URL: http://arxiv.org/abs/2203.12971v1
- Date: Thu, 24 Mar 2022 10:21:07 GMT
- Title: Probing for Labeled Dependency Trees
- Authors: Max Müller-Eberstein, Rob van der Goot and Barbara Plank
- Abstract summary: DepProbe is a linear probe which can extract labeled and directed dependency parse trees from embeddings.
Across 13 languages, our proposed method identifies the best source treebank 94% of the time.
- Score: 25.723591566201343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probing has become an important tool for analyzing representations in Natural
Language Processing (NLP). For graphical NLP tasks such as dependency parsing,
linear probes are currently limited to extracting undirected or unlabeled parse
trees which do not capture the full task. This work introduces DepProbe, a
linear probe which can extract labeled and directed dependency parse trees from
embeddings while using fewer parameters and compute than prior methods.
Leveraging its full task coverage and lightweight parametrization, we
investigate its predictive power for selecting the best transfer language for
training a full biaffine attention parser. Across 13 languages, our proposed
method identifies the best source treebank 94% of the time, outperforming
competitive baselines and prior work. Finally, we analyze the informativeness
of task-specific subspaces in contextual embeddings as well as which benefits a
full parser's non-linear parametrization provides.
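Since the paper's own code is not reproduced here, the following is a minimal sketch of what such a linear probe could look like, assuming a structural-probe-style setup: one bias-free linear map whose pairwise distances and vector norms approximate tree distance and depth (giving structure and direction), and a second linear map that scores dependency relation labels. Class names, dimensions, and the greedy decoder are illustrative assumptions, not the authors' DepProbe implementation.

```python
# Illustrative sketch only: a linear probe that recovers labeled, directed
# dependency trees from contextual embeddings. Names and the greedy decoding
# step are assumptions, not the authors' released DepProbe code.
import torch
import torch.nn as nn


class LinearTreeProbe(nn.Module):
    def __init__(self, emb_dim: int, struct_dim: int, num_relations: int):
        super().__init__()
        # Linear map into a structural subspace: squared L2 distances there are
        # trained to approximate tree distances, vector norms to approximate depth.
        self.struct = nn.Linear(emb_dim, struct_dim, bias=False)
        # Linear map from embeddings to dependency-relation logits.
        self.rel = nn.Linear(emb_dim, num_relations, bias=False)

    def forward(self, emb: torch.Tensor):
        # emb: (seq_len, emb_dim) contextual embeddings for one sentence.
        h = self.struct(emb)                    # (seq_len, struct_dim)
        dist = torch.cdist(h, h, p=2) ** 2      # pairwise "tree distances"
        depth = (h ** 2).sum(dim=-1)            # per-token "tree depth"
        rel_logits = self.rel(emb)              # (seq_len, num_relations)
        return dist, depth, rel_logits


def greedy_decode(dist, depth, rel_logits):
    """Greedy directed decoding: the shallowest token is treated as the root and
    every other token attaches to its nearest strictly shallower token. A real
    decoder would use a maximum-spanning-tree algorithm instead."""
    root = int(depth.argmin())
    heads, labels = [], []
    for i in range(dist.size(0)):
        if i == root:
            heads.append(-1)  # by convention, the root has no head
        else:
            # candidate heads must sit higher (shallower) in the tree
            cand = dist[i].masked_fill(depth >= depth[i], float("inf"))
            heads.append(int(cand.argmin()))
        labels.append(int(rel_logits[i].argmax()))
    return heads, labels
```

In a setup like this, the structural map would be trained to regress gold tree distances and depths, the relation map with cross-entropy against gold labels, and the greedy attachment would be replaced by a maximum-spanning-tree decoder to guarantee well-formed trees.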
Related papers
- Tree-Averaging Algorithms for Ensemble-Based Unsupervised Discontinuous Constituency Parsing [23.091613114955543]
We propose to build an ensemble of different runs of an existing discontinuous parser by averaging the predicted trees.
We then develop an efficient exact algorithm to tackle the task, which runs in a reasonable time for all samples.
Results on three datasets show our method outperforms all baselines in all metrics.
arXiv Detail & Related papers (2024-02-29T21:49:31Z)
- Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency parser, the hexatagger, which constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.
Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other.
We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
arXiv Detail & Related papers (2023-06-08T18:02:07Z)
- On Parsing as Tagging [66.31276017088477]
We show how to reduce tetratagging, a state-of-the-art constituency tagger, to shift-reduce parsing.
We empirically evaluate our taxonomy of tagging pipelines with different choices of linearizers, learners, and decoders.
arXiv Detail & Related papers (2022-11-14T13:37:07Z)
- Compositional Task-Oriented Parsing as Abstractive Question Answering [25.682923914685063]
Task-oriented parsing (TOP) aims to convert natural language into machine-readable representations of specific tasks, such as setting an alarm.
A popular approach to TOP is to apply seq2seq models to generate linearized parse trees.
A more recent line of work argues that pretrained seq2seq models are better at generating outputs that are themselves natural language, so they replace linearized parse trees with canonical natural-language paraphrases.
arXiv Detail & Related papers (2022-05-04T14:01:08Z)
- Rissanen Data Analysis: Examining Dataset Characteristics via Description Length [78.42578316883271]
We introduce a method to determine if a certain capability helps to achieve an accurate model of given data.
Since minimum program length is uncomputable, we estimate the labels' minimum description length (MDL) as a proxy.
We call the method Rissanen Data Analysis (RDA) after the father of MDL.
arXiv Detail & Related papers (2021-03-05T18:58:32Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Reducing Confusion in Active Learning for Part-Of-Speech Tagging [100.08742107682264]
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost.
We study the problem of selecting instances which maximally reduce the confusion between particular pairs of output tags.
Our proposed AL strategy outperforms other AL strategies by a significant margin.
arXiv Detail & Related papers (2020-11-02T06:24:58Z)
- Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads [27.578115452635625]
We propose a novel fully unsupervised parsing approach that extracts constituency trees from the attention heads of pre-trained language models (PLMs).
We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree.
Our approach can also be used as a tool to analyze the grammars PLMs learn implicitly.
arXiv Detail & Related papers (2020-10-19T13:51:40Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT [29.04485839262945]
We propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT).
Our method does not require direct supervision from the probing tasks, nor do we introduce additional parameters to the probing process.
Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines.
arXiv Detail & Related papers (2020-04-30T14:02:29Z)
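As a concrete illustration of the parameter-free recipe in the entry above, the sketch below builds the kind of pairwise impact matrix it describes: mask token i, record its representation, additionally mask token j, and measure how far i's representation moves. This is a simplified sketch (it assumes one word-piece per word and loops over all pairs), not the paper's released code; a dependency or constituency tree would then be decoded from the matrix, e.g., with a spanning-tree algorithm.

```python
# Simplified sketch of perturbed-masking style impact computation.
# Assumes one word-piece per word; not the authors' released implementation.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def impact_matrix(words):
    ids = tokenizer(" ".join(words), return_tensors="pt")["input_ids"][0]
    mask_id = tokenizer.mask_token_id
    n = len(ids)  # includes [CLS] and [SEP]
    impact = torch.zeros(n, n)
    with torch.no_grad():
        for i in range(1, n - 1):              # skip special tokens
            base = ids.clone()
            base[i] = mask_id                  # mask token i
            h_base = model(base.unsqueeze(0)).last_hidden_state[0, i]
            for j in range(1, n - 1):
                if i == j:
                    continue
                both = base.clone()
                both[j] = mask_id              # additionally mask token j
                h_both = model(both.unsqueeze(0)).last_hidden_state[0, i]
                # impact of j on i: how much i's representation moves
                impact[i, j] = torch.dist(h_base, h_both)
    return impact  # feed this matrix to a tree-decoding algorithm
```

No probe parameters are trained here; all structure is read off the pre-trained model's behavior under masking, which is what makes the approach parameter-free.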