Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
- URL: http://arxiv.org/abs/2505.11693v2
- Date: Thu, 10 Jul 2025 12:11:41 GMT
- Title: Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
- Authors: Ana Ezquerro, David Vilares, Anssi Yli-Jyrä, Carlos Gómez-Rodríguez
- Abstract summary: We present a family of encodings for sequence labeling dependency parsing. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees using only 12 distinct labels.
- Score: 16.54938714613888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a family of encodings for sequence labeling dependency parsing, based on the concept of hierarchical bracketing. We prove that the existing 4-bit projective encoding belongs to this family, but it is suboptimal in the number of labels used to encode a tree. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees using only 12 distinct labels (vs. 16 for the 4-bit encoding). We also extend optimal hierarchical bracketing to support arbitrary non-projectivity in a more compact way than previous encodings. Our new encodings yield competitive accuracy on a diverse set of treebanks.
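The abstract distinguishes projective trees from trees with arbitrary non-projectivity. As a minimal sketch of what that distinction means (this is not the paper's encoding, and the function names and the head-list input format are assumptions made for illustration), a tree is treated as projective when no two of its arcs cross. Words are numbered from 1, and `heads[i-1]` is the head of word `i`, with 0 standing for a dummy root placed before the sentence. The input is assumed to already be a well-formed tree.

```python
from typing import List, Tuple


def arcs_from_heads(heads: List[int]) -> List[Tuple[int, int]]:
    """Return each arc as (left endpoint, right endpoint).

    Hypothetical input format: heads[i-1] is the head of word i,
    and 0 denotes a dummy root positioned before the first word.
    """
    return [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]


def is_projective(heads: List[int]) -> bool:
    """Check that no two arcs cross (assumes heads already forms a tree)."""
    arcs = arcs_from_heads(heads)
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True


print(is_projective([2, 0, 2]))     # True: no arcs cross
print(is_projective([3, 4, 0, 3]))  # False: arcs 3->1 and 4->2 cross
```

The pairwise check is quadratic in sentence length, which is adequate for illustration; it only classifies a tree and says nothing about how many labels an encoding needs.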
Related papers
- High-Rate Extended Binomial Codes for Multi-Qubit Encoding [0.5439020425819]
We propose a mapping from qubit quantum error correction codes (QECCs) to bosonic QECCs. Our work can be seen as the bosonic analogue of converting K uses of [[N_K,K,D]] qubit codes into a single use of [[N_K,K,D]] qubit codes.
arXiv Detail & Related papers (2025-01-13T07:04:05Z)
- 4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees [7.466159270333272]
We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word.
Results on a set of diverse treebanks show that our 7-bit encoding obtains substantial accuracy gains over the previously best-performing sequence labeling encodings.
arXiv Detail & Related papers (2023-10-22T14:43:53Z)
- Structured Dialogue Discourse Parsing [79.37200787463917]
Dialogue discourse parsing aims to uncover the internal structure of a multi-participant conversation.
We propose a principled method that improves upon previous work from two perspectives: encoding and decoding.
Experiments show that our method achieves a new state of the art, surpassing the previous model by 2.3 on STAC and 1.5 on Molweni.
arXiv Detail & Related papers (2023-06-26T22:51:01Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- Label Semantics for Few Shot Named Entity Recognition [68.01364012546402]
We study the problem of few shot learning for named entity recognition.
We leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors.
Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder.
arXiv Detail & Related papers (2022-03-16T23:21:05Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Bracketing Encodings for 2-Planar Dependency Parsing [14.653008985229617]
We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels.
We take into account the well-known property of 2-planarity, which is present in the vast majority of dependency syntactic structures in treebanks.
arXiv Detail & Related papers (2020-11-01T18:53:32Z)
- A Unifying Theory of Transition-based and Sequence Labeling Parsing [14.653008985229617]
We map transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees.
This establishes a theoretical relation between transition-based parsing and sequence-labeling parsing.
We implement sequence labeling versions of four algorithms, showing that they are learnable and obtain comparable performance to existing encodings.
arXiv Detail & Related papers (2020-11-01T18:25:15Z)
- Please Mind the Root: Decoding Arborescences for Dependency Parsing [67.71280539312536]
We analyze the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank.
The worst constraint-violation rate we observe is 24%.
arXiv Detail & Related papers (2020-10-06T08:31:14Z)
- Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z)