4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees
- URL: http://arxiv.org/abs/2310.14319v1
- Date: Sun, 22 Oct 2023 14:43:53 GMT
- Title: 4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees
- Authors: Carlos Gómez-Rodríguez, Diego Roca, David Vilares
- Score: 7.466159270333272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word. The bits in each word's label represent (1) whether it is a right or left dependent, (2) whether it is the outermost (left/right) dependent of its parent, (3) whether it has any left children and (4) whether it has any right children. We show that this provides an injective mapping from trees to labels that can be encoded and decoded in linear time. We then define a 7-bit extension that represents an extra plane of arcs, extending the coverage to almost full non-projectivity (over 99.9% empirical arc coverage). Results on a set of diverse treebanks show that our 7-bit encoding obtains substantial accuracy gains over the previously best-performing sequence labeling encodings.
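The four bits described in the abstract can be sketched as a small encoder over head indices. This is an illustrative reading of the abstract only, not the paper's actual implementation: the function name `encode_4bit`, the bit-string label format, and the use of `0` for the artificial root are assumptions, and the decoding side is omitted.

```python
def encode_4bit(heads):
    """Illustrative sketch of the 4-bit projective-tree encoding.

    heads[i-1] is the head of word i (1-based positions); 0 denotes the
    artificial root. Each word's label packs four bits, per the abstract:
    (1) right vs. left dependent, (2) outermost dependent of its parent
    on its side, (3) has any left children, (4) has any right children.
    """
    n = len(heads)
    # Collect each head's children; appending in word order keeps lists sorted.
    children = {h: [] for h in range(n + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    labels = []
    for i in range(1, n + 1):
        h = heads[i - 1]
        sibs = children[h]
        right = i > h                                    # bit 1: right dependent?
        outer = i == (sibs[-1] if right else sibs[0])    # bit 2: outermost on its side?
        has_l = bool(children[i]) and children[i][0] < i   # bit 3: any left children?
        has_r = bool(children[i]) and children[i][-1] > i  # bit 4: any right children?
        labels.append("".join("1" if b else "0"
                              for b in (right, outer, has_l, has_r)))
    return labels
```

For example, the 3-word tree with heads `[2, 0, 2]` (words 1 and 3 attached to word 2, which is the root) yields `["0100", "1111", "1100"]`: word 2 is a right dependent of the root with children on both sides, while words 1 and 3 are leaf dependents.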
Related papers
- HetTree: Heterogeneous Tree Graph Neural Network [12.403166161903378]
HetTree is a novel heterogeneous tree graph neural network that models both the graph structure and heterogeneous aspects.
HetTree builds a semantic tree data structure to capture the hierarchy among metapaths.
Our evaluation of HetTree on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks.
arXiv Detail & Related papers (2024-02-21T03:14:45Z)
- Label2Label: A Language Modeling Framework for Multi-Attribute Learning [93.68058298766739]
Label2Label is the first attempt for multi-attribute prediction from the perspective of language modeling.
Inspired by the success of pre-training language models in NLP, Label2Label introduces an image-conditioned masked language model.
Our intuition is that the instance-wise attribute relations are well grasped if the neural net can infer the missing attributes based on the context and the remaining attribute hints.
arXiv Detail & Related papers (2022-07-18T15:12:33Z)
- Label Semantics for Few Shot Named Entity Recognition [68.01364012546402]
We study the problem of few-shot learning for named entity recognition.
We leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors.
Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder.
arXiv Detail & Related papers (2022-03-16T23:21:05Z)
- Integrating Dependency Tree Into Self-attention for Sentence Representation [9.676884071651205]
We propose Dependency-Transformer, which applies a relation-attention mechanism that works in concert with the self-attention mechanism.
By a score-based method, we successfully inject the syntax information without affecting Transformer's parallelizability.
Our model outperforms or is comparable to the state-of-the-art methods on four tasks for sentence representation.
arXiv Detail & Related papers (2022-03-11T13:44:41Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Image-to-image Translation via Hierarchical Style Disentanglement [115.81148219591387]
We propose Hierarchical Style Disentanglement (HiSD) to address this issue.
Specifically, we organize the labels into a hierarchical tree structure, in which independent tags, exclusive attributes, and disentangled styles are allocated from top to bottom.
Both qualitative and quantitative results on the CelebA-HQ dataset verify the ability of the proposed HiSD.
arXiv Detail & Related papers (2021-03-02T03:43:18Z)
- Bracketing Encodings for 2-Planar Dependency Parsing [14.653008985229617]
We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels.
We take into account the well-known property of 2-planarity, which is present in the vast majority of dependency syntactic structures in treebanks.
arXiv Detail & Related papers (2020-11-01T18:53:32Z)
- A Unifying Theory of Transition-based and Sequence Labeling Parsing [14.653008985229617]
We map transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees.
This establishes a theoretical relation between transition-based parsing and sequence-labeling parsing.
We implement sequence labeling versions of four algorithms, showing that they are learnable and obtain comparable performance to existing encodings.
arXiv Detail & Related papers (2020-11-01T18:25:15Z)
- Please Mind the Root: Decoding Arborescences for Dependency Parsing [67.71280539312536]
We analyze the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank.
The worst constraint-violation rate we observe is 24%.
arXiv Detail & Related papers (2020-10-06T08:31:14Z)
- More Embeddings, Better Sequence Labelers? [75.44925576268052]
Recent work proposes a family of contextual embeddings that significantly improves the accuracy of sequence labelers over non-contextual embeddings.
We conduct extensive experiments on 3 tasks over 18 datasets and 8 languages to study the accuracy of sequence labeling with various embedding concatenations.
arXiv Detail & Related papers (2020-09-17T14:28:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.