Linear-Time Modeling of Linguistic Structure: An Order-Theoretic
Perspective
- URL: http://arxiv.org/abs/2305.15057v2
- Date: Tue, 12 Dec 2023 08:46:11 GMT
- Title: Linear-Time Modeling of Linguistic Structure: An Order-Theoretic
Perspective
- Authors: Tianyu Liu, Afra Amini, Mrinmaya Sachan, Ryan Cotterell
- Abstract summary: Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language.
We show that these exhaustive comparisons can be avoided, and, moreover, the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string.
Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
- Score: 97.57162770792182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tasks that model the relation between pairs of tokens in a string are a vital
part of understanding natural language. Such tasks, in general, require
exhaustive pair-wise comparisons of tokens, thus having a quadratic runtime
complexity in the length of the string. We show that these exhaustive
comparisons can be avoided, and, moreover, the complexity of such tasks can be
reduced to linear by casting the relation between tokens as a partial order
over the string. Our method predicts real numbers for each token in a string in
parallel and sorts the tokens accordingly, resulting in total orders of the
tokens in the string. Each total order implies a set of arcs oriented from
smaller to greater tokens, sorted by their predicted numbers. The intersection
of total orders results in a partial order over the set of tokens in the
string, which is then decoded into a directed graph representing the desired
linguistic structure. Our experiments on dependency parsing and coreference
resolution show that our method achieves state-of-the-art or comparable
performance. Moreover, the linear complexity and parallelism of our method
double the speed of graph-based coreference resolution models, and bring a
10-times speed-up over graph-based dependency parsers.
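As a rough illustration of the construction (not the authors' implementation), the sketch below uses random numbers in place of the model's per-token predictions and intersects K total orders into a partial order. The pairwise loop only spells out the definition of the intersection; the method itself just sorts the predicted numbers, which is what removes the exhaustive comparison step.

```python
# Minimal sketch: K real numbers per token induce K total orders;
# their intersection is a partial order over the tokens.
# Random numbers below stand in for the model's parallel predictions.
import numpy as np

def partial_order_arcs(scores: np.ndarray) -> set:
    """scores: (n_tokens, K). Token i precedes token j in the partial
    order iff i precedes j in every one of the K total orders, i.e.
    scores[i, k] < scores[j, k] for all k. The quadratic loop here is
    only for exposition; the paper obtains the orders by sorting."""
    n, _ = scores.shape
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and np.all(scores[i] < scores[j])}

rng = np.random.default_rng(0)
tokens = ["The", "cat", "sat"]
scores = rng.normal(size=(len(tokens), 2))   # K = 2 total orders
print(partial_order_arcs(scores))            # arcs of the partial order
```

The resulting arc set is then decoded into the target directed graph (e.g., a dependency tree), per the abstract.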
Related papers
- Tokenization as Finite-State Transduction [24.19959327497118]
We introduce a finite-state framework which can efficiently encode all possible tokenizations of a regular language.
We show that Byte-Pair Encoding (BPE) and MaxMatch (WordPiece) fit within this framework.
An application of this is to guided generation, where the outputs of a language model are constrained to match some pattern.
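As a toy illustration of what such a framework encodes (the vocabulary below is hypothetical), the sketch enumerates every segmentation of a string into vocabulary pieces; a finite-state transducer represents this same set of tokenizations compactly instead of by explicit enumeration.

```python
# Enumerate all tokenizations of a string over a toy vocabulary.
# An FST encodes exactly this set of segmentations, but compactly.
def all_tokenizations(s: str, vocab: set) -> list:
    if not s:
        return [[]]
    results = []
    for end in range(1, len(s) + 1):
        piece = s[:end]
        if piece in vocab:                      # a valid next piece
            for rest in all_tokenizations(s[end:], vocab):
                results.append([piece] + rest)
    return results

vocab = {"un", "do", "undo", "u", "n", "d", "o"}
for tok in all_tokenizations("undo", vocab):
    print(tok)
```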
arXiv Detail & Related papers (2024-10-21T07:10:07Z)
- An Expression Tree Decoding Strategy for Mathematical Equation Generation [24.131972875875952]
Existing approaches can be broadly categorized into token-level and expression-level generation.
Expression-level methods generate each expression one by one.
Each expression represents a solving step, and there naturally exist parallel or dependent relations between these steps.
We integrate tree structure into the expression-level generation and advocate an expression tree decoding strategy.
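A minimal sketch of that layering idea, with an invented three-step equation: steps that do not consume each other's results become ready together and can be decoded in parallel.

```python
# Solving steps form a DAG; independent steps share a layer and can be
# decoded in parallel. The step names and dependencies are invented.
from graphlib import TopologicalSorter

steps = {
    "e1": set(),          # e.g. e1 = price * quantity
    "e2": set(),          # e.g. e2 = discount * quantity
    "e3": {"e1", "e2"},   # e.g. e3 = e1 - e2 (depends on both)
}
ts = TopologicalSorter(steps)
ts.prepare()
while ts.is_active():
    layer = list(ts.get_ready())   # mutually independent steps
    print("decode in parallel:", layer)
    ts.done(*layer)
```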
arXiv Detail & Related papers (2023-10-14T17:00:28Z)
- Tree Cross Attention [59.8891512435847]
Tree Cross Attention (TCA) is a module based on Cross Attention that retrieves information from only a logarithmic $\mathcal{O}(\log N)$ number of tokens when performing inference.
We show that TCA performs comparably to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient.
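The sketch below illustrates the retrieval pattern only, not the paper's architecture: tokens sit at the leaves of a balanced binary tree, each internal node carries a summary of its subtree (here simply a mean, as a placeholder), and a query walks root-to-leaf by picking the better-scoring child, so inference touches O(log N) nodes rather than all N tokens.

```python
# Toy O(log N) retrieval over a binary tree of token vectors.
import numpy as np

def build_tree(leaves):
    # nodes are ("leaf", index, vector) or ("inner", (left, right), summary)
    nodes = [("leaf", i, v) for i, v in enumerate(leaves)]
    while len(nodes) > 1:
        paired = [("inner", (a, b), (a[2] + b[2]) / 2)
                  for a, b in zip(nodes[::2], nodes[1::2])]
        if len(nodes) % 2:
            paired.append(nodes[-1])    # odd node carried up a level
        nodes = paired
    return nodes[0]

def retrieve(query, node):
    while node[0] == "inner":
        left, right = node[1]
        # descend toward the child whose summary scores higher
        node = left if query @ left[2] >= query @ right[2] else right
    return node[1]                      # index of the selected token

rng = np.random.default_rng(0)
tokens = list(rng.normal(size=(8, 4)))
print(retrieve(rng.normal(size=4), build_tree(tokens)))
```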
arXiv Detail & Related papers (2023-09-29T16:50:23Z)
- Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.
Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other.
We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
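A toy sketch of why tagging-based parsing parallelizes (the tag inventory size and tensor shapes below are placeholders, not the actual hexatag set): given encoder states, every word's tag scores are computed independently, so all structure-building actions are predicted at once.

```python
# Parallel tag prediction: one independent argmax per word.
import numpy as np

NUM_TAGS = 6                        # placeholder finite tag inventory
rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 16))   # encoder states for a 5-word sentence
proj = rng.normal(size=(16, NUM_TAGS))

logits = hidden @ proj              # (5, NUM_TAGS): scores for every word
tags = logits.argmax(axis=-1)       # all tags predicted in parallel
print(tags)                         # a linear pass then decodes the tag
                                    # sequence into a dependency tree
```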
arXiv Detail & Related papers (2023-06-08T18:02:07Z)
- Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
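The toy example below walks through the two steps, with a hand-picked tagging and permutation standing in for the model's predictions.

```python
# Step 1: tag each input token with a (possibly empty) multiset of
# output tokens. Step 2: permute the collected tokens into the output.
inputs = ["to", "NYC", "book", "a", "flight"]
multisets = [[], ["'NYC'"], ["book("], [], ["flight", ",", ")"]]
flat = [tok for ms in multisets for tok in ms]   # collected output tokens
perm = [1, 2, 3, 0, 4]                           # hand-picked permutation
print("".join(flat[i] for i in perm))            # book(flight,'NYC')
```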
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- A Conditional Splitting Framework for Efficient Constituency Parsing [14.548146390081778]
We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.
Our parsing model estimates the conditional probability distribution of possible splitting points in a given text span and supports efficient top-down decoding.
For discourse analysis we show that in our formulation, discourse segmentation can be framed as a special case of parsing.
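A compact sketch of top-down splitting, with random scores standing in for the model's conditional distribution over split points: each span recursively commits to its best split until only single tokens remain.

```python
# Top-down decoding by conditional splitting over spans.
import numpy as np

rng = np.random.default_rng(0)
words = ["She", "enjoys", "reading", "books"]
n = len(words)
score = rng.normal(size=(n + 1, n + 1, n + 1))   # score[i, j, k]: split
                                                 # span [i, j) at point k

def split(i: int, j: int):
    """Parse the half-open span [i, j) into a binary bracketing."""
    if j - i == 1:
        return words[i]
    k = max(range(i + 1, j), key=lambda k: score[i, j, k])
    return (split(i, k), split(k, j))

print(split(0, n))
```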
arXiv Detail & Related papers (2021-06-30T00:36:34Z)
- An Iterative Contextualization Algorithm with Second-Order Attention [0.40611352512781856]
We show how to combine the representations of words that make up a sentence into a cohesive whole.
Our algorithm starts with a presumably erroneous value of the context, and adjusts this value with respect to the tokens at hand.
Our models report strong results in several well-known text classification tasks.
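A bare-bones sketch of the iterative scheme, with illustrative dimensions and a plain softmax weighting standing in for the paper's second-order attention: the context starts as a crude average and is repeatedly adjusted against the tokens.

```python
# Iteratively refine a sentence context vector against its tokens.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))   # token representations
context = tokens.mean(axis=0)      # initial, "presumably erroneous" value
for _ in range(10):
    weights = softmax(tokens @ context)   # attend over the tokens
    context = weights @ tokens            # adjusted context vector
print(context.round(2))
```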
arXiv Detail & Related papers (2021-03-03T05:34:50Z)
- A Simple Global Neural Discourse Parser [61.728994693410954]
We propose a simple chart-based neural discourse parser that does not require any manually-crafted features and is based on learned span representations only.
We empirically demonstrate that our model achieves the best performance among global parsers, and performance comparable to state-of-the-art greedy parsers.
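The sketch below shows what global chart decoding means in this setting: a CKY-style dynamic program (random span scores stand in for learned span representations) finds the highest-scoring binary tree overall, instead of committing greedily to one split at a time.

```python
# CKY-style global search for the best-scoring binary tree.
import numpy as np

rng = np.random.default_rng(0)
n = 4                                    # number of elementary units
span_score = rng.normal(size=(n + 1, n + 1))

best = {(i, i + 1): (span_score[i, i + 1], None) for i in range(n)}
for length in range(2, n + 1):
    for i in range(n - length + 1):
        j = i + length
        k = max(range(i + 1, j),
                key=lambda k: best[(i, k)][0] + best[(k, j)][0])
        best[(i, j)] = (span_score[i, j]
                        + best[(i, k)][0] + best[(k, j)][0], k)
print(best[(0, n)])                      # best total score and top split
```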
arXiv Detail & Related papers (2020-09-02T19:28:40Z)