A Conditional Splitting Framework for Efficient Constituency Parsing
- URL: http://arxiv.org/abs/2106.15760v1
- Date: Wed, 30 Jun 2021 00:36:34 GMT
- Title: A Conditional Splitting Framework for Efficient Constituency Parsing
- Authors: Thanh-Tung Nguyen, Xuan-Phi Nguyen, Shafiq Joty, Xiaoli Li
- Abstract summary: We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.
Our parsing model estimates the conditional probability distribution of possible splitting points in a given text span and supports efficient top-down decoding.
For discourse analysis we show that in our formulation, discourse segmentation can be framed as a special case of parsing.
- Score: 14.548146390081778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a generic seq2seq parsing framework that casts constituency
parsing problems (syntactic and discourse parsing) into a series of conditional
splitting decisions. Our parsing model estimates the conditional probability
distribution of possible splitting points in a given text span and supports
efficient top-down decoding, which is linear in the number of nodes. The
conditional splitting formulation, together with efficient beam search inference,
facilitates structural consistency without relying on expensive structured
inference. Crucially, for discourse analysis we show that in our formulation,
discourse segmentation can be framed as a special case of parsing which allows
us to perform discourse parsing without requiring segmentation as a
pre-requisite. Experiments show that our model achieves good results on the
standard syntactic parsing tasks under settings with/without pre-trained
representations and rivals state-of-the-art (SoTA) methods that are more
computationally expensive than ours. In discourse parsing, our method
outperforms SoTA by a good margin.
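The top-down decoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses a learned seq2seq model with beam search, whereas this sketch uses greedy argmax over a hand-written scorer. The names `greedy_top_down_parse`, `toy_split_score`, and `TOY_WEIGHTS` are hypothetical and exist only for illustration. It shows why the number of splitting decisions grows linearly with the number of tree nodes: each internal span is split exactly once.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open token span [i, j)


def greedy_top_down_parse(
    n: int, split_score: Callable[[int, int, int], float]
) -> List[Span]:
    """Build a binary tree over n tokens by repeatedly splitting spans.

    split_score(i, k, j) stands in for the model's score of splitting
    span [i, j) at position k. Here it is an assumed, caller-supplied
    function, not a trained model.
    """
    spans: List[Span] = []
    stack: List[Span] = [(0, n)]
    while stack:
        i, j = stack.pop()
        spans.append((i, j))
        if j - i <= 1:
            continue  # single-token span: a leaf, nothing to split
        # Greedy choice: take the best-scoring split point inside the span.
        k = max(range(i + 1, j), key=lambda p: split_score(i, p, j))
        stack.append((k, j))  # right child, visited after the left one
        stack.append((i, k))  # left child, visited next
    return spans


# Hypothetical boundary weights for a 4-token input (illustration only).
TOY_WEIGHTS = {1: 0.2, 2: 0.9, 3: 0.5}


def toy_split_score(i: int, k: int, j: int) -> float:
    """Toy scorer that ignores the span and looks only at the boundary k."""
    return TOY_WEIGHTS[k]


spans = greedy_top_down_parse(4, toy_split_score)
print(spans)
# A binary tree over n tokens has 2n - 1 spans and n - 1 split decisions.
print(len(spans))
```

With a real model, the scorer would condition on the encoded text span, and a beam would keep several candidate splits per step instead of only the best one.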
Related papers
- Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing [43.758912958903494]
We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech parsers.
To compute the metric, we project the ground-truth parse tree to the speech domain by forced alignment, align the projected ground-truth constituents with the predicted ones under certain structured constraints, and calculate the average IOU score across all aligned constituent pairs.
arXiv Detail & Related papers (2024-02-21T00:01:17Z)
- Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective [97.57162770792182]
Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language.
We show that these exhaustive comparisons can be avoided, and, moreover, the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string.
Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
arXiv Detail & Related papers (2023-05-24T11:47:35Z)
- Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing [110.4684789199555]
We introduce scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario".
This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules.
Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios.
arXiv Detail & Related papers (2022-02-02T08:00:21Z)
- CPTAM: Constituency Parse Tree Aggregation Method [6.011216641982612]
This paper adopts the truth discovery idea to aggregate constituency parse trees from different parsers.
We formulate the constituency parse tree aggregation problem in two steps, structure aggregation and constituent label aggregation.
Experiments are conducted on benchmark datasets in different languages and domains.
arXiv Detail & Related papers (2022-01-19T23:05:37Z)
- Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding [101.24748444126982]
Decomposable tasks are complex and comprise a hierarchy of sub-tasks.
Existing benchmarks, however, typically hold out examples for only the surface-level sub-task.
We propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions.
arXiv Detail & Related papers (2021-06-29T02:53:59Z)
- Context-Preserving Text Simplification [11.830061911323025]
We present a context-preserving text simplification (TS) approach that splits and rephrases complex English sentences into a semantic hierarchy of simplified sentences.
Using a set of linguistically principled transformation patterns, input sentences are converted into a hierarchical representation in the form of core sentences and accompanying contexts that are linked via rhetorical relations.
A comparative analysis with the annotations contained in the RST-DT shows that we are able to capture the contextual hierarchy between the split sentences with a precision of 89% and reach an average precision of 69% for the classification of the rhetorical relations that hold between them.
arXiv Detail & Related papers (2021-05-24T09:54:56Z)
- RST Parsing from Scratch [14.548146390081778]
We introduce a novel end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework.
Our framework facilitates discourse parsing from scratch without requiring discourse segmentation as a prerequisite.
Our unified parsing model adopts a beam search to decode the best tree structure by searching through a space of high-scoring trees.
arXiv Detail & Related papers (2021-05-23T06:19:38Z)
- Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z)
- A Simple Global Neural Discourse Parser [61.728994693410954]
We propose a simple chart-based neural discourse parser that does not require any manually-crafted features and is based on learned span representations only.
We empirically demonstrate that our model achieves the best performance among global parsers, and comparable performance to state-of-art greedy parsers.
arXiv Detail & Related papers (2020-09-02T19:28:40Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)