CPTAM: Constituency Parse Tree Aggregation Method
- URL: http://arxiv.org/abs/2201.07905v2
- Date: Sat, 1 Jul 2023 23:18:06 GMT
- Title: CPTAM: Constituency Parse Tree Aggregation Method
- Authors: Adithya Kulkarni, Nasim Sabetpour, Alexey Markin, Oliver Eulenstein,
Qi Li
- Abstract summary: This paper adopts the truth discovery idea to aggregate constituency parse trees from different parsers by estimating their reliability.
We formulate the constituency parse tree aggregation problem in two steps, structure aggregation and constituent label aggregation.
Experiments are conducted on benchmark datasets in different languages and domains.
- Score: 6.011216641982612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diverse Natural Language Processing tasks employ constituency parsing to
understand the syntactic structure of a sentence according to a phrase
structure grammar. Many state-of-the-art constituency parsers are proposed, but
they may provide different results for the same sentences, especially for
corpora outside their training domains. This paper adopts the truth discovery
idea to aggregate constituency parse trees from different parsers by estimating
their reliability in the absence of ground truth. Our goal is to consistently
obtain high-quality aggregated constituency parse trees. We formulate the
constituency parse tree aggregation problem in two steps, structure aggregation
and constituent label aggregation. Specifically, we propose the first truth
discovery solution for tree structures by minimizing the weighted sum of
Robinson-Foulds (RF) distances, a classic symmetric distance metric between two
trees. Extensive experiments are conducted on benchmark datasets in different
languages and domains. The experimental results show that our method, CPTAM,
outperforms the state-of-the-art aggregation baselines. We also demonstrate
that the weights estimated by CPTAM can adequately evaluate constituency
parsers in the absence of ground truth.
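The structure-aggregation step minimizes a weighted sum of Robinson-Foulds (RF) distances between the aggregated tree and each parser's output. For constituency trees over the same sentence, the RF distance can be computed as the size of the symmetric difference between the two trees' constituent-span sets. The sketch below illustrates that distance on toy nested-list trees; the tree representation and span extraction are illustrative assumptions, not CPTAM's actual implementation.

```python
# Sketch: Robinson-Foulds (RF) distance between two constituency trees
# over the same sentence, treating each tree as its set of constituent
# spans (i, j). Nested lists stand in for parse trees.

def tree_spans(tree):
    """Collect (start, end) spans of all internal nodes of a nested-list tree."""
    spans = set()
    def walk(node, i):
        if isinstance(node, str):        # leaf token occupies one position
            return i + 1
        j = i
        for child in node:
            j = walk(child, j)
        spans.add((i, j))                # span covered by this constituent
        return j
    walk(tree, 0)
    return spans

def rf_distance(tree_a, tree_b):
    """Symmetric-difference size between the two trees' span sets."""
    return len(tree_spans(tree_a) ^ tree_spans(tree_b))

# Two competing parses of a 4-token sentence:
t1 = [["the", "dog"], ["barked", "loudly"]]
t2 = [[["the", "dog"], "barked"], "loudly"]
print(rf_distance(t1, t2))               # prints 2
```

Given such a distance, the aggregation objective is to find the tree minimizing the sum of parser-weighted RF distances, with the weights themselves estimated from how close each parser stays to the current aggregate.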
Related papers
- Improving Unsupervised Constituency Parsing via Maximizing Semantic Information [35.63321102040579]
Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure.
Traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics.
We introduce a novel objective for training unsupervised parsers: maximizing the information between constituent structures and sentence semantics (SemInfo).
arXiv Detail & Related papers (2024-10-03T15:04:00Z) - Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing [43.758912958903494]
We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech constituency parsers.
To compute the metric, we project the ground-truth parse tree to the speech domain by forced alignment, align the projected ground-truth constituents with the predicted ones under certain structured constraints, and calculate the average IOU score across all aligned constituent pairs.
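After alignment, the core of the metric is an average of interval IoU scores. The sketch below shows that averaging step over already-aligned (gold, predicted) time intervals; the function names and toy alignment are illustrative assumptions, and the structured alignment step itself is omitted.

```python
# Sketch: the IoU-averaging core of STRUCT-IOU. Each constituent is a
# time interval (start, end) in the speech signal; alignment is assumed
# to be done already.

def interval_iou(a, b):
    """Intersection-over-union of two 1-D intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def struct_iou(aligned_pairs):
    """Average IoU over (gold, predicted) constituent-interval pairs."""
    if not aligned_pairs:
        return 0.0
    return sum(interval_iou(g, p) for g, p in aligned_pairs) / len(aligned_pairs)

pairs = [((0.0, 1.0), (0.0, 1.0)),   # perfect overlap -> IoU 1.0
         ((0.0, 2.0), (1.0, 2.0))]   # half overlap    -> IoU 0.5
print(struct_iou(pairs))             # prints 0.75
```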
arXiv Detail & Related papers (2024-02-21T00:01:17Z) - Cascading and Direct Approaches to Unsupervised Constituency Parsing on
Spoken Sentences [67.37544997614646]
We present the first study on unsupervised spoken constituency parsing.
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees.
We show that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
arXiv Detail & Related papers (2023-03-15T17:57:22Z) - Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision
Tree: the Case of Postnominal "that" in Noun Complement Clauses vs. Relative
Clauses [0.0]
We investigated two methods to parse relative and noun complement clauses in English.
We used an algorithm to relabel a corpus parsed with the GUM Treebank using Universal Dependencies.
Our second experiment consisted in using TreeTagger, a Probabilistic Decision Tree, to learn the distinction between the complement and relative uses of postnominal "that".
arXiv Detail & Related papers (2022-12-05T20:52:41Z) - Biaffine Discourse Dependency Parsing [0.0]
We use the biaffine model for neural discourse dependency parsing and achieve significant performance improvement compared with the baselines.
We compare the Eisner algorithm and the Chu-Liu-Edmonds algorithm in the task and find that the Chu-Liu-Edmonds algorithm generates deeper trees.
arXiv Detail & Related papers (2022-01-12T12:56:13Z) - A Conditional Splitting Framework for Efficient Constituency Parsing [14.548146390081778]
We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.
Our parsing model estimates the conditional probability distribution of possible splitting points in a given text span and supports efficient top-down decoding.
For discourse analysis, we show that in our formulation, discourse segmentation can be framed as a special case of parsing.
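Top-down decoding under a conditional splitting model amounts to recursively choosing the most probable split point inside each span until every span is a single token. The sketch below is a greedy version of that idea; `split_scores` is a hypothetical stand-in for the model's learned conditional distribution over split points, not the paper's actual parser.

```python
# Sketch: greedy top-down conditional-splitting decoding. Each span
# (i, j) is split at its highest-scoring interior point k, recursively,
# until spans cover single tokens.

def decode(tokens, i, j, split_scores):
    """Return a binary tree (nested tuples) over tokens[i:j]."""
    if j - i == 1:
        return tokens[i]
    # choose the split point k (i < k < j) with the highest score
    k = max(range(i + 1, j), key=lambda k: split_scores.get((i, k, j), 0.0))
    return (decode(tokens, i, k, split_scores),
            decode(tokens, k, j, split_scores))

tokens = ["the", "dog", "barked"]
scores = {(0, 2, 3): 0.9, (0, 1, 3): 0.1}   # prefer splitting after "dog"
print(decode(tokens, 0, len(tokens), scores))  # prints (('the', 'dog'), 'barked')
```

The paper's model additionally supports efficient beam-style decoding; the greedy recursion above only illustrates why decoding is a linear sequence of conditional splitting decisions rather than a full chart search.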
arXiv Detail & Related papers (2021-06-30T00:36:34Z) - Unsupervised Parsing via Constituency Tests [49.42244463346612]
We propose a method for unsupervised parsing based on the linguistic notion of a constituency test.
To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score.
The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previous best published result.
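Choosing "the binary tree with the highest total score" given per-span scores is a classic chart (CKY-style) maximization. The sketch below shows that search; the toy `span_score` function is an illustrative stand-in for the aggregated constituency-test judgments, not the paper's scoring model.

```python
# Sketch: CKY-style search for the binary tree maximizing the sum of
# span scores, as in constituency-test parsing. Leaves are token indices.

from functools import lru_cache

def best_tree(n, span_score):
    """Return (best total score, best binary tree) over tokens 0..n-1."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:
            return 0.0, i            # single token: leaf, no split point
        cands = []
        for k in range(i + 1, j):
            left, _ = best(i, k)
            right, _ = best(k, j)
            cands.append((left + right + span_score(i, j), k))
        return max(cands)            # (score, split point) with max score
    def build(i, j):
        if j - i == 1:
            return i
        _, k = best(i, j)
        return (build(i, k), build(k, j))
    return best(0, n)[0], build(0, n)

# Toy scorer: only the span (1, 3) passes its constituency tests.
score, tree = best_tree(3, lambda i, j: 1.0 if (i, j) == (1, 3) else 0.0)
print(score, tree)                   # prints 1.0 (0, (1, 2))
```

The memoized recursion runs in O(n^3) like standard CKY, since each of the O(n^2) spans tries O(n) split points.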
arXiv Detail & Related papers (2020-10-07T04:05:01Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z) - Exploiting Syntactic Structure for Better Language Modeling: A Syntactic
Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances"
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.