CPTAM: Constituency Parse Tree Aggregation Method
- URL: http://arxiv.org/abs/2201.07905v2
- Date: Sat, 1 Jul 2023 23:18:06 GMT
- Title: CPTAM: Constituency Parse Tree Aggregation Method
- Authors: Adithya Kulkarni, Nasim Sabetpour, Alexey Markin, Oliver Eulenstein,
Qi Li
- Abstract summary: This paper adopts the truth discovery idea to aggregate constituency parse trees from different parsers by estimating their reliability.
We formulate the constituency parse tree aggregation problem in two steps, structure aggregation and constituent label aggregation.
Experiments are conducted on benchmark datasets in different languages and domains.
- Score: 6.011216641982612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diverse Natural Language Processing tasks employ constituency parsing to
understand the syntactic structure of a sentence according to a phrase
structure grammar. Many state-of-the-art constituency parsers are proposed, but
they may provide different results for the same sentences, especially for
corpora outside their training domains. This paper adopts the truth discovery
idea to aggregate constituency parse trees from different parsers by estimating
their reliability in the absence of ground truth. Our goal is to consistently
obtain high-quality aggregated constituency parse trees. We formulate the
constituency parse tree aggregation problem in two steps, structure aggregation
and constituent label aggregation. Specifically, we propose the first truth
discovery solution for tree structures by minimizing the weighted sum of
Robinson-Foulds (RF) distances, a classic symmetric distance metric between two
trees. Extensive experiments are conducted on benchmark datasets in different
languages and domains. The experimental results show that our method, CPTAM,
outperforms the state-of-the-art aggregation baselines. We also demonstrate
that the weights estimated by CPTAM can adequately evaluate constituency
parsers in the absence of ground truth.
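The structure-aggregation step minimizes a weighted sum of Robinson-Foulds (RF) distances between the aggregated tree and each parser's output. For constituency trees over the same sentence, the RF distance can be computed as the size of the symmetric difference between the two trees' constituent-span sets. The sketch below illustrates that distance on toy nested-list trees; the tree representation and span extraction are illustrative assumptions, not CPTAM's actual implementation.

```python
# Sketch: Robinson-Foulds (RF) distance between two constituency trees
# over the same sentence, treating each tree as its set of constituent
# spans (i, j). Nested lists stand in for parse trees.

def tree_spans(tree):
    """Collect (start, end) spans of all internal nodes of a nested-list tree."""
    spans = set()
    def walk(node, i):
        if isinstance(node, str):        # leaf token occupies one position
            return i + 1
        j = i
        for child in node:
            j = walk(child, j)
        spans.add((i, j))                # span covered by this constituent
        return j
    walk(tree, 0)
    return spans

def rf_distance(tree_a, tree_b):
    """Symmetric-difference size between the two trees' span sets."""
    return len(tree_spans(tree_a) ^ tree_spans(tree_b))

# Two competing parses of a 4-token sentence:
t1 = [["the", "dog"], ["barked", "loudly"]]
t2 = [[["the", "dog"], "barked"], "loudly"]
print(rf_distance(t1, t2))               # prints 2
```

Given such a distance, the aggregation objective is to find the tree minimizing the sum of parser-weighted RF distances, with the weights themselves estimated from how close each parser stays to the current aggregate.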
Related papers
- Improving Unsupervised Constituency Parsing via Maximizing Semantic Information [35.63321102040579]
Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure.
Traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics.
We introduce a novel objective for training unsupervised parsers: maximizing the information between constituent structures and sentence semantics (SemInfo).
arXiv Detail & Related papers (2024-10-03T15:04:00Z) - Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing [43.758912958903494]
We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech constituency parsers.
To compute the metric, we project the ground-truth parse tree to the speech domain by forced alignment, align the projected ground-truth constituents with the predicted ones under certain structured constraints, and calculate the average IOU score across all aligned constituent pairs.
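After alignment, the core of the metric is an average of interval IoU scores. The sketch below shows that averaging step over already-aligned (gold, predicted) time intervals; the function names and toy alignment are illustrative assumptions, and the structured alignment step itself is omitted.

```python
# Sketch: the IoU-averaging core of STRUCT-IOU. Each constituent is a
# time interval (start, end) in the speech signal; alignment is assumed
# to be done already.

def interval_iou(a, b):
    """Intersection-over-union of two 1-D intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def struct_iou(aligned_pairs):
    """Average IoU over (gold, predicted) constituent-interval pairs."""
    if not aligned_pairs:
        return 0.0
    return sum(interval_iou(g, p) for g, p in aligned_pairs) / len(aligned_pairs)

pairs = [((0.0, 1.0), (0.0, 1.0)),   # perfect overlap -> IoU 1.0
         ((0.0, 2.0), (1.0, 2.0))]   # half overlap    -> IoU 0.5
print(struct_iou(pairs))             # prints 0.75
```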
arXiv Detail & Related papers (2024-02-21T00:01:17Z) - Cascading and Direct Approaches to Unsupervised Constituency Parsing on
Spoken Sentences [67.37544997614646]
We present the first study on unsupervised spoken constituency parsing.
The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees.
We show that accurate segmentation alone may be sufficient to parse spoken sentences accurately.
arXiv Detail & Related papers (2023-03-15T17:57:22Z) - Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision
Tree: the Case of Postnominal "that" in Noun Complement Clauses vs. Relative
Clauses [0.0]
We investigated two methods to parse relative and noun complement clauses in English.
We used an algorithm to relabel a corpus parsed with the GUM Treebank using Universal Dependencies.
Our second experiment consisted in using TreeTagger, a Probabilistic Decision Tree, to learn the distinction between the complement and relative uses of postnominal "that".
arXiv Detail & Related papers (2022-12-05T20:52:41Z) - Biaffine Discourse Dependency Parsing [0.0]
We use the biaffine model for neural discourse dependency parsing and achieve significant performance improvement compared with the baselines.
We compare the Eisner algorithm and the Chu-Liu-Edmonds algorithm in the task and find that the Chu-Liu-Edmonds algorithm generates deeper trees.
arXiv Detail & Related papers (2022-01-12T12:56:13Z) - A Conditional Splitting Framework for Efficient Constituency Parsing [14.548146390081778]
We introduce a generic seq2seq parsing framework that casts constituency parsing problems (syntactic and discourse parsing) into a series of conditional splitting decisions.
Our parsing model estimates the conditional probability distribution of possible splitting points in a given text span and supports efficient top-down decoding.
For discourse analysis, we show that in our formulation, discourse segmentation can be framed as a special case of parsing.
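Top-down decoding under a conditional splitting model amounts to recursively choosing the most probable split point inside each span until every span is a single token. The sketch below is a greedy version of that idea; `split_scores` is a hypothetical stand-in for the model's learned conditional distribution over split points, not the paper's actual parser.

```python
# Sketch: greedy top-down conditional-splitting decoding. Each span
# (i, j) is split at its highest-scoring interior point k, recursively,
# until spans cover single tokens.

def decode(tokens, i, j, split_scores):
    """Return a binary tree (nested tuples) over tokens[i:j]."""
    if j - i == 1:
        return tokens[i]
    # choose the split point k (i < k < j) with the highest score
    k = max(range(i + 1, j), key=lambda k: split_scores.get((i, k, j), 0.0))
    return (decode(tokens, i, k, split_scores),
            decode(tokens, k, j, split_scores))

tokens = ["the", "dog", "barked"]
scores = {(0, 2, 3): 0.9, (0, 1, 3): 0.1}   # prefer splitting after "dog"
print(decode(tokens, 0, len(tokens), scores))  # prints (('the', 'dog'), 'barked')
```

The paper's model additionally supports efficient beam-style decoding; the greedy recursion above only illustrates why decoding is a linear sequence of conditional splitting decisions rather than a full chart search.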
arXiv Detail & Related papers (2021-06-30T00:36:34Z) - Unsupervised Parsing via Constituency Tests [49.42244463346612]
We propose a method for unsupervised parsing based on the linguistic notion of a constituency test.
To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score.
The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previous best published result.
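Choosing "the binary tree with the highest total score" given per-span scores is a classic chart (CKY-style) maximization. The sketch below shows that search; the toy `span_score` function is an illustrative stand-in for the aggregated constituency-test judgments, not the paper's scoring model.

```python
# Sketch: CKY-style search for the binary tree maximizing the sum of
# span scores, as in constituency-test parsing. Leaves are token indices.

from functools import lru_cache

def best_tree(n, span_score):
    """Return (best total score, best binary tree) over tokens 0..n-1."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:
            return 0.0, i            # single token: leaf, no split point
        cands = []
        for k in range(i + 1, j):
            left, _ = best(i, k)
            right, _ = best(k, j)
            cands.append((left + right + span_score(i, j), k))
        return max(cands)            # (score, split point) with max score
    def build(i, j):
        if j - i == 1:
            return i
        _, k = best(i, j)
        return (build(i, k), build(k, j))
    return best(0, n)[0], build(0, n)

# Toy scorer: only the span (1, 3) passes its constituency tests.
score, tree = best_tree(3, lambda i, j: 1.0 if (i, j) == (1, 3) else 0.0)
print(score, tree)                   # prints 1.0 (0, (1, 2))
```

The memoized recursion runs in O(n^3) like standard CKY, since each of the O(n^2) spans tries O(n) split points.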
arXiv Detail & Related papers (2020-10-07T04:05:01Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z) - Exploiting Syntactic Structure for Better Language Modeling: A Syntactic
Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances"
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.