Refining Syntactic Distinctions Using Decision Trees: A Paper on Postnominal 'That' in Complement vs. Relative Clauses
- URL: http://arxiv.org/abs/2509.14261v1
- Date: Sat, 13 Sep 2025 15:41:13 GMT
- Title: Refining Syntactic Distinctions Using Decision Trees: A Paper on Postnominal 'That' in Complement vs. Relative Clauses
- Authors: Hamady Gackou
- Abstract summary: We first tested the performance of the TreeTagger English model developed by Helmut Schmid with test files at our disposal. We distinguished between the two uses of "that," both as a relative pronoun and as a complementizer. We proposed an improved model by retraining TreeTagger and compared the newly trained model with Schmid's baseline model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we first tested the performance of the TreeTagger English model developed by Helmut Schmid with test files at our disposal, using this model to analyze relative clauses and noun complement clauses in English. We distinguished between the two uses of "that," both as a relative pronoun and as a complementizer. To achieve this, we employed an algorithm to reannotate a corpus that had originally been parsed in the Universal Dependencies framework with the EWT Treebank. In the next phase, we proposed an improved model by retraining TreeTagger and compared the newly trained model with Schmid's baseline model. This process allowed us to fine-tune the model's performance to more accurately capture the subtle distinctions in the use of "that" as a complementizer and as a nominal. We also examined the impact of varying the training dataset size on TreeTagger's accuracy and assessed the representativeness of the EWT Treebank files for the structures under investigation. Additionally, we analyzed some of the linguistic and structural factors influencing the ability to effectively learn this distinction.
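The abstract does not reproduce the reannotation algorithm itself; the following is a minimal sketch of the kind of relabeling it describes, assuming a CoNLL-U file from the UD EWT treebank. The tag names THAT-COMP and THAT-REL and the relabeling rules are illustrative, not the authors' actual scheme.

```python
"""Sketch: relabel postnominal "that" in a UD CoNLL-U corpus so a tagger
can be trained to separate its two readings. Tag names and rules are
illustrative; the paper's actual scheme may differ."""

COMP_DEPRELS = {"mark"}                              # complementizer "that"
REL_DEPRELS = {"nsubj", "nsubj:pass", "obj", "obl"}  # relative-pronoun "that"

def relabel_that(conllu_path, out_path):
    with open(conllu_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue                             # sentence breaks, comments
            cols = line.split("\t")                  # the 10 CoNLL-U columns
            if "-" in cols[0] or "." in cols[0]:
                continue                             # multiword tokens, empty nodes
            form, upos, xpos, deprel = cols[1], cols[3], cols[4], cols[7]
            tag = xpos                               # default: keep the treebank tag
            if form.lower() == "that":
                if deprel in COMP_DEPRELS:
                    tag = "THAT-COMP"
                elif upos == "PRON" and deprel in REL_DEPRELS:
                    # a fuller rule would also check that the governing
                    # clause is attached as acl:relcl
                    tag = "THAT-REL"
            dst.write(f"{form}\t{tag}\n")            # TreeTagger training format

relabel_that("en_ewt-ud-train.conllu", "ewt_that_train.txt")
```

The output is one tab-separated token-tag pair per line, the format TreeTagger's training tool expects.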
Related papers
- Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision Tree: the Case of Postnominal "that" in Noun Complement Clauses vs. Relative Clauses [0.0]
We investigated two methods to parse relative and noun complement clauses in English.
We used an algorithm to relabel a corpus parsed with the GUM Treebank using Universal Dependencies.
Our second experiment consisted of using TreeTagger, a probabilistic decision tree tagger, to learn the distinction between the complement and relative uses of postnominal "that"; a sketch of such a training run follows this entry.
arXiv Detail & Related papers (2022-12-05T20:52:41Z)
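Like the main paper, this earlier study retrains TreeTagger and compares it with Schmid's pretrained model. Below is a minimal sketch of such a run, assuming the TreeTagger binaries are on PATH; all file names are placeholders, and the lexicon and open-class files must be prepared as described in the TreeTagger documentation.

```python
"""Sketch: retrain TreeTagger on relabeled data, then tag a held-out file.
Assumes the train-tree-tagger / tree-tagger binaries are installed; all
file names are placeholders, not the authors' actual files."""
import subprocess

# train-tree-tagger <lexicon> <open-class tags> <tagged training data> <output .par>
subprocess.run(
    ["train-tree-tagger", "lexicon.txt", "open_class.txt",
     "ewt_that_train.txt", "english_that.par"],
    check=True,
)

# Tag held-out tokens (one per line) with the retrained parameter file;
# tagging the same tokens with Schmid's pretrained english.par gives the baseline.
with open("test_tokens.txt") as src, open("test_tagged.txt", "w") as dst:
    subprocess.run(["tree-tagger", "english_that.par"],
                   stdin=src, stdout=dst, check=True)
```

Accuracy on the two "that" tags can then be computed by aligning the tagged output with the gold relabeled file.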
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of the GEC task and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- Learning compositional structures for semantic graph parsing [81.41592892863979]
We show how AM dependency parsing can be trained directly on a neural latent-variable model.
Our model picks up on several linguistic phenomena on its own and achieves comparable accuracy to supervised training.
arXiv Detail & Related papers (2021-06-08T14:20:07Z)
- Combining Prediction and Interpretation in Decision Trees (PrInDT) -- a Linguistic Example [0.0]
We show that conditional inference trees and ensembles are suitable methods for modeling linguistic variation.
In contrast to earlier linguistic applications, however, we claim that their suitability increases considerably when prediction and interpretation are combined; a rough illustration follows this entry.
arXiv Detail & Related papers (2021-03-03T11:32:20Z)
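PrInDT itself is built on conditional inference trees (R's partykit::ctree). As a rough Python analogue only, the sketch below fits an ordinary CART tree to invented linguistic features and prints both a prediction and the learned rules, illustrating the prediction-plus-interpretation combination the entry argues for; the features and data are hypothetical.

```python
"""Sketch: combining prediction and interpretation with a decision tree.
CART (scikit-learn) stands in for the conditional inference trees PrInDT
actually uses; features and data are invented for the example."""
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy linguistic data: predict the "that"-use from two hypothetical features.
# Feature 0: head noun is a "fact"-type noun (1) or not (0).
# Feature 1: a gap (missing argument) is detectable in the clause (1/0).
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = ["comp", "comp", "rel", "rel", "rel", "comp"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(tree.predict([[1, 0]]))                                          # prediction
print(export_text(tree, feature_names=["fact_noun", "gap_in_clause"]))  # interpretation
```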
- Constructing Taxonomies from Pretrained Language Models [52.53846972667636]
We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models.
Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those predictions into trees.
We train our model on subtrees sampled from WordNet, and test on non-overlapping WordNet subtrees.
arXiv Detail & Related papers (2020-10-24T07:16:21Z)
- Evaluating Tree Explanation Methods for Anomaly Reasoning: A Case Study of SHAP TreeExplainer and TreeInterpreter [6.718611456024702]
We investigate the performance of two methods for explaining tree-based models: Tree Interpreter (TI) and SHapley Additive exPlanations TreeExplainer (SHAP-TE).
We find that, although SHAP-TE offers consistency guarantees over TI at the cost of increased computation, consistency does not necessarily improve explanation performance in our case study; a usage sketch of both tools follows this entry.
arXiv Detail & Related papers (2020-10-13T23:18:26Z)
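A minimal sketch of how the two explainers are typically invoked side by side, assuming the shap and treeinterpreter packages are installed; the model and data are synthetic stand-ins, not the case study's setup.

```python
"""Sketch: comparing SHAP TreeExplainer with TreeInterpreter on one model.
Synthetic data; not the case study's actual experiment."""
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2.0 + X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# SHAP-TE: exact Shapley values for tree ensembles.
shap_values = shap.TreeExplainer(model).shap_values(X[:5])

# TI: contributions accumulated along each tree's decision path.
prediction, bias, contributions = ti.predict(model, X[:5])

print(shap_values[0])    # per-feature attribution for the first sample
print(contributions[0])  # TI's attribution for the same sample
```

Both tools decompose each prediction into a baseline plus per-feature contributions, which is the quantity the entry's consistency comparison concerns.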
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- Unsupervised Parsing via Constituency Tests [49.42244463346612]
We propose a method for unsupervised parsing based on the linguistic notion of a constituency test.
To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score; a toy sketch of this search follows the entry.
The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previous best published result.
arXiv Detail & Related papers (2020-10-07T04:05:01Z)
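A toy version of the search described above: a CKY-style dynamic program that scores every span and returns the binary bracketing with the highest total score. The span scorer here is a stand-in for the aggregated constituency-test judgments the paper obtains from a neural grammaticality model.

```python
"""Sketch: choose the binary tree whose spans maximize a constituency score.
The scorer is a placeholder for aggregated constituency-test judgments."""
import functools

def best_tree(words, span_score):
    """Return (score, bracketing) for the highest-scoring binary tree."""
    n = len(words)

    @functools.lru_cache(maxsize=None)
    def best(i, j):                          # best tree over words[i:j]
        if j - i == 1:
            return 0.0, words[i]             # single word: trivial span
        candidates = []
        for k in range(i + 1, j):            # try every split point
            ls, lt = best(i, k)
            rs, rt = best(k, j)
            candidates.append((ls + rs + span_score(i, j), (lt, rt)))
        return max(candidates, key=lambda c: c[0])

    return best(0, n)

# Toy scorer: reward the two spans that keep the NPs together.
score = lambda i, j: 1.0 if (i, j) in {(0, 2), (2, 4)} else 0.0
print(best_tree(["the", "cat", "chased", "mice"], score))
# -> (2.0, (('the', 'cat'), ('chased', 'mice')))
```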
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances"; a sketch of this encoding follows the entry.
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
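A minimal sketch of one standard way to read such "syntactic distances" off a binary parse tree, where the distance between adjacent words is the height of the node at which they are first joined; the paper's exact formulation may differ.

```python
"""Sketch: derive a syntactic-distance signal from a binary parse tree,
represented as nested tuples of strings (illustrative encoding)."""

def syntactic_distances(tree):
    """Return (leaves, dists, height); dists[i] is the height of the node
    at which leaves[i] and leaves[i+1] are first joined."""
    if isinstance(tree, str):
        return [tree], [], 0                 # a leaf has height 0
    left, right = tree
    lw, ld, lh = syntactic_distances(left)
    rw, rd, rh = syntactic_distances(right)
    height = max(lh, rh) + 1
    # the boundary pair (lw[-1], rw[0]) is merged exactly at this node
    return lw + rw, ld + [height] + rd, height

tree = (("the", "cat"), ("chased", ("the", "mice")))
leaves, dists, _ = syntactic_distances(tree)
print(leaves, dists)   # ['the', 'cat', 'chased', 'the', 'mice'] [1, 3, 2, 1]
```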
- An enhanced Tree-LSTM architecture for sentence semantic modeling using typed dependencies [0.0]
Tree-based Long Short-Term Memory (LSTM) networks have become state-of-the-art for modeling the meaning of language texts.
This paper proposes an enhanced LSTM architecture, called relation gated LSTM, which can model the relationship between two inputs of a sequence.
We also introduce a Tree-LSTM model called Typed Dependency Tree-LSTM that uses the sentence dependency parse structure and the dependency type to embed sentence meaning into a dense vector; a sketch of the underlying child-sum Tree-LSTM cell follows the entry.
arXiv Detail & Related papers (2020-02-18T18:10:03Z)
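For reference, a NumPy sketch of the child-sum Tree-LSTM cell (Tai et al., 2015) that typed-dependency variants like this entry's build on; dimensions and weights are illustrative, and the relation-gated and typed extensions described in the entry are not implemented here.

```python
"""Sketch: a child-sum Tree-LSTM cell. Random weights, toy dimensions;
the typed/relation-gated variants additionally condition on the
dependency label of each child edge."""
import numpy as np

rng = np.random.default_rng(0)
D = 8                                        # hidden/input size (illustrative)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W = {g: rng.normal(scale=0.1, size=(D, D)) for g in "ifou"}
U = {g: rng.normal(scale=0.1, size=(D, D)) for g in "ifou"}
b = {g: np.zeros(D) for g in "ifou"}

def child_sum_cell(x, children):
    """x: input vector; children: list of (h, c) pairs from child nodes."""
    h_sum = sum((h for h, _ in children), np.zeros(D))
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum + b["i"])
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum + b["o"])
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum + b["u"])
    c = i * u
    for h_k, c_k in children:                # one forget gate per child
        f_k = sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"])
        c = c + f_k * c_k
    return o * np.tanh(c), c

# Compose two leaves under a head word, bottom-up along the parse tree.
h1, c1 = child_sum_cell(rng.normal(size=D), [])
h2, c2 = child_sum_cell(rng.normal(size=D), [])
h_root, _ = child_sum_cell(rng.normal(size=D), [(h1, c1), (h2, c2)])
print(h_root.shape)                          # (8,)
```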