Syntactic Complexity Identification, Measurement, and Reduction Through
Controlled Syntactic Simplification
- URL: http://arxiv.org/abs/2304.07774v1
- Date: Sun, 16 Apr 2023 13:13:58 GMT
- Authors: Muhammad Salman, Armin Haller, Sergio J. Rodríguez Méndez
- Abstract summary: We present a classical syntactic dependency-based approach to split and rephrase a compound and complex sentence into a set of simplified sentences.
The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity.
This work was accepted and presented at the International Workshop on Learning with Knowledge Graphs (IWLKG) at the WSDM-2023 Conference.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Text simplification is a domain in Natural Language Processing
(NLP) that makes text easier to understand and explore. However, it is hard to
retrieve knowledge from unstructured text, which usually takes the form of
compound and complex sentences. State-of-the-art neural network-based methods
simplify sentences for improved readability by replacing words with plain
English substitutes and by summarising sentences and paragraphs. In the
Knowledge Graph (KG) creation process from unstructured text, summarising long
sentences and substituting words is undesirable since this may lead to
information loss. However, KG creation from text requires the extraction of all
possible facts (triples) with the same mentions as in the text. In this work,
we propose a controlled simplification based on the factual information in a
sentence, i.e., triple. We present a classical syntactic dependency-based
approach to split and rephrase a compound and complex sentence into a set of
simplified sentences. This simplification process will retain the original
wording with a simple structure of possible domain facts in each sentence,
i.e., triples. The paper also introduces an algorithm to identify and measure a
sentence's syntactic complexity (SC), followed by reduction through a
controlled syntactic simplification process. Finally, a dataset re-annotation
experiment is conducted with GPT-3; we aim to publish this refined corpus as a
resource. This work was accepted and presented at the International Workshop on
Learning with Knowledge Graphs (IWLKG) at the WSDM-2023 Conference. The code
and data are available at www.github.com/sallmanm/SynSim.
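The paper's approach relies on a full syntactic dependency parse, which is not reproduced here. As a rough illustration of the two ideas in the abstract (measuring syntactic complexity and splitting a compound sentence while retaining its original wording), the following is a minimal sketch using simple marker-word heuristics; all function names and the marker lists are illustrative assumptions, not the authors' algorithm.

```python
import re

# Illustrative heuristic only: the paper uses a dependency parser; here
# clause boundaries are approximated with coordinator/subordinator words.
COORDINATORS = {"and", "but", "or", "so", "yet"}
SUBORDINATORS = {"because", "although", "while", "when", "which", "who", "since"}

def syntactic_complexity(sentence: str) -> int:
    """Toy SC score: count clause-introducing markers in the sentence."""
    tokens = [t.strip(",.;").lower() for t in sentence.split()]
    return sum(t in COORDINATORS or t in SUBORDINATORS for t in tokens)

def split_compound(sentence: str) -> list[str]:
    """Naively split a compound sentence at a comma followed by a
    coordinator, keeping each clause's original wording (no summarising,
    no word substitution)."""
    parts = re.split(r",\s+(?=(?:and|but|or|so|yet)\b)", sentence.rstrip("."))
    clauses = [parts[0]]
    for p in parts[1:]:
        # Drop the leading coordinator from each non-initial clause.
        clauses.append(re.sub(r"^(?:and|but|or|so|yet)\s+", "", p))
    return [c.strip().capitalize() + "." for c in clauses]
```

For example, `split_compound("The parser builds a tree, and the splitter rewrites each clause.")` yields two simplified sentences whose wording matches the source clauses, which is the property the KG-oriented simplification above requires.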
Related papers
- Discourse-Aware Text Simplification: From Complex Sentences to Linked
Propositions [11.335080241393191]
Text Simplification (TS) aims to modify sentences in order to make them easier to process.
We present a discourse-aware TS approach that splits and rephrases complex English sentences.
We generate a semantic hierarchy of minimal propositions that puts a semantic layer on top of the simplified sentences.
arXiv Detail & Related papers (2023-08-01T10:10:59Z)
- Elaborative Simplification as Implicit Questions Under Discussion [51.17933943734872]
This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework.
We show that explicitly modeling QUD provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse.
arXiv Detail & Related papers (2023-05-17T17:26:16Z)
- SimpLex: a lexical text simplification architecture [0.5156484100374059]
We present SimpLex, a novel simplification architecture for generating simplified English sentences.
The proposed architecture uses either word embeddings (i.e., Word2Vec) and perplexity, or sentence transformers (i.e., BERT, RoBERTa, and GPT2) and cosine similarity.
The solution is incorporated into a user-friendly and simple-to-use software.
arXiv Detail & Related papers (2023-04-14T08:52:31Z)
- Structured information extraction from complex scientific text with
fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts.
This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z)
- Text Simplification for Comprehension-based Question-Answering [7.144235435987265]
We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset.
We benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task.
arXiv Detail & Related papers (2021-09-28T18:48:00Z)
- Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
arXiv Detail & Related papers (2020-12-13T05:52:07Z)
- A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Explainable Prediction of Text Complexity: The Missing Preliminaries for
Text Simplification [13.447565774887215]
Text simplification reduces the language complexity of professional content for accessibility purposes.
End-to-end neural network models have been widely adopted to directly generate the simplified version of input text.
We show that text simplification can be decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process.
arXiv Detail & Related papers (2020-07-31T03:33:37Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification
Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
- CompLex: A New Corpus for Lexical Complexity Prediction from Likert
Scale Data [13.224233182417636]
This paper presents the first English dataset for continuous lexical complexity prediction.
We use a 5-point Likert scale scheme to annotate complex words in texts from three sources/domains: the Bible, Europarl, and biomedical texts.
arXiv Detail & Related papers (2020-03-16T03:54:22Z)