Automatic Lexical Simplification for Turkish
- URL: http://arxiv.org/abs/2201.05878v3
- Date: Fri, 28 Jul 2023 13:33:45 GMT
- Title: Automatic Lexical Simplification for Turkish
- Authors: Ahmet Yavuz Uluslu
- Abstract summary: We present the first automatic lexical simplification system for the Turkish language.
Recent text simplification efforts rely on manually crafted simplified corpora and comprehensive NLP tools.
We present a new text simplification pipeline based on the pretrained representation model BERT, combined with morphological features, to generate grammatically correct and semantically appropriate word-level simplifications.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present the first automatic lexical simplification system
for the Turkish language. Recent text simplification efforts rely on manually
crafted simplified corpora and comprehensive NLP tools that can analyse the
target text at both the word and sentence levels. Turkish is a morphologically rich
agglutinative language that requires unique considerations such as the proper
handling of inflectional cases. Because Turkish is also a low-resource language
with few available resources and industrial-strength tools, the text
simplification task is harder to approach. We present a new text simplification
pipeline based on the pretrained representation model BERT, combined with
morphological features, to generate grammatically correct and semantically
appropriate word-level simplifications.
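The abstract does not spell out the pipeline, so the following is only a rough sketch of why morphological handling matters when a Turkish word is replaced. It is not the paper's implementation: the BERT candidate generation step is swapped for a hard-coded candidate list with made-up frequency counts, only the plural suffix is handled, and loanword exceptions to vowel harmony are ignored. All function names are hypothetical.

```python
# Toy sketch: replace a Turkish word with a simpler candidate and
# re-apply the plural suffix so the result stays grammatical.
# Assumption: candidates and their frequencies are hand-written here;
# a real system would obtain them from a masked language model.

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def plural_suffix(lemma: str) -> str:
    """Pick -lar or -ler from the last vowel of the lemma (vowel harmony)."""
    for ch in reversed(lemma):
        if ch in BACK_VOWELS:
            return "lar"
        if ch in FRONT_VOWELS:
            return "ler"
    raise ValueError(f"no vowel found in {lemma!r}")


def pluralize(lemma: str) -> str:
    """Attach the harmonised plural suffix to a lowercase lemma."""
    return lemma + plural_suffix(lemma)


def simplify(word: str, lemma: str, candidates: dict[str, int]) -> str:
    """Swap `word` for the most frequent candidate lemma.

    Only the bare form and the plain plural are handled; any other
    inflected form is returned unchanged.
    """
    if not candidates:
        return word
    best = max(candidates, key=candidates.get)
    if word == lemma:
        return best
    if word == pluralize(lemma):
        return pluralize(best)
    return word


if __name__ == "__main__":
    # Hypothetical candidates and frequency counts, for illustration only.
    candidates = {"araba": 120, "taşıt": 15}
    print(simplify("otomobiller", "otomobil", candidates))
```

Copying the suffix from the original word would produce the ungrammatical "arabaler"; choosing the suffix again from the replacement's own vowels yields "arabalar".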
Related papers
- ARTIST: ARTificial Intelligence for Simplified Text [5.095775294664102]
Text Simplification is a key Natural Language Processing task that aims to reduce the linguistic complexity of a text.
Recent advances in Generative Artificial Intelligence (AI) have enabled automatic text simplification both on the lexical and syntactical levels.
(2023-08-25T16:06:06Z)
- A New Dataset and Empirical Study for Sentence Simplification in Chinese [50.0624778757462]
This paper introduces CSS, a new dataset for assessing sentence simplification in Chinese.
We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications.
In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.
(2023-06-07T06:47:34Z)
- Exploring Hybrid Linguistic Features for Turkish Text Readability [0.0]
This paper presents the first comprehensive study on automatic readability assessment of Turkish texts.
We combine state-of-the-art neural network models with linguistic features at lexical, morphosyntactic, syntactic and discourse levels to develop an advanced readability tool.
(2023-06-06T15:32:22Z)
- Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification [59.625179404482594]
Randomly masking text spans in ordinary texts in the pre-training stage hardly allows models to acquire the ability to generate simple texts.
We propose a new continued pre-training strategy to teach the pre-trained model to generate simple texts.
(2023-05-21T14:03:49Z)
- Elaborative Simplification as Implicit Questions Under Discussion [51.17933943734872]
This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework.
We show that explicitly modeling QUD provides essential understanding of elaborative simplification and how the elaborations connect with the rest of the discourse.
(2023-05-17T17:26:16Z)
- Unsupervised Sentence Simplification via Dependency Parsing [4.337513096197002]
We propose a simple yet novel unsupervised sentence simplification system.
It harnesses parsing structures together with sentence embeddings to produce linguistically effective simplifications.
We establish the unsupervised state of the art at 39.13 SARI on the TurkCorpus set and perform competitively against supervised baselines on various quality metrics.
(2022-06-10T07:55:25Z)
- SimpleBERT: A Pre-trained Model That Learns to Generate Simple Words [59.142185753887645]
In this work, we propose a continued pre-training method for text simplification.
We use a small-scale simple text dataset for continued pre-training and employ two methods to identify simple words.
We obtain SimpleBERT, which surpasses BERT in both lexical simplification and sentence simplification tasks.
(2022-04-16T11:28:01Z)
- Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785]
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously.
We propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
(2020-10-21T13:44:40Z)
- Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification [33.08519864889526]
We present the first data-driven study of content addition in text simplification.
We analyze how entities, ideas, and concepts are elaborated through the lens of contextual specificity.
Our results illustrate the complexities of elaborative simplification, suggesting many interesting directions for future work.
(2020-10-20T05:06:23Z)
- Chinese Lexical Simplification [29.464388721085548]
There is no prior research on the Chinese lexical simplification (CLS) task.
To circumvent difficulties in acquiring annotations, we manually create the first benchmark dataset for CLS.
We present five different types of methods as baselines to generate substitute candidates for the complex word.
(2020-10-14T12:55:36Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
(2020-05-01T16:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.