Exploring Hybrid Linguistic Features for Turkish Text Readability
- URL: http://arxiv.org/abs/2306.03774v3
- Date: Sat, 4 Nov 2023 13:03:35 GMT
- Title: Exploring Hybrid Linguistic Features for Turkish Text Readability
- Authors: Ahmet Yavuz Uluslu and Gerold Schneider
- Abstract summary: This paper presents the first comprehensive study on automatic readability assessment of Turkish texts.
We combine state-of-the-art neural network models with linguistic features at lexical, morphosyntactic, syntactic and discourse levels to develop an advanced readability tool.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the first comprehensive study on automatic readability
assessment of Turkish texts. We combine state-of-the-art neural network models
with linguistic features at lexical, morphosyntactic, syntactic and discourse
levels to develop an advanced readability tool. We evaluate the effectiveness
of traditional readability formulas compared to modern automated methods and
identify key linguistic features that determine the readability of Turkish
texts.
Related papers
- Automating Easy Read Text Segmentation [2.7309692684728617]
Easy Read text is one of the main forms of access to information for people with reading difficulties.
One of the key characteristics of this type of text is the requirement to split sentences into smaller grammatical segments.
We study novel methods for the task, leveraging masked and generative language models, along with constituent parsing.
arXiv Detail & Related papers (2024-06-17T12:25:25Z)
- SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations [51.08119762844217]
SenteCon is a method for introducing human interpretability in deep language representations.
We show that SenteCon provides high-level interpretability at little to no cost to predictive performance on downstream tasks.
arXiv Detail & Related papers (2023-05-24T05:06:28Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- A Transfer Learning Based Model for Text Readability Assessment in German [4.550811027560416]
We propose a new model for text complexity assessment for German text based on transfer learning.
The best model, based on the BERT pre-trained language model, achieved a Root Mean Square Error (RMSE) of 0.483.
arXiv Detail & Related papers (2022-07-13T15:15:44Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents what is, to our knowledge, the first systematic study of formality detection methods, covering statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- Automatic Lexical Simplification for Turkish [0.0]
We present the first automatic lexical simplification system for the Turkish language.
Recent text simplification efforts rely on manually crafted simplified corpora and comprehensive NLP tools.
We present a new text simplification pipeline based on the pretrained representation model BERT, combined with morphological features, to generate grammatically correct and semantically appropriate word-level simplifications.
arXiv Detail & Related papers (2022-01-15T15:58:44Z)
- Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment [17.882688516249058]
We propose to incorporate linguistic features into neural network models by learning syntactic dense embeddings based on linguistic features.
Our proposed methodology can complement a BERT-only model to achieve significantly better performance in automatic readability assessment.
arXiv Detail & Related papers (2021-07-09T07:26:17Z)
- Evaluating the Morphosyntactic Well-formedness of Generated Texts [88.20502652494521]
We propose L'AMBRE -- a metric to evaluate the morphosyntactic well-formedness of text.
We show the effectiveness of our metric on the task of machine translation through a diachronic study of systems translating into morphologically-rich languages.
arXiv Detail & Related papers (2021-03-30T18:02:58Z)
- Morphologically Aware Word-Level Translation [82.59379608647147]
We propose a novel morphologically aware probability model for bilingual lexicon induction.
Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning.
arXiv Detail & Related papers (2020-11-15T17:54:49Z)
- Linguistic Features for Readability Assessment [0.0]
It is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further.
We find that, given sufficient training data, augmenting deep learning models with linguistically motivated features does not improve state-of-the-art performance.
Our results provide preliminary evidence for the hypothesis that the state-of-the-art deep learning models represent linguistic features of the text related to readability.
arXiv Detail & Related papers (2020-05-30T22:14:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.