Diverse Linguistic Features for Assessing Reading Difficulty of
Educational Filipino Texts
- URL: http://arxiv.org/abs/2108.00241v1
- Date: Sat, 31 Jul 2021 13:59:46 GMT
- Title: Diverse Linguistic Features for Assessing Reading Difficulty of
Educational Filipino Texts
- Authors: Joseph Marvin Imperial, Ethel Ong
- Abstract summary: This paper describes the development of automatic machine learning-based readability assessment models for educational Filipino texts.
Results show that a Random Forest model obtained a high performance of 62.7% accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To ensure quality and effective learning, fluency, and
comprehension, the difficulty levels of reading materials should be properly
identified. In this paper, we describe the development of automatic machine
learning-based readability assessment models for educational Filipino texts
using the most diverse set of linguistic features for the language. Results
show that a Random Forest model obtained a high performance of 62.7% accuracy,
and 66.1% when using the optimal combination of feature sets consisting of
traditional and syllable pattern-based predictors.
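As a rough illustration of the kind of pipeline the abstract describes, the sketch below trains a Random Forest readability classifier on hand-crafted traditional and syllable pattern-based predictors using scikit-learn. The feature definitions, example sentences, and grade labels are illustrative assumptions, not the authors' actual features or data.

    # Minimal sketch of a feature-based readability classifier, assuming
    # scikit-learn is available. Feature choices, texts, and labels are
    # illustrative only, not the paper's implementation.
    from sklearn.ensemble import RandomForestClassifier

    VOWELS = set("aeiou")

    def extract_features(text):
        words = [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]
        sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        # "Traditional" predictors: word count, average word length, average sentence length.
        avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
        avg_sent_len = len(words) / max(len(sentences), 1)
        # Toy syllable pattern-based predictor: vowel nuclei per word (Filipino
        # spelling is fairly phonetic, so vowel count roughly tracks syllables).
        syllables_per_word = sum(sum(c in VOWELS for c in w) for w in words) / max(len(words), 1)
        return [len(words), avg_word_len, avg_sent_len, syllables_per_word]

    # Hypothetical corpus: texts paired with grade-level labels.
    texts = ["Si Ana ay may pusa.", "Ang mga bata ay masayang naglalaro sa parke tuwing umaga."]
    labels = [1, 2]

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit([extract_features(t) for t in texts], labels)
    print(clf.predict([extract_features("Ang aso ay tumakbo sa hardin.")]))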
Related papers
- Strategies for Arabic Readability Modeling [9.976720880041688]
Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility.
We present a set of experimental results on Arabic readability assessment using a diverse range of approaches.
arXiv Detail & Related papers (2024-07-03T11:54:11Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss [17.213602354715956]
We propose a BERT-based model with feature projection and length-balanced loss for readability assessment.
Our model achieves state-of-the-art performances on two English benchmark datasets and one dataset of Chinese textbooks.
arXiv Detail & Related papers (2022-10-19T05:33:27Z)
- Learning to Decompose Visual Features with Latent Textual Prompts [140.2117637223449]
We propose Decomposed Feature Prompting (DeFo) to improve vision-language models.
Our empirical study shows DeFo's significance in improving the vision-language models.
arXiv Detail & Related papers (2022-10-09T15:40:13Z)
- Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment [17.882688516249058]
We propose to incorporate linguistic features into neural network models by learning syntactic dense embeddings based on linguistic features.
Our proposed methodology can complement BERT-only model to achieve significantly better performances for automatic readability assessment.
arXiv Detail & Related papers (2021-07-09T07:26:17Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data through self-training to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
- Application of Lexical Features Towards Improvement of Filipino Readability Identification of Children's Literature [0.0]
We explore the use of lexical features towards improving readability identification of children's books written in Filipino.
Results show that combining lexical features (LEX), consisting of type-token ratio, lexical density, lexical variation, and foreign word count, with traditional features (TRAD) increased the performance of readability models by almost a 5% margin (a sketch of these lexical features appears after this list).
arXiv Detail & Related papers (2021-01-22T19:54:37Z)
- Morphologically Aware Word-Level Translation [82.59379608647147]
We propose a novel morphologically aware probability model for bilingual lexicon induction.
Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning.
arXiv Detail & Related papers (2020-11-15T17:54:49Z)
- Linguistic Features for Readability Assessment [0.0]
It is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further.
We find that, given sufficient training data, augmenting deep learning models with linguistically motivated features does not improve state-of-the-art performance.
Our results provide preliminary evidence for the hypothesis that the state-of-the-art deep learning models represent linguistic features of the text related to readability.
arXiv Detail & Related papers (2020-05-30T22:14:46Z)
- Learning to Learn Morphological Inflection for Resource-Poor Languages [105.11499402984482]
We propose to cast the task of morphological inflection - mapping a lemma to an indicated inflected form - for resource-poor languages as a meta-learning problem.
Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters.
Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines.
arXiv Detail & Related papers (2020-04-28T05:13:17Z)
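As mentioned in the Filipino children's literature entry above, the following is a rough sketch of how LEX-style predictors (type-token ratio, lexical density, lexical variation, foreign word count) might be computed. The function-word and English-word lists below are small placeholders; the cited work's actual features rely on proper tokenization, POS tagging, and dictionaries.

    # Illustrative LEX-style feature extraction; word lists are placeholders.
    FUNCTION_WORDS = {"ang", "ng", "sa", "ay", "mga", "si", "na", "at"}
    ENGLISH_WORDS = {"computer", "teacher", "school"}

    def lex_features(text):
        tokens = [t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")]
        content = [t for t in tokens if t not in FUNCTION_WORDS]
        return {
            "type_token_ratio": len(set(tokens)) / max(len(tokens), 1),
            "lexical_density": len(content) / max(len(tokens), 1),
            "lexical_variation": len(set(content)) / max(len(content), 1),
            "foreign_word_count": sum(t in ENGLISH_WORDS for t in tokens),
        }

    print(lex_features("Ang guro ay gumamit ng computer sa school."))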
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.