Text Complexity Classification Based on Linguistic Information:
Application to Intelligent Tutoring of ESL
- URL: http://arxiv.org/abs/2001.01863v7
- Date: Wed, 29 Jul 2020 14:33:08 GMT
- Title: Text Complexity Classification Based on Linguistic Information:
Application to Intelligent Tutoring of ESL
- Authors: M. Zakaria Kurdi
- Abstract summary: The goal of this work is to build a classifier that can identify text complexity within the context of teaching reading to English as a Second Language (ESL) learners.
Using a corpus of 6171 texts, which had already been classified into three different levels of difficulty by ESL experts, different experiments were conducted with five machine learning algorithms.
The results showed that the adopted linguistic features provide a good overall classification performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this work is to build a classifier that can identify text
complexity within the context of teaching reading to English as a Second
Language (ESL) learners. To present language learners with texts that are
suitable to their level of English, a set of features that can describe the
phonological, morphological, lexical, syntactic, discursive, and psychological
complexity of a given text were identified. Using a corpus of 6171 texts, which
had already been classified into three different levels of difficulty by ESL
experts, different experiments were conducted with five machine learning
algorithms. The results showed that the adopted linguistic features provide a
good overall classification performance (F-Score = 0.97). A scalability
evaluation was conducted to test if such a classifier could be used within real
applications, where it can be, for example, plugged into a search engine or a
web-scraping module. In this evaluation, the texts in the test set are not only
different from those from the training set but also of different types (ESL
texts vs. children reading texts). Although the overall performance of the
classifier decreased significantly (F-Score = 0.65), the confusion matrix shows
that most of the classification errors are between classes two and three (the
middle-level classes) and that the system performs robustly when categorizing
texts of classes one and four. This behavior can be explained by the
difference in classification criteria between the two corpora. Hence, the
observed results confirm the usability of such a classifier within a real-world
application.
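As a rough illustration of the feature-based approach the abstract describes, the sketch below computes a few shallow complexity proxies (average sentence length, average word length, type-token ratio). This is only a hedged stand-in: the paper's actual classifier uses much richer phonological, morphological, lexical, syntactic, discursive, and psychological features fed to trained machine learning models, none of which are reproduced here.

```python
import re

def complexity_features(text):
    """Shallow proxies for text complexity (a tiny, illustrative subset
    of the linguistic features the paper actually uses)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0,
                "type_token_ratio": 0.0}
    return {
        # Longer sentences roughly track syntactic complexity.
        "avg_sentence_len": len(words) / len(sentences),
        # Longer words roughly track lexical/morphological complexity.
        "avg_word_len": sum(map(len, words)) / len(words),
        # Vocabulary diversity: unique words over total words.
        "type_token_ratio": len(set(words)) / len(words),
    }

feats = complexity_features("The cat sat on the mat. The dog ran away quickly.")
```

In a full pipeline, vectors like these would be the input to one of the five ML algorithms the abstract mentions, with the three expert-assigned difficulty levels as training labels.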
Related papers
- Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment [4.2936749846785345]
Toxicity classification for voice heavily relies on semantic content of speech.
We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier.
arXiv Detail & Related papers (2024-06-14T17:56:53Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Large Language Models Are Zero-Shot Text Classifiers [3.617781755808837]
Large language models (LLMs) have become extensively used across various sub-disciplines of natural language processing (NLP).
In NLP, text classification problems have garnered considerable focus but still face limitations related to high computational cost, time consumption, and robustness to unseen classes.
With the proposal of chain-of-thought prompting (CoT), LLMs can be applied in a zero-shot learning (ZSL) setting using step-by-step reasoning prompts.
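The zero-shot CoT setup summarized above can be sketched as a prompt template; the wording and label set below are illustrative assumptions, not the paper's actual prompt.

```python
def zero_shot_cot_prompt(text, labels):
    """Build a zero-shot classification prompt with a step-by-step
    reasoning cue, in the spirit of chain-of-thought prompting."""
    label_list = ", ".join(labels)
    return (
        f"Classify the following text into one of these classes: {label_list}.\n"
        f"Text: {text}\n"
        "Let's think step by step, then answer with only the class name."
    )

prompt = zero_shot_cot_prompt("I loved this movie!", ["positive", "negative"])
```

The resulting string would be sent to an LLM, whose final line is then parsed as the predicted class; no labeled training instances are required.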
arXiv Detail & Related papers (2023-12-02T06:33:23Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, a decent performance gain is achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - Generalized Funnelling: Ensemble Learning and Heterogeneous Document
Embeddings for Cross-Lingual Text Classification [78.83284164605473]
Funnelling (Fun) is a recently proposed method for cross-lingual text classification.
We describe Generalized Funnelling (gFun) as a generalization of Fun.
We show that gFun substantially improves over Fun and over state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-17T23:33:04Z) - A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - Efficient Measuring of Readability to Improve Documents Accessibility
for Arabic Language Learners [0.0]
The approach is based on machine learning classification methods to discriminate between different levels of difficulty in reading and understanding a text.
Several models were trained on a large corpus mined from online Arabic websites and manually annotated.
Best results were achieved using TF-IDF vectors built from a combination of word-based unigrams and bigrams, with an overall accuracy of 87.14% over four classes of complexity.
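The TF-IDF representation over word unigrams and bigrams described above can be sketched in plain Python; this is a generic stand-in for that paper's vectorizer (its actual tokenization, smoothing, and classifier are not reproduced here).

```python
import math
import re
from collections import Counter

def terms(text):
    """Word unigrams plus bigrams, the feature set reported above."""
    toks = re.findall(r"\w+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(toks, toks[1:])]
    return toks + bigrams

def tfidf(docs):
    """Plain TF-IDF (term frequency times log inverse document
    frequency) over unigram+bigram terms."""
    term_lists = [terms(d) for d in docs]
    df = Counter()                       # document frequency per term
    for ts in term_lists:
        df.update(set(ts))
    n = len(docs)
    vecs = []
    for ts in term_lists:
        tf = Counter(ts)
        total = len(ts)
        vecs.append({t: (c / total) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

vecs = tfidf(["the cat sat", "the dog sat", "a bird flew"])
```

Vectors like these would then be fed to a standard classifier trained on the manually annotated difficulty labels.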
arXiv Detail & Related papers (2021-09-09T10:05:38Z) - Rank over Class: The Untapped Potential of Ranking in Natural Language
Processing [8.637110868126546]
We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
arXiv Detail & Related papers (2020-09-10T22:18:57Z) - Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text
Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.