A Comparative Study of Feature Types for Age-Based Text Classification
- URL: http://arxiv.org/abs/2009.11898v1
- Date: Thu, 24 Sep 2020 18:41:10 GMT
- Title: A Comparative Study of Feature Types for Age-Based Text Classification
- Authors: Anna Glazkova, Yury Egorov, Maksim Glazkov
- Abstract summary: We compare the effectiveness of various types of linguistic features for the task of age-based classification of fiction texts.
The results obtained show that the features describing the text at the document level can significantly increase the quality of machine learning models.
- Score: 3.867363075280544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to automatically determine the age audience of a novel provides
many opportunities for the development of information retrieval tools. Firstly,
developers of book recommendation systems and electronic libraries may be
interested in filtering texts by the age of the most likely readers. Further,
parents may want to select literature for children. Finally, it will be useful
for writers and publishers to determine which features influence whether the
texts are suitable for children. In this article, we compare the empirical
effectiveness of various types of linguistic features for the task of age-based
classification of fiction texts. For this purpose, we collected a text corpus
of book previews labeled with one of two categories -- children's or adult. We
evaluated the following types of features: readability indices, sentiment,
lexical, grammatical and general features, and publishing attributes. The
results obtained show that the features describing the text at the document
level can significantly increase the quality of machine learning models.
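To make the idea of document-level features more concrete, the following is a minimal sketch in Python. It is not the authors' feature set or pipeline: the function names `count_syllables` and `document_features` are hypothetical, the syllable counter is a rough English-only heuristic, and the four features shown (average sentence length, average word length, type-token ratio, and the Flesch Reading Ease index) are assumed examples of the "general", "lexical", and "readability" categories named in the abstract.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count groups of consecutive vowels.

    This is a heuristic assumed for illustration, not a linguistic tool.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def document_features(text: str) -> dict:
    """Compute a few illustrative document-level features for one text."""
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenisation: runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sentences = len(sentences)
    n_words = len(words)
    n_syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words

    return {
        # General features
        "avg_sentence_length": words_per_sentence,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Lexical diversity: distinct word forms divided by total words
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Readability: standard Flesch Reading Ease formula
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. Then the cat went to sleep."
    for name, value in document_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A feature dictionary like this, computed once per book preview, could be turned into a numeric vector and passed to any standard classifier. The abstract does not say which models or exact features were used, so that step is left open here.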
Related papers
- BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation.
We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses.
Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z)
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [57.550045763103334]
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- TextAge: A Curated and Diverse Text Dataset for Age Classification [1.4843200329335289]
Age-related language patterns play a crucial role in understanding linguistic differences and developing age-appropriate communication strategies.
We present TextAge, a curated text dataset that maps sentences to the age and age group of the producer.
The dataset undergoes extensive cleaning and preprocessing to ensure data quality and consistency.
arXiv Detail & Related papers (2024-05-02T23:37:03Z)
- Textual Stylistic Variation: Choices, Genres and Individuals [0.8057441774248633]
This chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections.
This chapter discusses variation given by genre, and contrasts it to variation occasioned by individual choice.
arXiv Detail & Related papers (2022-05-01T16:39:49Z)
- A Survey on Retrieval-Augmented Text Generation [53.04991859796971]
Retrieval-augmented text generation has remarkable advantages and has achieved state-of-the-art performance in many NLP tasks.
It first highlights the generic paradigm of retrieval-augmented generation, and then reviews notable approaches according to different tasks.
arXiv Detail & Related papers (2022-02-02T16:18:41Z)
- Latin writing styles analysis with Machine Learning: New approach to old questions [0.0]
In the Middle Ages texts were learned by heart and spread using oral means of communication from generation to generation.
Taking into account such a specific construction of literature composed in Latin, we can search for and indicate the probability patterns of familiar sources of specific narrative texts.
arXiv Detail & Related papers (2021-09-01T20:21:45Z)
- Readability Research: An Interdisciplinary Approach [62.03595526230364]
We aim to provide a firm foundation and a comprehensive framework for readability research.
Readability refers to aspects of visual information design which impact information flow from the page to the reader.
These aspects can be modified on-demand, instantly improving the ease with which a reader can process and derive meaning from text.
arXiv Detail & Related papers (2021-07-20T16:52:17Z)
- Using Machine Learning and Natural Language Processing Techniques to Analyze and Support Moderation of Student Book Discussions [0.0]
The IMapBook project aims at improving the literacy and reading comprehension skills of elementary school-aged children by presenting them with interactive e-books and letting them take part in moderated book discussions.
This study aims to develop and illustrate a machine learning-based approach to message classification that could be used to automatically notify the discussion moderator of a possible need for an intervention and also to collect other useful information about the ongoing discussion.
arXiv Detail & Related papers (2020-11-23T20:33:09Z)
- Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set [0.0]
We show that in the entire GLEC quasi error-free text classification and authorship recognition is possible with a method using the same set of five style and five content features.
Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
arXiv Detail & Related papers (2020-10-21T07:39:55Z)
- A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express themselves in human language.
Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text.
To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)