Knowledge-Rich BERT Embeddings for Readability Assessment
- URL: http://arxiv.org/abs/2106.07935v1
- Date: Tue, 15 Jun 2021 07:37:48 GMT
- Title: Knowledge-Rich BERT Embeddings for Readability Assessment
- Authors: Joseph Marvin Imperial
- Abstract summary: We propose an alternative way of utilizing the information-rich embeddings of BERT models through a joint-learning method.
Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic readability assessment (ARA) is the task of evaluating the level of
ease or difficulty of text documents for a target audience. For researchers,
one of the many open problems in the field is making models trained for the
task effective even for low-resource languages. In this study, we
propose an alternative way of utilizing the information-rich embeddings of BERT
models through a joint-learning method combined with handcrafted linguistic
features for readability assessment. Results show that the proposed method
outperforms classical approaches on English and Filipino datasets, obtaining
up to a 12.4% increase in F1 performance. We also show that the knowledge
encoded in BERT embeddings can serve as a substitute feature set for
low-resource languages like Filipino, which have limited semantic and
syntactic NLP tools for explicitly extracting feature values for the task.
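For context, the sketch below illustrates the kind of pipeline the abstract describes: mean-pooled BERT embeddings concatenated with a few handcrafted surface features and fed to a traditional classifier. It is a minimal illustration, not the paper's exact configuration; the model name, pooling strategy, toy feature set, label scheme, and RandomForest classifier are all assumptions.

```python
# Minimal sketch (assumptions noted above): combine BERT embeddings with
# handcrafted features for readability classification.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.ensemble import RandomForestClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

def bert_embedding(text: str) -> np.ndarray:
    """Mean-pooled last-hidden-state embedding for one document."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def handcrafted_features(text: str) -> np.ndarray:
    """Toy surface-level features standing in for a full linguistic feature set."""
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    avg_sent_len = len(words) / max(len(sentences), 1)
    return np.array([len(words), avg_word_len, avg_sent_len])

def featurize(texts):
    # Concatenate the two feature views per document.
    return np.vstack([
        np.concatenate([bert_embedding(t), handcrafted_features(t)]) for t in texts
    ])

# Hypothetical usage with readability levels (e.g., grade bands) as labels.
train_texts = ["The cat sat on the mat.",
               "Photosynthesis converts light energy into chemical energy."]
train_labels = [0, 2]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(featurize(train_texts), train_labels)
print(clf.predict(featurize(["Dogs like to play outside."])))
```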
Related papers
- BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings [0.4194295877935868]
The choice of embeddings plays a critical role in enhancing the performance of NLP tasks.
In this study, we investigate the impact of various embedding techniques (Contextual BERT-based, Non-Contextual BERT-based, and FastText-based) on NLP classification tasks specific to the Marathi language.
arXiv Detail & Related papers (2024-11-26T18:25:57Z)
- CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs).
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge [54.61068946420894]
We introduce an innovative method built on FEature Context and TErm-level Knowledge modules.
To effectively enrich the feature context representations of term weight, the Feature Context Module (FCM) is introduced.
We also develop a term-level knowledge guidance module (TKGM) for effectively utilizing term-level knowledge to intelligently guide the modeling process of term weight.
arXiv Detail & Related papers (2024-04-18T12:58:36Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - Prefer to Classify: Improving Text Classifiers via Auxiliary Preference
Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
- Automatic Readability Assessment for Closely Related Languages [6.233117407988574]
This work focuses on how linguistic aspects such as mutual intelligibility or degree of language relatedness can improve ARA in a low-resource setting.
We collect short stories written in three languages of the Philippines (Tagalog, Bikol, and Cebuano) to train readability assessment models.
Our results show that the inclusion of CrossNGO, a novel specialized feature exploiting n-gram overlap applied to languages with high mutual intelligibility, significantly improves the performance of ARA models.
arXiv Detail & Related papers (2023-05-22T20:42:53Z)
- A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss [17.213602354715956]
We propose a BERT-based model with feature projection and length-balanced loss for readability assessment.
Our model achieves state-of-the-art performances on two English benchmark datasets and one dataset of Chinese textbooks.
arXiv Detail & Related papers (2022-10-19T05:33:27Z)
- BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives [0.0]
BERT has revolutionized the NLP field by enabling transfer learning with large language models.
This article studies how to better cope with the different embeddings provided by the BERT output layer and the usage of language-specific instead of multilingual models.
arXiv Detail & Related papers (2022-01-10T15:05:05Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach [6.063165888023164]
We describe a new approach to feature engineering that leverages sequential machine learning models and domain knowledge to predict which features help enhance performance.
We demonstrate that CI classification accuracy improves by 2.3% over a strong baseline when using features produced by this method.
arXiv Detail & Related papers (2020-10-13T17:57:18Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the challenges of low-resource NER.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.