UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted
Features for Lexical Complexity Prediction
- URL: http://arxiv.org/abs/2104.06983v1
- Date: Wed, 14 Apr 2021 17:05:46 GMT
- Title: UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted
Features for Lexical Complexity Prediction
- Authors: George-Eduard Zaharia, Dumitru-Clementin Cercel, Mihai Dascalu
- Abstract summary: We describe our approach for the SemEval-2021 Task 1: Lexical Complexity Prediction competition.
Our results are just 5.46% and 6.5% lower than the top scores obtained in the competition on the first and the second subtasks.
- Score: 0.7197592390105455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reading is a complex process which requires proper understanding of texts in
order to create coherent mental representations. However, comprehension
problems may arise from hard-to-understand sections, which can prove
troublesome for readers depending on their specific language skills. As such,
steps can be taken towards simplifying these sections by accurately
identifying and evaluating difficult structures. In this paper, we
describe our approach for the SemEval-2021 Task 1: Lexical Complexity
Prediction competition that consists of a mixture of advanced NLP techniques,
namely Transformer-based language models, pre-trained word embeddings, Graph
Convolutional Networks, Capsule Networks, as well as a series of hand-crafted
textual complexity features. Our models are applicable to both subtasks and
achieve good performance, with an MAE below 0.07 and a Pearson correlation of
0.73 for single-word identification, as well as an MAE below 0.08 and a
Pearson correlation of 0.79 for multi-word targets. Our results are just
5.46% and 6.5% lower than the top scores obtained in the competition on the
first and the second subtasks, respectively.
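As a point of reference for the metrics reported above, the snippet below is a minimal, purely illustrative sketch of a hand-crafted-feature baseline for lexical complexity prediction, evaluated with MAE and Pearson correlation via scikit-learn and SciPy. The feature set, toy data, and Ridge regressor are assumptions made for illustration; the actual system additionally fuses Transformer-based language models, Graph Convolutional Networks, and Capsule Networks, which are not reproduced here.

```python
# Minimal illustrative sketch (not the authors' code): a hand-crafted-feature
# baseline for lexical complexity prediction, scored with the same metrics
# reported above (MAE and Pearson correlation). Features, data, and the Ridge
# regressor are assumptions for illustration only.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

def handcrafted_features(sentence, target):
    """Toy lexical features for a target word in context."""
    vowels = sum(ch in "aeiou" for ch in target.lower())
    return [
        len(target),               # word length in characters
        vowels,                    # crude syllable proxy
        float(target.istitle()),   # capitalisation hint
        len(sentence.split()),     # sentence length in tokens
    ]

# Tiny made-up examples: (sentence, target word, gold complexity in [0, 1]).
train = [
    ("The cat sat on the mat.", "cat", 0.05),
    ("The committee ratified the amendment.", "ratified", 0.55),
    ("He exhibited remarkable perspicacity.", "perspicacity", 0.85),
    ("She opened the door quietly.", "door", 0.05),
    ("The witness corroborated the testimony.", "corroborated", 0.70),
]
test = [
    ("The tribunal adjourned the hearing.", "adjourned", 0.60),
    ("The dog ran home.", "dog", 0.05),
    ("They faced an intractable problem.", "intractable", 0.75),
]

X_train = np.array([handcrafted_features(s, w) for s, w, _ in train])
y_train = np.array([y for _, _, y in train])
X_test = np.array([handcrafted_features(s, w) for s, w, _ in test])
y_test = np.array([y for _, _, y in test])

reg = Ridge(alpha=1.0).fit(X_train, y_train)
pred = reg.predict(X_test)

mae = mean_absolute_error(y_test, pred)
r, _ = pearsonr(y_test, pred)
print(f"MAE: {mae:.3f}  Pearson r: {r:.3f}")
```

Because the task's gold labels are continuous complexity scores in [0, 1], regression metrics such as MAE and Pearson correlation are the natural evaluation choice, which is why they are reported for both subtasks above.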
Related papers
- SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL).
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG).
arXiv Detail & Related papers (2023-10-27T06:48:48Z)
- Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z)
- Lexical Complexity Prediction: An Overview [13.224233182417636]
The occurrence of unknown words in texts significantly hinders reading comprehension.
Computational modelling has been applied to identify complex words in texts and replace them with simpler alternatives.
We present an overview of computational approaches to lexical complexity prediction focusing on the work carried out on English data.
arXiv Detail & Related papers (2023-03-08T19:35:08Z)
- Prompt-based Learning for Text Readability Assessment [0.4757470449749875]
We propose the novel adaptation of a pre-trained seq2seq model for readability assessment.
We prove that a seq2seq model can be adapted to discern which of two given texts is more difficult (pairwise comparison).
arXiv Detail & Related papers (2023-02-25T18:39:59Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction [4.86331990243181]
This paper describes team LCP-RIT's submission to SemEval-2021 Task 1: Lexical Complexity Prediction (LCP).
Our system uses logistic regression and a wide range of linguistic features to predict the complexity of single words in this dataset.
We evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.
arXiv Detail & Related papers (2021-05-18T18:55:04Z)
- BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models [2.4815579733050153]
This paper describes a system submitted by team BigGreen to SemEval-2021 Task 1 for predicting the lexical complexity of English words in a given context.
We assemble a feature engineering-based model with a deep neural network model founded on BERT.
Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonological measures (a sketch of this general feature-fusion idea follows the related-papers list below).
arXiv Detail & Related papers (2021-04-19T04:05:50Z)
- UPB at SemEval-2021 Task 7: Adversarial Multi-Task Learning for Detecting and Rating Humor and Offense [0.6404122934568858]
We describe our adversarial multi-task network, AMTL-Humor, used to detect and rate humor and offensive texts.
Our best model consists of an ensemble of all tested configurations, and achieves a 95.66% F1-score and 94.70% accuracy for Task 1a.
arXiv Detail & Related papers (2021-04-13T09:59:05Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA for the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
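The sketch below is a minimal illustration, not code from any paper listed here, of the general feature-fusion ("assembly") idea referenced in the BigGreen entry above and used by the UPB system described at the top of this page: a contextual Transformer embedding of the target word is concatenated with hand-crafted features before regression. The model name (bert-base-uncased), sub-token pooling strategy, and feature set are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not code from the listed papers):
# fuse a contextual BERT embedding of a target word with hand-crafted features.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def target_embedding(sentence, target):
    """Mean-pool the contextual vectors of the target word's sub-tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]           # (seq_len, 768)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    token_ids = enc["input_ids"][0].tolist()
    # Locate the target's sub-token span in the encoded sentence (first match).
    for i in range(len(token_ids) - len(target_ids) + 1):
        if token_ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0).numpy()
    return hidden.mean(dim=0).numpy()                           # fallback: whole sentence

def fused_features(sentence, target):
    """Concatenate the contextual embedding with toy hand-crafted features."""
    handcrafted = np.array([len(target), len(sentence.split())], dtype=np.float32)
    return np.concatenate([target_embedding(sentence, target), handcrafted])

vec = fused_features("The committee ratified the amendment.", "ratified")
print(vec.shape)   # (770,) = 768 contextual dims + 2 hand-crafted dims
```

In a complete system, the fused vector would feed a regression head trained on the task's continuous complexity annotations; richer components such as Graph Convolutional Networks or Capsule Networks can consume the same contextual representations.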