AES Systems Are Both Overstable And Oversensitive: Explaining Why And
Proposing Defenses
- URL: http://arxiv.org/abs/2109.11728v2
- Date: Tue, 28 Sep 2021 17:05:19 GMT
- Title: AES Systems Are Both Overstable And Oversensitive: Explaining Why And
Proposing Defenses
- Authors: Yaman Singla Kumar, Swapnil Parekh, Somesh Singh, Junyi Jessy Li,
Rajiv Ratn Shah, Changyou Chen
- Abstract summary: We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity and overstability causing samples with high accuracies.
- Score: 66.49753193098356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep-learning based Automatic Essay Scoring (AES) systems are being actively
used by states and language testing agencies alike to evaluate millions of
candidates for life-changing decisions ranging from college applications to
visa approvals. However, little research has been put to understand and
interpret the black-box nature of deep-learning based scoring algorithms.
Previous studies indicate that scoring models can be easily fooled. In this
paper, we explore the reason behind their surprising adversarial brittleness.
We utilize recent advances in interpretability to find the extent to which
features such as coherence, content, vocabulary, and relevance are important
for automated scoring mechanisms. We use this to investigate the
oversensitivity i.e., large change in output score with a little change in
input essay content) and overstability i.e., little change in output scores
with large changes in input essay content) of AES. Our results indicate that
autoscoring models, despite getting trained as "end-to-end" models with rich
contextual embeddings such as BERT, behave like bag-of-words models. A few
words determine the essay score without the requirement of any context making
the model largely overstable. This is in stark contrast to recent probing
studies on pre-trained representation learning models, which show that rich
linguistic features such as parts-of-speech and morphology are encoded by them.
Further, we also find that the models have learnt dataset biases, making them
oversensitive. To deal with these issues, we propose detection-based protection
models that can detect oversensitivity and overstability causing samples with
high accuracies. We find that our proposed models are able to detect unusual
attribution patterns and flag adversarial samples successfully.
Related papers
- Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings [5.257719744958367]
This thesis explores three challenging settings in text classification by leveraging the intrinsic knowledge of pretrained language models (PLMs)
We develop models that utilize features based on contextualized word representations from PLMs, achieving performance that rivals or surpasses human accuracy.
Lastly, we tackle the sensitivity of large language models to in-context learning prompts by selecting effective demonstrations.
arXiv Detail & Related papers (2024-08-28T09:07:30Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Explainable Verbal Deception Detection using Transformers [1.5104201344012347]
This paper proposes and evaluates six deep-learning models, including combinations of BERT (and RoBERTa), MultiHead Attention, co-attentions, and transformers.
The findings suggest that our transformer-based models can enhance automated deception detection performances (+2.11% in accuracy)
arXiv Detail & Related papers (2022-10-06T17:36:00Z) - Few-shot Instruction Prompts for Pretrained Language Models to Detect
Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs)
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z) - My Teacher Thinks The World Is Flat! Interpreting Automatic Essay
Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as the world is flat'' actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring
Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications(as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.