Phrase-Level Adversarial Training for Mitigating Bias in Neural Network-based Automatic Essay Scoring
- URL: http://arxiv.org/abs/2409.04795v1
- Date: Sat, 7 Sep 2024 11:22:35 GMT
- Title: Phrase-Level Adversarial Training for Mitigating Bias in Neural Network-based Automatic Essay Scoring
- Authors: Haddad Philip, Tsegaye Misikir Tashu
- Abstract summary: We propose a model-agnostic phrase-level method to generate an adversarial essay set to address the biases and robustness of AES models.
Experimental results show that the proposed approach significantly improves AES model performance both in the presence of adversarial examples and in scenarios without such attacks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Essay Scoring (AES) is widely used to evaluate candidates for educational purposes. However, due to the lack of representative data, most existing AES systems are not robust, and their scoring predictions are biased towards the most represented data samples. In this study, we propose a model-agnostic phrase-level method to generate an adversarial essay set to address the biases and robustness of AES models. Specifically, we construct an attack test set comprising samples from the original test set and adversarially generated samples using our proposed method. To evaluate the effectiveness of the attack strategy and data augmentation, we conducted a comprehensive analysis utilizing various neural network scoring models. Experimental results show that the proposed approach significantly improves AES model performance both in the presence of adversarial examples and in scenarios without such attacks.
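A minimal sketch of the general idea, under assumptions not taken from the paper: meaning-preserving phrase substitutions are applied to each essay, the perturbed copies are appended to the original test samples to form an attack set, and any scoring model can then be probed on it. The phrase table, function names, and score-gap metric below are purely illustrative.

```python
# A minimal, hypothetical sketch of a phrase-level adversarial test set for AES.
# The substitution table, function names, and score-gap metric are illustrative
# assumptions; they do not reproduce the authors' actual method.
import random
import re
from typing import Callable, List, Tuple

# Hypothetical meaning-preserving phrase substitutions.
PHRASE_SUBSTITUTIONS = {
    "in conclusion": "to sum up",
    "for example": "for instance",
    "on the other hand": "conversely",
    "as a result": "consequently",
}

def perturb_essay(essay: str, max_swaps: int = 3, seed: int = 0) -> str:
    """Swap up to `max_swaps` known phrases for equivalent alternatives."""
    rng = random.Random(seed)
    found = [p for p in PHRASE_SUBSTITUTIONS if re.search(re.escape(p), essay, re.IGNORECASE)]
    for phrase in rng.sample(found, min(max_swaps, len(found))):
        essay = re.sub(re.escape(phrase), PHRASE_SUBSTITUTIONS[phrase], essay, flags=re.IGNORECASE)
    return essay

def build_attack_set(test_set: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Attack set = original test samples plus their phrase-level perturbations.
    Gold scores are kept, since the perturbations should not change essay quality."""
    return list(test_set) + [(perturb_essay(essay), score) for essay, score in test_set]

def score_gap(score_model: Callable[[str], float], test_set: List[Tuple[str, float]]) -> float:
    """Model-agnostic robustness probe: mean absolute score change under perturbation."""
    gaps = [abs(score_model(e) - score_model(perturb_essay(e))) for e, _ in test_set]
    return sum(gaps) / len(gaps)
```

The same attack set could also serve as augmented training data, mirroring the data-augmentation evaluation mentioned in the abstract.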
Related papers
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical method that uses this characteristic of model prediction and feature attribution to detect adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.
arXiv Detail & Related papers (2024-04-12T21:22:21Z)
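As a loose illustration of the prediction/attribution-sensitivity idea summarized in the entry above, the sketch below perturbs an input with small Gaussian noise and measures how much the model's logits and a simple input-gradient attribution shift. The noise scale, thresholds, and attribution choice are assumptions for illustration and do not reproduce the PASA method.

```python
# A rough, hypothetical sketch of flagging adversarial inputs by how strongly the
# prediction and a simple input-gradient attribution react to small input noise.
# Noise scale, thresholds, and the attribution choice are illustrative assumptions,
# not the PASA method itself.
import torch

def sensitivity_scores(model, x, noise_std=0.01, n_trials=8):
    """Return mean shifts in logits and input-gradient attribution under Gaussian noise."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    base_logits = model(x)
    base_attr = torch.autograd.grad(base_logits.sum(), x)[0]
    pred_shift = attr_shift = 0.0
    for _ in range(n_trials):
        noisy = (x + noise_std * torch.randn_like(x)).detach().requires_grad_(True)
        logits = model(noisy)
        attr = torch.autograd.grad(logits.sum(), noisy)[0]
        pred_shift += (logits - base_logits).abs().mean().item()
        attr_shift += (attr - base_attr).abs().mean().item()
    return pred_shift / n_trials, attr_shift / n_trials

def looks_adversarial(model, x, pred_thr=0.5, attr_thr=0.5):
    """Thresholds would be calibrated on held-out clean data in practice."""
    p, a = sensitivity_scores(model, x)
    return p > pred_thr or a > attr_thr
```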
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query access.
We introduce three model stealing attacks adapted to different real-world scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks [17.584752814352502]
Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data.
We introduce a self-supervised, computationally economical method for generating adversarial examples.
Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models.
arXiv Detail & Related papers (2023-10-05T17:34:47Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks insert small, imperceptible perturbations into input samples that cause large, undesired changes to the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z)
- Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, which makes the attack especially insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
To the best of our knowledge, Single-Utterance Test-time Adaptation (SUTA) is the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
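The entry above describes standard adversarial training (AT) as augmenting training data with adversarial examples crafted by a specific attack. For reference, below is a minimal sketch of that plain AT loop with a single-step FGSM attack; it illustrates the baseline setting the paper criticizes, not the proposed ADT framework, and all names are illustrative.

```python
# A minimal sketch of the standard adversarial training loop: each batch is
# augmented with FGSM adversarial examples before the gradient update.
# This is plain AT for illustration, not the ADT framework from the paper.
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, eps=0.03):
    """Craft adversarial examples with a single gradient-sign step (FGSM)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_epoch(model, loader, optimizer, eps=0.03):
    """One epoch of standard AT: train on clean batches plus their FGSM perturbations."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_examples(model, x, y, eps)
        optimizer.zero_grad()
        # Train on both the clean and the adversarial version of the batch.
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```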
- Unseen Face Presentation Attack Detection Using Class-Specific Sparse One-Class Multiple Kernel Fusion Regression [15.000818334408802]
The paper addresses face presentation attack detection in the challenging conditions of an unseen attack scenario.
A pure one-class face presentation attack detection approach based on kernel regression is developed.
arXiv Detail & Related papers (2019-12-31T11:53:20Z)