ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer
Assessments
- URL: http://arxiv.org/abs/2110.03895v1
- Date: Fri, 8 Oct 2021 05:13:41 GMT
- Title: ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer
Assessments
- Authors: Qinjin Jia, Jialin Cui, Yunkai Xiao, Chengyuan Liu, Parvez Rashid,
Edward F. Gehringer
- Abstract summary: This paper presents two MTL models for evaluating peer-review comments by leveraging the state-of-the-art pre-trained language representation models BERT and DistilBERT.
Our results demonstrate that BERT-based models significantly outperform previous GloVe-based methods by around 6% in F1-score on tasks of detecting a single feature.
- Score: 2.544539499281093
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Peer assessment has been widely applied across diverse academic fields over
the last few decades and has demonstrated its effectiveness. However, the
advantages of peer assessment can only be achieved with high-quality peer
reviews. Previous studies have found that high-quality review comments usually
comprise several features (e.g., contain suggestions, mention problems, use a
positive tone). Thus, researchers have attempted to evaluate peer-review
comments by detecting different features using various machine learning and
deep learning models. However, there is no single study that investigates using
a multi-task learning (MTL) model to detect multiple features simultaneously.
This paper presents two MTL models for evaluating peer-review comments by
leveraging the state-of-the-art pre-trained language representation models BERT
and DistilBERT. Our results demonstrate that BERT-based models significantly
outperform previous GloVe-based methods by around 6% in F1-score on tasks of
detecting a single feature, and MTL further improves performance while reducing
model size.
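The hard-parameter-sharing setup the abstract describes (one shared encoder feeding one binary head per review feature) can be sketched in a few lines. This is a minimal illustration, not the paper's code: the "encoder" below is a random projection standing in for BERT/DistilBERT, and the task names and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MultiTaskModel:
    """Hard parameter sharing: one shared encoder, one binary head per
    review feature (e.g. contains a suggestion, mentions a problem,
    uses a positive tone). The encoder is a toy stand-in for BERT."""
    def __init__(self, input_dim, hidden_dim, tasks):
        self.W_shared = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
        self.heads = {t: rng.normal(scale=0.1, size=hidden_dim) for t in tasks}

    def forward(self, x):
        h = np.tanh(x @ self.W_shared)              # shared representation
        return {t: sigmoid(h @ w) for t, w in self.heads.items()}

def multi_task_loss(preds, labels):
    # MTL objective: unweighted sum of per-task binary cross-entropies.
    eps = 1e-9
    total = 0.0
    for t, p in preds.items():
        y = labels[t]
        total += -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return total

model = MultiTaskModel(input_dim=768, hidden_dim=64,
                       tasks=["suggestion", "problem", "positive_tone"])
x = rng.normal(size=(4, 768))                       # 4 pooled [CLS]-like vectors
labels = {t: rng.integers(0, 2, size=4) for t in model.heads}
preds = model.forward(x)
loss = multi_task_loss(preds, labels)
```

Because the heads share one encoder, the total parameter count grows only by one small head per feature, which is where the "smaller than N single-task models" advantage comes from.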
Related papers
- Multi-Perspective Stance Detection [2.8073184910275293]
The multi-perspective approach yields better classification performance than the baseline, which uses a single label.
This entails that designing more inclusive perspective-aware AI models is not only an essential first step in implementing responsible and ethical AI, but can also achieve superior results to traditional approaches.
arXiv Detail & Related papers (2024-11-13T16:30:41Z)
- Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models [7.632217365130212]
Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks.
These models can produce hallucinations, particularly in domains with incomplete knowledge.
We introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models.
arXiv Detail & Related papers (2024-08-22T12:04:04Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
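The "feature statistics" idea in this summary can be illustrated with a small sketch. This is an assumption-laden toy in the spirit of NIQE-style opinion-unaware scoring, not the paper's method: fit a Gaussian to features pooled from pristine images, then score a test image by the distance between its feature statistics and that pristine model, with no human opinion scores involved.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_pristine_stats(features):
    """Fit a multivariate Gaussian (mean, covariance) to feature vectors
    pooled from pristine, undistorted images."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov

def quality_distance(test_features, mu, cov):
    """Opinion-unaware quality score: Mahalanobis-like distance between the
    test image's feature statistics and the pristine model. Larger distance
    means worse predicted quality; no subjective ratings are used."""
    mu_t = test_features.mean(axis=0)
    cov_t = np.cov(test_features, rowvar=False)
    diff = mu - mu_t
    pooled = (cov + cov_t) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))

pristine = rng.normal(size=(500, 8))                 # toy pristine feature bank
mu, cov = fit_pristine_stats(pristine)
clean = rng.normal(size=(100, 8))                    # statistics match pristine
distorted = rng.normal(loc=1.0, size=(100, 8))       # shifted statistics
```

In this toy, the distorted sample's statistics sit far from the pristine model while the clean sample's do not, which is the ordering an OU-BIQA score is meant to produce.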
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Efficient argument classification with compact language models and ChatGPT-4 refinements [0.0]
This paper presents a comparative study of several deep-learning-based models in argument mining.
The main novelty of this paper is an ensemble model based on the BERT architecture, with ChatGPT-4 as a fine-tuning model.
The presented results show that BERT+ChatGPT-4 outperforms the rest of the models including other Transformer-based and LSTM-based models.
arXiv Detail & Related papers (2024-03-20T16:24:10Z)
- Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning [36.739829839357995]
This study introduces a novel methodology named "bias-kNN".
This approach capitalizes on the biased outputs, harnessing them as primary features for kNN and supplementing with gold labels.
Our comprehensive evaluations, spanning diverse domain text classification datasets and different GPT-2 model sizes, indicate the adaptability and efficacy of the "bias-kNN" method.
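The core idea summarized above can be sketched briefly. This is an illustrative reconstruction, not the paper's code: the model's raw, possibly biased, per-label output probabilities are treated as feature vectors, and a query is classified by a kNN majority vote over the gold labels of its nearest training examples in that probability space. The synthetic "GPT-2 probabilities" below are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)

def bias_knn_predict(train_probs, train_labels, query_probs, k=3):
    """bias-kNN-style classification (a sketch of the idea): ignore the
    model's argmax and instead do a k-nearest-neighbour vote, using the
    biased probability vectors as features and gold labels as targets."""
    preds = []
    for q in query_probs:
        dists = np.linalg.norm(train_probs - q, axis=1)
        nearest = train_labels[np.argsort(dists)[:k]]
        preds.append(int(np.bincount(nearest).argmax()))
    return np.array(preds)

# Toy setup: the model is biased toward label 0, so its argmax is wrong for
# every gold-label-1 example, but the two classes still form separable
# clusters in probability space.
n = 40
probs0 = np.clip(rng.normal([0.90, 0.10], 0.03, size=(n, 2)), 0, 1)  # gold 0
probs1 = np.clip(rng.normal([0.60, 0.40], 0.03, size=(n, 2)), 0, 1)  # gold 1
train_probs = np.vstack([probs0, probs1])
train_labels = np.array([0] * n + [1] * n)

query = np.array([[0.61, 0.39], [0.89, 0.11]])
preds = bias_knn_predict(train_probs, train_labels, query)
```

The argmax would label both queries 0; the neighbourhood vote recovers the gold labels, which is the sense in which the bias is "harnessed" rather than removed.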
arXiv Detail & Related papers (2024-01-18T08:05:45Z)
- Accounting for multiplicity in machine learning benchmark performance [0.0]
Using the highest-ranked performance as an estimate of state-of-the-art (SOTA) performance gives a biased, overly optimistic estimator.
In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be applied and a better SOTA estimate can be provided.
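The bias being described can be seen in a short simulation (a sketch of the phenomenon, not the paper's estimator): if many classifiers all have the same true accuracy, the maximum of their benchmark scores systematically overestimates it, because the winner is partly the luckiest, not only the best.

```python
import numpy as np

rng = np.random.default_rng(3)

def sota_estimates(true_acc=0.80, n_test=200, n_models=50, n_trials=2000):
    """Simulate n_models equally good classifiers evaluated on the same
    finite test set. Each trial draws their observed accuracies and records
    the maximum (the naive 'SOTA' estimate). Returns the mean naive
    estimate alongside the true accuracy."""
    scores = rng.binomial(n_test, true_acc,
                          size=(n_trials, n_models)) / n_test
    return scores.max(axis=1).mean(), true_acc

naive_sota, true_acc = sota_estimates()
```

With 50 identical models on a 200-example test set, the naive SOTA estimate sits several points above the true accuracy, even though no model is actually better than any other.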
arXiv Detail & Related papers (2023-03-10T10:32:18Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E^3), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
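The sparse-MoE building block both sides of this combination rely on can be sketched minimally. This is an illustrative toy with top-k routing, not E^3 itself; all dimensions and the use of plain linear experts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoE:
    """Minimal top-k sparse mixture-of-experts layer. A router scores every
    expert per token, but only the k highest-scoring experts actually run,
    which is where the FLOP savings over dense models and deep ensembles
    come from."""
    def __init__(self, dim, n_experts=8, k=2):
        self.router = rng.normal(scale=0.1, size=(dim, n_experts))
        self.experts = rng.normal(scale=0.1, size=(n_experts, dim, dim))
        self.k = k

    def forward(self, x):                          # x: (tokens, dim)
        logits = x @ self.router                   # (tokens, n_experts)
        top = np.argsort(logits, axis=1)[:, -self.k:]  # top-k experts/token
        out = np.zeros_like(x)
        for i, chosen in enumerate(top):
            w = softmax(logits[i, chosen])         # renormalize over chosen
            for weight, e in zip(w, chosen):
                out[i] += weight * (x[i] @ self.experts[e])
        return out

moe = SparseMoE(dim=16)
x = rng.normal(size=(5, 16))
y = moe.forward(x)
```

Running several such layers with partially shared parameters, as an ensemble, is the rough shape of the idea; the details of how E^3 shares and tiles experts are in the paper itself.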
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.