Trusting Language Models in Education
- URL: http://arxiv.org/abs/2308.03866v1
- Date: Mon, 7 Aug 2023 18:27:54 GMT
- Title: Trusting Language Models in Education
- Authors: Jogi Suda Neto, Li Deng, Thejaswi Raya, Reza Shahbazi, Nick Liu,
Adhitya Venkatesh, Miral Shah, Neeru Khosla, Rodrigo Capobianco Guido
- Abstract summary: We propose using an XGBoost model on top of BERT to output corrected probabilities.
Our hypothesis is that the level of uncertainty contained in the flow of attention is related to the quality of the model's response itself.
- Score: 1.2578554943276923
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Language Models are being widely used in Education. Even though modern deep
learning models achieve very good performance on question-answering tasks, they
sometimes make errors. To avoid misleading students by showing wrong answers, it is
important to calibrate the confidence - that is, the prediction probability - of
these models. In our work, we propose using an XGBoost model on top of BERT to
output corrected probabilities, using features based on the attention mechanism.
Our hypothesis is that the level of uncertainty contained in the flow of attention
is related to the quality of the model's response itself.
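
The abstract describes calibrating a BERT-based question-answering model by training an XGBoost classifier on features derived from the attention mechanism. Since the exact features are not spelled out here, the snippet below is only a minimal sketch under stated assumptions: it summarizes per-layer attention entropy plus the softmax confidence and fits XGBoost to predict whether the base model answered correctly, using that probability as the corrected confidence. The model name, feature set, and hyperparameters are illustrative assumptions, not the authors' recipe.

```python
# Hypothetical sketch: attention-based uncertainty features from BERT feeding
# an XGBoost calibrator. In practice the classification head would be fine-tuned.
import numpy as np
import torch
import xgboost as xgb
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def attention_features(text):
    """Summarize the attention flow of one input as a small feature vector:
    mean attention entropy per layer plus the model's softmax confidence."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    feats = []
    for layer in out.attentions:              # each: (1, heads, seq, seq)
        probs = layer[0]
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
        feats.append(entropy.mean().item())   # average over heads and query positions
    feats.append(out.logits.softmax(-1).max().item())
    return np.array(feats)

def fit_calibrator(texts, was_correct):
    """Fit XGBoost on held-out examples; was_correct[i] is 1 if the base model's
    answer to texts[i] was right, else 0. The calibrated confidence is then
    booster.predict_proba(features)[:, 1]."""
    X = np.stack([attention_features(t) for t in texts])
    booster = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    booster.fit(X, np.asarray(was_correct))
    return booster
```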
Related papers
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
- Beyond Confidence: Reliable Models Should Also Consider Atypicality [43.012818086415514]
We investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions.
We show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy.
We propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance.
arXiv Detail & Related papers (2023-05-29T17:37:09Z)
- Do Not Trust a Model Because It is Confident: Uncovering and Characterizing Unknown Unknowns to Student Success Predictors in Online-Based Learning [10.120425915106727]
Student success models might be prone to develop weak spots, i.e., examples that are hard to classify accurately.
This weakness is one of the main factors undermining users' trust, since model predictions could, for instance, lead an instructor not to intervene for a student in need.
In this paper, we unveil the need of detecting and characterizing unknown unknowns in student success prediction.
arXiv Detail & Related papers (2022-12-16T15:32:49Z)
- Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We present a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
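
Several entries above, in particular the last one, hinge on whether a model's confidence scores correlate with the likelihood of correctness. As a generic illustration of how calibration is commonly measured and adjusted (not the method of any paper listed here), the sketch below computes expected calibration error and fits a single softmax temperature by grid search; the bin count and temperature grid are arbitrary choices.

```python
# Generic calibration utilities: expected calibration error (ECE) and
# single-temperature scaling by grid search. Illustrative, not from any paper above.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average |accuracy - confidence| per bin,
    weighted by the fraction of samples falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the temperature T minimizing negative log-likelihood on held-out data;
    calibrated probabilities are softmax(logits / T)."""
    logits = np.asarray(logits, dtype=float)   # shape (n, num_classes)
    labels = np.asarray(labels, dtype=int)     # shape (n,)
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        z = logits / T
        z -= z.max(axis=1, keepdims=True)      # numerical stability
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T
```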
This list is automatically generated from the titles and abstracts of the papers on this site.