Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task
- URL: http://arxiv.org/abs/2210.07631v1
- Date: Fri, 14 Oct 2022 08:26:32 GMT
- Title: Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task
- Authors: Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral
- Abstract summary: Evaluation of models on benchmarks is unreliable without knowing the degree of sample hardness.
We propose a Data Scoring task that requires assigning each unannotated sample in a benchmark a score between 0 and 1.
- Score: 24.6240575061124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluation of models on benchmarks is unreliable without knowing the degree
of sample hardness; this subsequently overestimates the capability of AI
systems and limits their adoption in real world applications. We propose a Data
Scoring task that requires assigning each unannotated sample in a benchmark
a score between 0 and 1, where 0 signifies easy and 1 signifies hard. The use of
unannotated samples in our task design is inspired by humans, who can
determine a question's difficulty without knowing its correct answer. This also
rules out the use of methods involving model-based supervision (since they
require sample annotations to get trained), eliminating potential biases
associated with models in deciding sample difficulty. We propose a method based
on Semantic Textual Similarity (STS) for this task; we validate our method by
showing that existing models are more accurate on the easier
sample-chunks than on the harder sample-chunks. Finally, we
demonstrate five novel applications.
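The abstract names Semantic Textual Similarity as the basis of the scoring method but does not detail the scoring rule here. A minimal sketch of one such rule, assuming a sentence-transformers encoder and defining hardness as dissimilarity to a reference pool of samples (the model name, the reference pool, and the "1 minus max similarity" rule are all assumptions, not the paper's published procedure):

```python
# Hypothetical sketch: score each unannotated sample's hardness from
# Semantic Textual Similarity (STS) to a reference pool of samples.
from sentence_transformers import SentenceTransformer, util

def hardness_scores(benchmark_samples, reference_samples,
                    model_name="all-MiniLM-L6-v2"):
    """Return a score in [0, 1] per benchmark sample (0 = easy, 1 = hard)."""
    model = SentenceTransformer(model_name)
    bench_emb = model.encode(benchmark_samples, convert_to_tensor=True,
                             normalize_embeddings=True)
    ref_emb = model.encode(reference_samples, convert_to_tensor=True,
                           normalize_embeddings=True)
    # Cosine similarity of every benchmark sample to every reference sample.
    sims = util.cos_sim(bench_emb, ref_emb)            # shape: (B, R)
    max_sim = sims.max(dim=1).values.clamp(0.0, 1.0)   # closest reference
    # Assumption: low similarity to anything in the pool => harder sample.
    return (1.0 - max_sim).tolist()

scores = hardness_scores(
    ["Which planet has the longest day?", "What is 2 + 2?"],
    ["What is the capital of France?", "What is 3 + 5?"],
)
```

Averaging similarity over the k closest reference samples, rather than using only the single closest one, would be a natural variant of this rule.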
Related papers
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
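The UAL summary above hinges on one concrete mechanism: the label-smoothing value is set per sample from that sample's uncertainty. A minimal sketch of that idea, with normalised predictive entropy standing in for the paper's uncertainty estimate (an assumption, not the UAL release):

```python
# Hypothetical sketch of uncertainty-adaptive label smoothing.
import math
import torch
import torch.nn.functional as F

def uncertainty_adaptive_ce(logits, targets, max_smoothing=0.2):
    """Cross-entropy whose label-smoothing value grows with per-sample uncertainty."""
    num_classes = logits.size(-1)
    probs = F.softmax(logits, dim=-1)
    # Normalised predictive entropy in [0, 1] as a stand-in uncertainty signal.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    uncertainty = entropy / math.log(num_classes)
    smoothing = max_smoothing * uncertainty                     # per-sample value
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = (1.0 - smoothing).unsqueeze(-1) * one_hot \
               + smoothing.unsqueeze(-1) / num_classes
    return -(smoothed * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```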
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
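The mechanism described above is to wrap prompted LM outputs as labeling functions inside a weak supervision pipeline rather than trusting a single zero- or few-shot prediction. A minimal sketch with majority voting standing in for a learned label model (the prompts, the yes/no parsing, and the aggregator are assumptions):

```python
# Hypothetical sketch: several prompts act as noisy labeling functions,
# and their votes are aggregated into a weak label.
from collections import Counter

ABSTAIN = -1

def lm_labeling_function(prompt_template, text, query_lm):
    """Wrap one prompt as a labeling function; query_lm is any text->text callable."""
    answer = query_lm(prompt_template.format(text=text)).strip().lower()
    if answer.startswith("yes"):
        return 1
    if answer.startswith("no"):
        return 0
    return ABSTAIN

def weak_label(text, prompt_templates, query_lm):
    """Majority vote over non-abstaining labeling functions."""
    votes = [lm_labeling_function(p, text, query_lm) for p in prompt_templates]
    votes = [v for v in votes if v != ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN
```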
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
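The core of the prompt-consistency idea above is that predictions produced under different prompts for the same input should agree. A minimal sketch of a pairwise consistency loss over prompt variants (the symmetrised KL form is an assumption, not necessarily the paper's exact regularizer):

```python
# Hypothetical sketch: penalise disagreement between predictive distributions
# produced by different prompt phrasings of the same input.
import torch
import torch.nn.functional as F

def prompt_consistency_loss(per_prompt_logits):
    """per_prompt_logits: list of (batch, num_classes) logits, one per prompt."""
    log_probs = [F.log_softmax(l, dim=-1) for l in per_prompt_logits]
    loss, pairs = 0.0, 0
    for i in range(len(log_probs)):
        for j in range(i + 1, len(log_probs)):
            # Symmetrised KL between the two prompts' distributions.
            loss = loss + F.kl_div(log_probs[i], log_probs[j],
                                   log_target=True, reduction="batchmean")
            loss = loss + F.kl_div(log_probs[j], log_probs[i],
                                   log_target=True, reduction="batchmean")
            pairs += 2
    return loss / max(pairs, 1)
```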
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
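The Fisher regularizer mentioned above penalises drift in the parameters that matter most while the model adapts at test time. A minimal EWC-style sketch, with a diagonal Fisher estimated from squared gradients (the estimation data and penalty weighting are assumptions, not the paper's exact formulation):

```python
# Hypothetical sketch: parameters with high estimated Fisher information are
# held close to their pre-adaptation values during test-time adaptation.
import torch

def estimate_diag_fisher(model, loss_fn, batches):
    """Diagonal Fisher estimate from squared gradients over a few batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}

def fisher_penalty(model, fisher, anchor_params, weight=1.0):
    """Quadratic penalty on movement away from the anchor, weighted by Fisher."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - anchor_params[n]) ** 2).sum()
    return weight * penalty
```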
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
- Meta-Sampler: Almost-Universal yet Task-Oriented Sampling for Point Clouds [46.33828400918886]
We show how we can train an almost-universal meta-sampler across multiple tasks.
This meta-sampler can then be rapidly fine-tuned when applied to different datasets, networks, or even different tasks.
arXiv Detail & Related papers (2022-03-30T02:21:34Z)
- Non-generative Generalized Zero-shot Learning via Task-correlated Disentanglement and Controllable Samples Synthesis [20.34562156468408]
We propose a non-generative model to address these problems.
In addition, we formulate a new ZSL task named 'Few-shot Seen class and Zero-shot Unseen class learning' (FSZU).
arXiv Detail & Related papers (2022-03-10T12:32:26Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) to detect social biases.
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- Density-Based Dynamic Curriculum Learning for Intent Detection [14.653917644725427]
Our model defines each sample's difficulty level according to the density of its eigenvectors.
We apply a dynamic curriculum learning strategy, which pays distinct attention to samples of various difficulty levels.
Experiments on three open datasets verify that the proposed density-based algorithm can effectively distinguish simple from complex samples.
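The density-based difficulty definition above is this entry's key mechanism. A minimal sketch in which average k-nearest-neighbour distance in embedding space stands in for the paper's eigenvector-density estimate (that substitution, and the easy-to-hard ordering, are assumptions):

```python
# Hypothetical sketch: samples lying in sparse regions of embedding space are
# treated as harder and scheduled later in the curriculum.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_based_difficulty(embeddings, k=10):
    """Higher score = sparser neighbourhood = harder sample."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)          # includes self at distance 0
    avg_dist = dists[:, 1:].mean(axis=1)          # drop the self-neighbour
    rng = np.ptp(avg_dist) + 1e-12
    return (avg_dist - avg_dist.min()) / rng      # rescale to [0, 1]

def curriculum_order(embeddings, k=10):
    """Indices ordered easy-to-hard for a simple curriculum schedule."""
    return np.argsort(density_based_difficulty(embeddings, k))
```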
arXiv Detail & Related papers (2021-08-24T12:29:26Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Identifying Wrongly Predicted Samples: A Method for Active Learning [6.976600214375139]
We propose a simple sample selection criterion that moves beyond uncertainty.
We show state-of-the-art results and better rates at identifying wrongly predicted samples.
arXiv Detail & Related papers (2020-10-14T09:00:42Z)
- A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem with a decoupled approach composed of: 1) an "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a sparse simulator of the environment); 2) an "objective-agnostic" sample collection strategy responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)