$\textit{GeoHard}$: Towards Measuring Class-wise Hardness through Modelling Class Semantics
- URL: http://arxiv.org/abs/2407.12512v1
- Date: Wed, 17 Jul 2024 11:53:39 GMT
- Title: $\textit{GeoHard}$: Towards Measuring Class-wise Hardness through Modelling Class Semantics
- Authors: Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl
- Abstract summary: This work formally initiates the concept of $\textit{class-wise hardness}$.
Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment.
$\textit{GeoHard}$ surpasses instance-level metrics by over 59 percent on $\textit{Pearson}$'s correlation on measuring class-wise hardness.
- Score: 90.9047957137981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in measuring hardness-wise properties of data guide language models in sample selection within low-resource scenarios. However, class-specific properties are overlooked for task setup and learning. How will these properties influence model learning and is it generalizable across datasets? To answer this question, this work formally initiates the concept of $\textit{class-wise hardness}$. Experiments across eight natural language understanding (NLU) datasets demonstrate a consistent hardness distribution across learning paradigms, models, and human judgment. Subsequent experiments unveil a notable challenge in measuring such class-wise hardness with instance-level metrics in previous works. To address this, we propose $\textit{GeoHard}$ for class-wise hardness measurement by modeling class geometry in the semantic embedding space. $\textit{GeoHard}$ surpasses instance-level metrics by over 59 percent on $\textit{Pearson}$'s correlation on measuring class-wise hardness. Our analysis theoretically and empirically underscores the generality of $\textit{GeoHard}$ as a fresh perspective on data diagnosis. Additionally, we showcase how understanding class-wise hardness can practically aid in improving task learning.
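The abstract does not spell out GeoHard's exact formulation, but the core recipe it describes (score each class by its geometry in a semantic embedding space, then validate the scores against observed class-wise hardness via $\textit{Pearson}$'s correlation) can be illustrated with a short sketch. Everything concrete below is an assumption for illustration only: the `sentence-transformers` encoder, the particular geometric score (intra-class dispersion plus proximity to the nearest other class centroid), and the use of per-class error rates as the observed hardness signal.

```python
# Illustrative sketch: class-wise hardness from class geometry in embedding space.
# NOTE: this is NOT the paper's GeoHard formula; the score below is an assumed
# combination of intra-class dispersion and inter-class proximity.
import numpy as np
from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer  # assumed encoder choice


def class_geometry_hardness(texts, labels, encoder_name="all-MiniLM-L6-v2"):
    """Return a per-class hardness score from sentence-embedding geometry.

    Assumes at least two classes and at least one example per class."""
    encoder = SentenceTransformer(encoder_name)
    emb = encoder.encode(texts, normalize_embeddings=True)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = {c: emb[labels == c].mean(axis=0) for c in classes}

    scores = {}
    for c in classes:
        members = emb[labels == c]
        # Intra-class dispersion: mean distance of members to their own centroid.
        dispersion = np.linalg.norm(members - centroids[c], axis=1).mean()
        # Inter-class proximity: inverse distance to the nearest other centroid.
        nearest = min(np.linalg.norm(centroids[c] - centroids[o])
                      for o in classes if o != c)
        scores[c] = dispersion + 1.0 / (nearest + 1e-8)  # assumed combination
    return scores


def correlate_with_observed_hardness(geo_scores, per_class_error):
    """Pearson's r between the geometric scores and an empirical hardness
    signal such as per-class error rates (one value per class)."""
    classes = sorted(geo_scores)
    return pearsonr([geo_scores[c] for c in classes],
                    [per_class_error[c] for c in classes])
```

The correlation step mirrors how the abstract compares hardness measures (Pearson's correlation against observed class-wise hardness); swapping in a different geometric score only requires changing the hypothetical `class_geometry_hardness` helper.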
Related papers
- Controllable Context Sensitivity and the Knob Behind It [53.70327066130381]
When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge.
We search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge.
arXiv Detail & Related papers (2024-11-11T22:22:21Z)
- How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective [17.956310574300765]
This paper introduces a novel generalized self-imitation learning ($\textbf{GSIL}$) framework.
It effectively and efficiently aligns large language models with offline demonstration data.
$\textbf{GSIL}$ consistently and significantly outperforms baselines in many challenging benchmarks.
arXiv Detail & Related papers (2024-10-14T02:21:29Z)
- The Unreasonable Effectiveness of Easy Training Data for Hard Tasks [84.30018805150607]
We present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data.
We demonstrate this kind of easy-to-hard generalization using simple finetuning methods like in-context learning, linear heads, and QLoRA.
We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied.
arXiv Detail & Related papers (2024-01-12T18:36:29Z)
- Statistical learning on measures: an application to persistence diagrams [0.0]
We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$.
We show that our framework allows more flexibility and diversity in the input data we can deal with.
While such a framework has many possible applications, this work strongly emphasizes classifying data via topological descriptors called persistence diagrams.
arXiv Detail & Related papers (2023-03-15T09:01:37Z)
- Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose $\textit{second-split forgetting time}$ (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that $\textit{mislabeled}$ examples are forgotten quickly, and seemingly $\textit{rare}$ examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes (a minimal sketch of this forgetting-time bookkeeping appears after this list).
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
- Information-Theoretic Measures of Dataset Difficulty [54.538766940287864]
Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans.
We propose an information-theoretic perspective, framing dataset difficulty as the absence of usable information.
arXiv Detail & Related papers (2021-10-16T00:21:42Z)
- PyHard: a novel tool for generating hardness embeddings to support data-centric analysis [0.38233569758620045]
PyHard produces a hardness embedding of a dataset that relates the predictive performance of multiple ML models.
The user can interact with this embedding in multiple ways to obtain useful insights about data and algorithmic performance.
We show in a COVID prognosis dataset how this analysis supported the identification of pockets of hard observations that challenge ML models.
arXiv Detail & Related papers (2021-09-29T14:08:26Z)
- Geometry matters: Exploring language examples at the decision boundary [2.7249290070320034]
BERT, CNN, and fastText models are susceptible to word substitutions in high-difficulty examples.
On YelpReviewPolarity we observe a correlation coefficient of -0.4 between resilience to perturbations and the difficulty score.
Our approach is simple, architecture agnostic and can be used to study the fragilities of text classification models.
arXiv Detail & Related papers (2020-10-14T16:26:13Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
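As noted in the Second-Split Forgetting entry above, SSFT tracks the epoch (if any) after which a first-split training example is forgotten while the model keeps training on a second split. The sketch below is only an assumed bookkeeping layer, not the authors' implementation: it presumes you have already recorded, after each second-split epoch, whether every first-split example is still classified correctly.

```python
# Minimal sketch of second-split forgetting time (SSFT) bookkeeping.
# Assumed setup (not the authors' code): a model trained on split A keeps
# training on split B; after every epoch we log whether each split-A example
# is still classified correctly.
import numpy as np


def second_split_forgetting_time(correct_per_epoch):
    """correct_per_epoch: bool array of shape (n_epochs, n_examples) recorded
    on split-A examples during second-split training.

    Returns, per example, the first epoch after which it is never classified
    correctly again; np.inf if it is still correct at the last recorded epoch
    (i.e. never forgotten), and 0 if it is never correct at all."""
    correct = np.asarray(correct_per_epoch, dtype=bool)
    n_epochs, n_examples = correct.shape
    ssft = np.full(n_examples, np.inf)
    for j in range(n_examples):
        hits = np.flatnonzero(correct[:, j])   # epochs where example j is correct
        if hits.size == 0:
            ssft[j] = 0                        # never correct during split-B training
        elif hits[-1] < n_epochs - 1:
            ssft[j] = hits[-1] + 1             # forgotten right after its last correct epoch
    return ssft
```

Per that entry's summary, small SSFT values would flag likely-mislabeled candidates for removal, while larger finite values would point to rare but genuine examples.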