StaICC: Standardized Evaluation for Classification Task in In-context Learning
- URL: http://arxiv.org/abs/2501.15708v2
- Date: Sat, 01 Feb 2025 14:45:09 GMT
- Title: StaICC: Standardized Evaluation for Classification Task in In-context Learning
- Authors: Hakaze Cho, Naoya Inoue
- Abstract summary: This paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification.
For the normal classification task, we provide StaICC-Normal, which selects 10 widely used datasets and generates prompts in a fixed form.
We also provide a sub-benchmark, StaICC-Diag, for diagnosing ICL from several aspects, aiming at more robust inference processing.
- Abstract: Classification tasks are widely investigated in the In-Context Learning (ICL) paradigm. However, current efforts are evaluated on disjoint benchmarks and settings, and their performance is significantly influenced by trivial variables such as prompt templates, data sampling, and instructions. This leads to significant inconsistencies in the results reported across the literature, preventing fair comparison or meta-analysis across papers. Therefore, this paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification. For the normal classification task, we provide StaICC-Normal, which selects 10 widely used datasets and generates prompts in a fixed form, to mitigate the variance among experiment implementations. To enrich the usage of our benchmark, we also provide a sub-benchmark, StaICC-Diag, for diagnosing ICL from several aspects, aiming at more robust inference processing.
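As context for how a fixed prompt form removes template variance, here is a minimal sketch of k-shot prompt construction and label-word scoring for in-context classification. The template, label words, and the `lm_logprob` scorer are illustrative assumptions, not StaICC's actual API.

```python
# Minimal sketch of fixed-form k-shot in-context classification.
# TEMPLATE, label words, and lm_logprob are assumed, not StaICC's API.
from typing import Callable, List, Tuple

TEMPLATE = "Input: {text}\nLabel: {label}\n\n"  # one fixed form for all runs

def build_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Concatenate k demonstrations and the query in the fixed template."""
    prompt = "".join(TEMPLATE.format(text=t, label=l) for t, l in demos)
    return prompt + f"Input: {query}\nLabel:"

def classify(lm_logprob: Callable[[str, str], float],
             demos: List[Tuple[str, str]], query: str,
             label_words: List[str]) -> str:
    """Pick the label word the LM scores highest as a continuation."""
    prompt = build_prompt(demos, query)
    scores = {w: lm_logprob(prompt, " " + w) for w in label_words}
    return max(scores, key=scores.get)

# Usage with a stub scorer (replace with a real LM log-probability call):
demos = [("the movie was great", "positive"), ("a dull, tired plot", "negative")]
stub = lambda prompt, cont: -len(cont)  # placeholder, not a real LM
print(classify(stub, demos, "an instant classic", ["positive", "negative"]))
```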
Related papers
- MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation [46.50551811108464]
We present a benchmark with spurious-correlation shifts collected from real-world scenarios.
We also propose a metric that uses CLIP, a pre-trained vision-language model.
The experimental results show that the performance of the existing methods degrades significantly in the presence of spurious-correlation shifts.
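MetaCoCo's exact metric is not reproduced here; as background, the sketch below shows the kind of image-concept alignment score a CLIP-based metric can build on, using the Hugging Face `transformers` CLIP interface. The model name and example concepts are assumptions.

```python
# Generic CLIP alignment score, an illustrative building block only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(image: Image.Image, concepts: list) -> torch.Tensor:
    """Return a probability distribution over candidate concepts for one image."""
    inputs = processor(text=concepts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, n_concepts)
    return logits.softmax(dim=-1).squeeze(0)

# e.g. clip_scores(Image.open("bird.jpg"), ["a bird", "a tree background"])
```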
arXiv Detail & Related papers (2024-04-30T15:45:30Z)
- Exploring Hierarchical Classification Performance for Time Series Data: Dissimilarity Measures and Classifier Comparisons [0.0]
This study investigates the comparative performance of hierarchical classification (HC) and flat classification (FC) methodologies in time series data analysis.
Dissimilarity measures, including Jensen-Shannon Distance (JSD), Task Similarity Distance (TSD), and Based Distance (CBD), are leveraged.
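TSD and CBD are specific to the paper, but JSD is standard and easy to illustrate. A quick sketch comparing two time-series-derived value distributions, with the binning and synthetic data as assumptions:

```python
# Jensen-Shannon distance between two discrete distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon  # returns sqrt of JS divergence

def jsd(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon distance; inputs are (re)normalized to sum to 1."""
    return float(jensenshannon(p / p.sum(), q / q.sum(), base=2))

# e.g. histograms of two series over shared bins (illustrative random data):
a, _ = np.histogram(np.random.randn(1000), bins=20, range=(-4, 4))
b, _ = np.histogram(np.random.randn(1000) + 0.5, bins=20, range=(-4, 4))
print(jsd(a.astype(float), b.astype(float)))
```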
arXiv Detail & Related papers (2024-02-07T21:46:26Z)
- XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification [0.0]
This paper proposes XTSC-Bench, a benchmarking tool for evaluating TSC explainability methods.
We apply 3 perturbation-, 6 gradient-, and 2 example-based explanation methods to TSC, showing that improvements in the explainers' robustness and reliability are necessary.
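XTSC-Bench's exact metrics are not reproduced here; the sketch below illustrates one common robustness recipe such benchmarks use: perturb the series slightly and measure how much a saliency explanation moves. The `explainer` callable and noise scale are assumptions.

```python
# Attribution stability under small input perturbations (illustrative metric).
import numpy as np

def attribution_stability(explainer, x: np.ndarray,
                          sigma: float = 0.01, trials: int = 10) -> float:
    """Mean cosine similarity between attributions of x and noisy copies."""
    base = explainer(x).ravel()
    sims = []
    for _ in range(trials):
        noisy = x + np.random.normal(0.0, sigma, size=x.shape)
        pert = explainer(noisy).ravel()
        denom = np.linalg.norm(base) * np.linalg.norm(pert) + 1e-12
        sims.append(float(base @ pert) / denom)
    return float(np.mean(sims))

# e.g. with a trivial gradient "explainer" on a toy series:
print(attribution_stability(np.gradient, np.sin(np.linspace(0, 6, 128))))
```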
arXiv Detail & Related papers (2023-10-23T14:00:02Z)
- Mitigating Catastrophic Forgetting in Task-Incremental Continual Learning with Adaptive Classification Criterion [50.03041373044267]
We propose a Supervised Contrastive learning framework with an adaptive classification criterion for Continual Learning.
Experiments show that CFL achieves state-of-the-art performance and has a stronger ability to overcome catastrophic forgetting than the classification baselines.
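The adaptive classification criterion itself is not reproduced here; as background, this is a minimal PyTorch sketch of the standard supervised contrastive loss (Khosla et al., 2020) that such a framework builds on.

```python
# Standard supervised contrastive loss over a batch of embeddings.
import torch
import torch.nn.functional as F

def supcon_loss(feats: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.1) -> torch.Tensor:
    """feats: (N, D) embeddings; labels: (N,) class ids."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.T / temperature                  # (N, N) similarities
    eye = torch.eye(len(feats), dtype=torch.bool)
    sim = sim.masked_fill(eye, float("-inf"))            # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~eye
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                               # anchors with a positive
    pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    return (-pos_log_prob.sum(1)[valid] / pos_counts[valid]).mean()

# e.g. supcon_loss(torch.randn(8, 32), torch.randint(0, 3, (8,)))
```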
arXiv Detail & Related papers (2023-05-20T19:22:40Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, a decent performance gain is achieved on both small and large models.
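The paper's construction is not reproduced line by line; the sketch below conveys the general idea of a context-aware classifier: per-image class weights are generated from pooled scene context instead of being fixed, and the module drops onto any feature map. All layer shapes here are assumptions.

```python
# Illustrative context-aware classification head for segmentation features.
import torch
import torch.nn as nn

class ContextAwareClassifier(nn.Module):
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.static = nn.Conv2d(channels, num_classes, 1)   # ordinary 1x1 head
        # generates a per-image residual for the class weights from context
        self.gen = nn.Linear(channels, num_classes * channels)
        self.num_classes, self.channels = num_classes, channels

    def forward(self, feat: torch.Tensor) -> torch.Tensor:  # feat: (B, C, H, W)
        context = feat.mean(dim=(2, 3))                      # global context (B, C)
        delta = self.gen(context).view(-1, self.num_classes, self.channels)
        dyn_logits = torch.einsum("bkc,bchw->bkhw", delta, feat)
        return self.static(feat) + dyn_logits                # (B, K, H, W)

# e.g. ContextAwareClassifier(64, 19)(torch.randn(2, 64, 32, 32)).shape
```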
arXiv Detail & Related papers (2023-03-21T07:00:35Z)
- Complementary Labels Learning with Augmented Classes [22.460256396941528]
Complementary Labels Learning (CLL) arises in many real-world tasks such as private questions classification and online learning.
We propose a novel problem setting called Complementary Labels Learning with Augmented Classes (CLLAC).
By using unlabeled data, we propose an unbiased estimator of the classification risk for CLLAC, which is provably consistent.
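CLLAC's estimator, which additionally exploits unlabeled data, is not reproduced here. As background, this is a PyTorch sketch of the classical unbiased risk estimator for ordinary complementary-label learning under uniformly drawn complementary labels (Ishida et al.), the setting CLLAC extends.

```python
# Classical unbiased risk estimator for complementary-label learning:
# R_hat = mean( -(K-1) * loss(f(x), y_bar) + sum_k loss(f(x), k) ).
import torch
import torch.nn.functional as F

def cll_unbiased_risk(logits: torch.Tensor, comp_labels: torch.Tensor) -> torch.Tensor:
    """logits: (N, K); comp_labels: (N,) classes the samples do NOT belong to."""
    N, K = logits.shape
    # cross-entropy of every sample against every class, reshaped to (N, K)
    losses = F.cross_entropy(
        logits.repeat_interleave(K, dim=0),
        torch.arange(K, device=logits.device).repeat(N),
        reduction="none").view(N, K)
    loss_bar = losses.gather(1, comp_labels[:, None]).squeeze(1)
    return (-(K - 1) * loss_bar + losses.sum(dim=1)).mean()

# e.g. cll_unbiased_risk(torch.randn(8, 5), torch.randint(0, 5, (8,)))
```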
arXiv Detail & Related papers (2022-11-19T13:55:27Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
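The CL recipe as described is concrete enough to sketch directly: cluster the representations with K-means, then score how well a KNN predicts the cluster labels on held-out points. The cluster count, neighbor count, and split ratio below are assumed hyperparameters.

```python
# Minimal Cluster Learnability (CL) sketch with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def cluster_learnability(reps: np.ndarray, n_clusters: int = 10,
                         n_neighbors: int = 5, seed: int = 0) -> float:
    # pseudo-labels from K-means over the representations
    pseudo = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(reps)
    tr_x, te_x, tr_y, te_y = train_test_split(reps, pseudo, test_size=0.5,
                                              random_state=seed)
    # held-out KNN accuracy on the pseudo-labels is the learnability score
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(tr_x, tr_y)
    return float(knn.score(te_x, te_y))

# e.g. cluster_learnability(np.random.randn(500, 64))
```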
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Personalized Benchmarking with the Ludwig Benchmarking Toolkit [12.347185532330919]
The Ludwig Benchmarking Toolkit (LBT) is a personalized benchmarking toolkit for running end-to-end benchmark studies.
LBT provides an interface for controlling training and customizing evaluation, a standardized training framework for eliminating confounding variables, and support for multi-objective evaluation.
We show how LBT can be used to create personalized benchmark studies with a large-scale comparative analysis for text classification across 7 models and 9 datasets.
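LBT's actual interface is not reproduced here; the sketch below only shows the shape of a standardized benchmark loop in which training settings are held fixed so that model and dataset are the only varying factors. All names are placeholders.

```python
# Illustrative standardized model-by-dataset benchmark loop.
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict

@dataclass(frozen=True)
class TrainConfig:          # held constant across all runs (the "controls")
    lr: float = 3e-4
    epochs: int = 5
    batch_size: int = 32
    seed: int = 0

def run_benchmark(models: Dict[str, Callable], datasets: Dict[str, object],
                  train_eval: Callable, cfg: TrainConfig = TrainConfig()):
    """Cross every model with every dataset under one fixed config."""
    results = {}
    for (m_name, model), (d_name, data) in product(models.items(),
                                                   datasets.items()):
        results[(m_name, d_name)] = train_eval(model, data, cfg)
    return results
```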
arXiv Detail & Related papers (2021-11-08T03:53:38Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
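The paper's exact CAN procedure differs in its details; the sketch below is a simplified illustration of the alternating-normalization idea: stack predicted class distributions into a matrix, then alternately renormalize columns toward an assumed class prior and rows back into valid distributions.

```python
# Simplified alternating normalization over a matrix of predictions.
import numpy as np

def alternating_normalization(probs: np.ndarray, prior: np.ndarray,
                              n_iters: int = 3) -> np.ndarray:
    """probs: (N, K), rows are predicted distributions; prior: (K,), sums to 1."""
    p = probs.copy()
    for _ in range(n_iters):
        p = p / p.sum(axis=0, keepdims=True) * prior  # columns match the prior
        p = p / p.sum(axis=1, keepdims=True)          # rows are distributions again
    return p

# e.g. re-adjust ambiguous rows given a uniform prior over 3 classes:
preds = np.array([[0.40, 0.35, 0.25], [0.90, 0.05, 0.05], [0.34, 0.33, 0.33]])
print(alternating_normalization(preds, np.full(3, 1 / 3)))
```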
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the proposed method, TCC, is trained end-to-end, requiring no alternating steps.
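TCC's exact construction is not reproduced here; as background, the standard way to make categorical cluster assignments differentiable for end-to-end training is the Gumbel-softmax reparametrization, sketched below in PyTorch with assumed sizes.

```python
# Gumbel-softmax reparametrization of categorical cluster assignments.
import torch
import torch.nn.functional as F

def soft_assignment(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Differentiable sample of a near-one-hot cluster assignment per row."""
    return F.gumbel_softmax(logits, tau=tau, hard=False)  # (N, K), rows sum to 1

logits = torch.randn(4, 10, requires_grad=True)  # 4 points, 10 clusters
assign = soft_assignment(logits)
assign.sum().backward()                          # gradients flow to the logits
```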
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve the Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
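The paper's architecture is not reproduced here; this sketches the usual shape of such a setup: a shared encoder with one head per task, trained on a weighted sum of per-task losses, with class weights to counter imbalance. Dimensions and weights are assumptions.

```python
# Illustrative multitask classifier with class-weighted losses.
import torch
import torch.nn as nn

class MultitaskClassifier(nn.Module):
    def __init__(self, in_dim: int, hidden: int, task_classes: list):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, k) for k in task_classes)

    def forward(self, x: torch.Tensor) -> list:
        h = self.encoder(x)                       # shared representation
        return [head(h) for head in self.heads]   # one logit set per task

def multitask_loss(logits_per_task, labels_per_task, class_weights, task_weights):
    """Weighted sum of per-task cross-entropies; class_weights fight imbalance."""
    total = 0.0
    for logits, labels, cw, tw in zip(logits_per_task, labels_per_task,
                                      class_weights, task_weights):
        total = total + tw * nn.functional.cross_entropy(logits, labels, weight=cw)
    return total
```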
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.