Quantifying the Task-Specific Information in Text-Based Classifications
- URL: http://arxiv.org/abs/2110.08931v1
- Date: Sun, 17 Oct 2021 21:54:38 GMT
- Title: Quantifying the Task-Specific Information in Text-Based Classifications
- Authors: Zining Zhu, Aparna Balagopalan, Marzyeh Ghassemi, Frank Rudzicz
- Abstract summary: Shortcuts in datasets do not contribute to the *task-specific information* (TSI) of the classification tasks.
In this paper, we consider how much task-specific information is required to classify a dataset.
This framework allows us to compare across datasets: apart from a set of "shortcut features", classifying each sample in the Multi-NLI task involves around 0.4 nats more TSI than in the Quora Question Pairs task.
- Score: 20.148222318025528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, neural natural language models have attained state-of-the-art performance on a wide variety of tasks, but the high performance can result from superficial, surface-level cues (Bender and Koller, 2020; Niven and Kao, 2020). These surface cues, the "shortcuts" inherent in the datasets, do not contribute to the *task-specific information* (TSI) of the classification tasks. While it is essential to look at model performance, it is also important to understand the datasets themselves. In this paper, we consider this question: apart from the information introduced by the shortcut features, how much task-specific information is required to classify a dataset? We formulate this quantity in an information-theoretic framework. While it is hard to compute exactly, we approximate it with a fast and stable method. TSI quantifies the amount of linguistic knowledge (modulo a set of predefined shortcuts) that contributes to classifying a sample from each dataset. This framework allows us to compare across datasets: apart from a set of "shortcut features", classifying each sample in the Multi-NLI task involves around 0.4 nats more TSI than in the Quora Question Pairs task.
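The abstract does not spell out the estimator, but one common way to operationalize a quantity like TSI is as a gap between predictive cross-entropies: train one probe on the predefined shortcut features alone and another on the full text, and measure how many nats of held-out negative log-likelihood the full text saves. The sketch below is a minimal, hypothetical illustration under that assumption; the toy data, the single "not"-token shortcut, and the helper `mean_nll_nats` are invented for the example and are not the paper's actual estimator or datasets.

```python
# Hypothetical sketch of a TSI-style estimate: the extra predictive
# information (in nats per sample) the full text carries beyond a set of
# predefined shortcut features. Not the paper's exact method.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def mean_nll_nats(X_train, X_test, y_train, y_test):
    """Held-out cross-entropy of a probe, an upper bound on H(Y | features).

    sklearn's log_loss uses the natural logarithm, so the result is in nats.
    """
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return log_loss(y_test, probe.predict_proba(X_test))

# Toy sentiment data; in practice these would be e.g. Multi-NLI or QQP samples.
texts = ["not bad at all", "truly awful", "not good", "really great"] * 50
labels = np.array([1, 0, 0, 1] * 50)

# Shortcut view: a single predefined surface cue (presence of the token "not").
shortcut_feats = np.array([[float("not" in t)] for t in texts])
# Full view: bag-of-words over the whole text.
full_feats = CountVectorizer().fit_transform(texts)

idx = np.arange(len(labels))
tr, te = train_test_split(idx, test_size=0.5, random_state=0, stratify=labels)

h_shortcut = mean_nll_nats(shortcut_feats[tr], shortcut_feats[te],
                           labels[tr], labels[te])
h_full = mean_nll_nats(full_feats[tr], full_feats[te],
                       labels[tr], labels[te])

# TSI-style gap: H(Y | shortcuts) - H(Y | text), estimated via the probes.
print(f"estimated TSI ~ {h_shortcut - h_full:.3f} nats per sample")
```

Because the natural logarithm is used throughout, the gap comes out in nats, the unit in which the paper reports its Multi-NLI vs. Quora Question Pairs comparison; 0.4 nats is roughly 0.4 / ln 2 ≈ 0.58 bits per sample.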
Related papers
- Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks [10.556477506959888]
Existing methods often encounter difficulties in drawing accurate class prototypes from support set samples.
Recent approaches attempt to incorporate external knowledge or pre-trained language models to augment data, but this requires additional resources.
We propose a novel solution by adequately leveraging the information within the task itself.
arXiv Detail & Related papers (2024-10-14T12:47:11Z)
- Wiki-TabNER: Advancing Table Interpretation Through Named Entity Recognition [19.423556742293762]
We analyse a widely used benchmark dataset for the evaluation of table interpretation (TI) tasks.
To overcome the drawbacks of this benchmark, we construct and annotate a new, more challenging dataset.
We propose a prompting framework for evaluating the newly developed large language models.
arXiv Detail & Related papers (2024-03-07T15:22:07Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks that have little or non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing models for target sentiment classification (TSC) based on pre-trained language models (PTLMs) can be categorized into two groups: 1) fine-tuning-based models that adopt the PTLM as the context encoder; 2) prompting-based models that recast the classification task as a text/word generation task.
We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
- Deep Sequence Models for Text Classification Tasks [0.007329200485567826]
Natural Language Processing (NLP) equips machines to understand humans' diverse and complicated languages.
Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection.
Sequence models such as RNNs, GRUs, and LSTMs are a breakthrough for tasks with long-range dependencies.
The results were excellent, with most of the models performing within the range of 80% to 94%.
arXiv Detail & Related papers (2022-07-18T18:47:18Z)
- SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech [44.68649535280397]
We propose a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE).
SLUE consists of limited-size labeled training sets and corresponding evaluation sets.
We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets.
We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
arXiv Detail & Related papers (2021-11-19T18:59:23Z)
- STEP: Segmenting and Tracking Every Pixel [107.23184053133636]
We present a new benchmark: Segmenting and Tracking Every Pixel (STEP).
Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains.
For measuring performance, we propose a novel evaluation metric, Segmentation and Tracking Quality (STQ).
arXiv Detail & Related papers (2021-02-23T18:43:02Z)
- Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning [1.0062040918634414]
Few-shot learning algorithms are designed to generalize well to new tasks with limited data.
We introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data.
arXiv Detail & Related papers (2020-09-23T17:01:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.