Comparing Text Representations: A Theory-Driven Approach
- URL: http://arxiv.org/abs/2109.07458v1
- Date: Wed, 15 Sep 2021 17:48:19 GMT
- Title: Comparing Text Representations: A Theory-Driven Approach
- Authors: Gregory Yauney, David Mimno
- Abstract summary: We adapt general tools from computational learning theory to fit the specific characteristics of text datasets.
We present a method to evaluate the compatibility between representations and tasks.
This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task.
- Score: 2.893558866535708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of the progress in contemporary NLP has come from learning
representations, such as masked language model (MLM) contextual embeddings,
that turn challenging problems into simple classification tasks. But how do we
quantify and explain this effect? We adapt general tools from computational
learning theory to fit the specific characteristics of text datasets and
present a method to evaluate the compatibility between representations and
tasks. Even though many tasks can be easily solved with simple bag-of-words
(BOW) representations, BOW does poorly on hard natural language inference
tasks. For one such task we find that BOW cannot distinguish between real and
randomized labelings, while pre-trained MLM representations show 72x greater
distinction between real and random labelings than BOW. This method provides a
calibrated, quantitative measure of the difficulty of a classification-based
NLP task, enabling comparisons between representations without requiring
empirical evaluations that may be sensitive to initializations and
hyperparameters. The method provides a fresh perspective on the patterns in a
dataset and the alignment of those patterns with specific labels.
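The paper's measure adapts data-dependent complexity tools from learning theory; as a rough, hypothetical proxy for the real-vs-random-labeling comparison it describes (not the paper's actual method), one can compare how well a simple classifier fits the real labels versus a random permutation of them over the same features. The function names below are illustrative, not from the paper:

```python
import random

def centroid_accuracy(features, labels):
    """Fit a nearest-centroid classifier and report how well it fits the labels."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    # one centroid (coordinate-wise mean) per label
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in groups.items()
    }
    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )
    return sum(predict(x) == y for x, y in zip(features, labels)) / len(labels)

def label_distinction(features, labels, seed=0):
    """Gap between fit to the real labels and fit to a random permutation:
    a large gap suggests the representation aligns with the task's labels."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return centroid_accuracy(features, labels) - centroid_accuracy(features, shuffled)
```

On a representation where the classes separate cleanly, the real-label fit is high while the permuted-label fit hovers near chance, so the gap is large; the paper reports that BOW shows essentially no such gap on a hard inference task while MLM representations show a 72x greater distinction.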
Related papers
- Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models [18.03259038587496]
In-context learning can be significantly influenced by the order of in-context demonstration examples.
We introduce an unsupervised fine-tuning method, termed the Information-Augmented and Consistency-Enhanced approach.
Our proposed method can reduce the sensitivity of CausalLMs to the order of in-context examples and exhibit robust generalizability.
arXiv Detail & Related papers (2024-02-23T22:39:12Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]

In-context learning enables language models to adapt to downstream data or to take on new tasks using a few samples as demonstrations within the prompts.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
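As a hypothetical sketch of the selection criterion described above (not the paper's implementation), demonstrations might be ranked by combining similarity to the test input with low uncertainty; the `embedding` and `uncertainty` fields are assumed to be precomputed:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def select_demonstrations(pool, test_embedding, k=2):
    """Pick the k candidates scoring highest on
    (similarity to the test input) minus (model uncertainty)."""
    ranked = sorted(
        pool,
        key=lambda d: cosine(d["embedding"], test_embedding) - d["uncertainty"],
        reverse=True,
    )
    return ranked[:k]
```

A candidate that is close to the test input but highly uncertain is penalized, matching the finding that low-uncertainty, similar examples make the best demonstrations.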
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
- Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, decent performance gains are achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - LabelPrompt: Effective Prompt-based Learning for Relation Classification [31.291466190218912]
This paper presents a novel prompt-based learning method, namely LabelPrompt, for the relation classification task.
Motivated by the intuition "GIVE MODEL CHOICES!", we first define additional tokens to represent relation labels, treating these tokens as the verbaliser with semantic initialisation.
Then, to mitigate inconsistency between predicted relations and given entities, we implement an entity-aware module with contrastive learning.
arXiv Detail & Related papers (2023-02-16T04:06:25Z) - Improving Multi-task Generalization Ability for Neural Text Matching via
Prompt Learning [54.66399120084227]
Recent state-of-the-art neural text matching models built on pre-trained language models (PLMs) are hard to generalize to different tasks.
We adopt a specialization-generalization training strategy and refer to it as Match-Prompt.
In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens.
In the generalization stage, the text matching model explores the essential matching signals by being trained on diverse matching tasks.
arXiv Detail & Related papers (2022-04-06T11:01:08Z) - Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
A single shared representation can be limiting when tasks are heterogeneous; in this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that a linear layer yields small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
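The paper supervises input gradients using the difference between paired examples; as a lighter, hypothetical illustration of training on such pairs (not the paper's objective), a margin loss can require the model to separate each example's label from its counterfactual partner's label. Here `score_fn` is an assumed model scoring function:

```python
def counterfactual_pair_loss(score_fn, x, x_cf, y, y_cf, margin=1.0):
    """Hinge loss over a minimally-different pair (x, y) and (x_cf, y_cf):
    on each input, the score of its own label must beat the score of the
    counterfactual partner's label by at least `margin`."""
    loss = 0.0
    for inp, pos, neg in [(x, y, y_cf), (x_cf, y_cf, y)]:
        loss += max(0.0, margin - (score_fn(inp, pos) - score_fn(inp, neg)))
    return loss
```

The loss is zero only when the model's decision flips along with the minimal input change, which is the causal signal the counterfactual pair carries.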
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.