Analyzing Text Representations under Tight Annotation Budgets: Measuring
Structural Alignment
- URL: http://arxiv.org/abs/2210.05721v1
- Date: Tue, 11 Oct 2022 18:28:19 GMT
- Title: Analyzing Text Representations under Tight Annotation Budgets: Measuring
Structural Alignment
- Authors: César González-Gutiérrez, Audi Primadhanty, Francesco Cazzaro,
Ariadna Quattoni
- Abstract summary: Under tight annotation budgets the choice of data representation is key.
We propose a metric that measures the extent to which a given representation is structurally aligned with a task.
- Score: 2.198430261120653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Annotating large collections of textual data can be time-consuming and
expensive. That is why the ability to train models with limited annotation
budgets is of great importance. In this context, it has been shown that under
tight annotation budgets the choice of data representation is key. The goal of
this paper is to better understand why this is so. With this goal in mind, we
propose a metric that measures the extent to which a given representation is
structurally aligned with a task. We conduct experiments on several text
classification datasets testing a variety of models and representations. Using
our proposed metric we show that an efficient representation for a task (i.e.
one that enables learning from few samples) is a representation that induces a
good alignment between latent input structure and class structure.
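The abstract does not spell out the metric's form; a common way to quantify how well latent input structure lines up with class structure is cluster purity, sketched below (an illustrative construction, not the paper's actual metric; `cluster_purity` is a hypothetical helper):

```python
import numpy as np

def cluster_purity(cluster_ids, labels):
    """Fraction of points whose cluster's majority label matches their own.

    Purity of 1.0 means every cluster is label-pure, i.e. the latent
    structure (the clusters) is perfectly aligned with the class structure.
    """
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    correct = 0
    for c in np.unique(cluster_ids):
        member_labels = labels[cluster_ids == c]
        # count of the majority label within this cluster
        _, counts = np.unique(member_labels, return_counts=True)
        correct += int(counts.max())
    return correct / len(labels)

# Perfectly aligned: each cluster contains exactly one class
print(cluster_purity([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
# Misaligned: each cluster mixes both classes
print(cluster_purity([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.5
```

Under this reading, an "efficient" representation is one whose natural clusters score high purity against the task labels.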
Related papers
- IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models [66.32043210237768]
This paper introduces an influence-driven selective annotation method.
It aims to minimize annotation costs while improving the quality of in-context examples.
Experiments confirm the superiority of the proposed method on various benchmarks.
arXiv Detail & Related papers (2023-10-16T22:53:54Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task into a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Analyzing Text Representations by Measuring Task Alignment [2.198430261120653]
We develop a task alignment score based on hierarchical clustering that measures alignment at different levels of granularity.
Our experiments on text classification validate our hypothesis by showing that task alignment can explain the classification performance of a given representation.
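A score "at different levels of granularity" can be pictured as purity evaluated at several cuts of a cluster hierarchy (an illustrative sketch, not the paper's actual score; `alignment_profile` and the toy data are hypothetical):

```python
import numpy as np

def alignment_profile(level_assignments, labels):
    """Cluster purity at each level of a hierarchy, coarse to fine.

    `level_assignments` holds one flat cluster assignment per granularity
    level, e.g. obtained by cutting a dendrogram at several heights.
    """
    labels = np.asarray(labels)
    profile = []
    for assign in level_assignments:
        assign = np.asarray(assign)
        # majority-label count per cluster, summed over clusters
        correct = sum(
            int(np.bincount(labels[assign == c]).max()) for c in np.unique(assign)
        )
        profile.append(correct / len(labels))
    return profile

labels = [0, 0, 1, 1, 2, 2]
levels = [
    [0, 0, 0, 0, 0, 0],  # coarsest: a single cluster
    [0, 0, 0, 1, 1, 1],  # two clusters
    [0, 0, 1, 1, 2, 2],  # finest: matches the class structure exactly
]
print(alignment_profile(levels, labels))  # alignment rises with granularity: 1/3, 2/3, 1.0
```

A representation whose profile reaches high alignment already at coarse levels would, on this view, support learning from few samples.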
arXiv Detail & Related papers (2023-05-31T11:20:48Z)
- Joint Representations of Text and Knowledge Graphs for Retrieval and Evaluation [15.55971302563369]
A key feature of neural models is that they can produce semantic vector representations of objects (texts, images, speech, etc.) ensuring that similar objects are close to each other in the vector space.
While much work has focused on learning representations for other modalities, there are no aligned cross-modal representations for text and knowledge base elements.
arXiv Detail & Related papers (2023-02-28T17:39:43Z)
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
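The projection idea can be sketched with a toy embedding table: multiplying a dense vector by the token-embedding matrix and applying a softmax yields a distribution over the vocabulary (illustrative values only; a real dual encoder's table is learned, and this is not the paper's implementation):

```python
import numpy as np

# Toy embedding table standing in for a model's token embeddings.
vocab = ["cat", "dog", "car", "tree", "book"]
V, d = 5, 8
token_emb = np.eye(V, d)                 # row i represents token i (illustrative)

rng = np.random.default_rng(0)
query_vec = token_emb[0] + 0.1 * rng.normal(size=d)  # a dense vector near "cat"

logits = token_emb @ query_vec           # project the dense vector into vocab space
probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # softmax: a distribution over the vocabulary

top_token = vocab[int(np.argmax(probs))]
print(top_token)  # "cat": the projection reveals what the vector encodes
```

Reading off the highest-probability tokens gives a human-interpretable summary of what the dense vector represents.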
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z)
- Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing [97.70862116338554]
We investigate the problem of measuring interpretability of self-supervised representations.
We formulate the latter as estimating the mutual information between the representation and a space of manually labelled concepts.
We use our method to evaluate a large number of self-supervised representations, ranking them by interpretability.
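The quantize-then-measure-MI step can be sketched as follows: discretize a continuous representation into codes, then estimate mutual information with concept labels from the joint counts (a minimal sketch under assumed 1-D data; `mutual_information` and the binning are hypothetical, not the paper's estimator):

```python
import numpy as np

def mutual_information(x, y):
    """MI (in nats) between two discrete variables, from their joint counts."""
    x, y = np.asarray(x), np.asarray(y)
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over x
    py = joint.sum(axis=0, keepdims=True)   # marginal over y
    nz = joint > 0                           # skip zero cells in the log
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

# Quantize a 1-D "representation" into discrete codes, then measure MI
# with manually labelled concepts.
reps = np.array([0.1, 0.2, 0.8, 0.9])       # continuous representation values
codes = np.digitize(reps, bins=[0.5])       # quantization: 2 discrete codes
concepts = np.array([0, 0, 1, 1])           # concept labels
print(mutual_information(codes, concepts))  # log 2 ≈ 0.693: codes fully determine concepts
```

High MI between quantized codes and labelled concepts is then read as high interpretability of the representation.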
arXiv Detail & Related papers (2022-09-07T16:18:50Z)
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z)
- DirectProbe: Studying Representations without Classifiers [21.23284793831221]
DirectProbe studies the geometry of a representation by building upon the notion of a version space for a task.
Experiments with several linguistic tasks and contextualized embeddings show that, even without training classifiers, DirectProbe can shed light on how an embedding space represents labels.
arXiv Detail & Related papers (2021-04-13T02:40:26Z)
- Structured (De)composable Representations Trained with Neural Networks [21.198279941828112]
A template representation refers to the generic representation that captures the characteristics of an entire class.
The proposed technique uses end-to-end deep learning to learn structured and composable representations from input images and discrete labels.
We prove that the representations have a clear structure, allowing them to be decomposed into factors that represent classes and environments.
arXiv Detail & Related papers (2020-07-07T10:20:31Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
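The core idea, a vector whose coordinates are posterior probabilities over named types, can be sketched directly (toy values and type inventory; not the paper's model or data):

```python
import numpy as np

# An entity's representation is a vector of posterior probabilities over
# fine-grained types, so every dimension has a human-readable name.
types = ["person", "politician", "athlete", "city", "company"]
# Hypothetical typing-model posteriors for the entity "Berlin":
berlin = np.array([0.01, 0.01, 0.01, 0.95, 0.02])

# The vector itself is the representation; reading off high-probability
# types makes it interpretable out of the box.
readable = {t: float(p) for t, p in zip(types, berlin) if p > 0.1}
print(readable)  # {'city': 0.95}
```

Shrinking the type set for a domain then amounts to keeping only the dimensions (types) that matter for that domain.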
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.