Pareto Probing: Trading Off Accuracy for Complexity
- URL: http://arxiv.org/abs/2010.02180v3
- Date: Mon, 4 Dec 2023 12:23:52 GMT
- Title: Pareto Probing: Trading Off Accuracy for Complexity
- Authors: Tiago Pimentel, Naomi Saphra, Adina Williams, Ryan Cotterell
- Abstract summary: We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The question of how to probe contextual word representations for linguistic
structure in a way that is both principled and useful has seen significant
attention recently in the NLP literature. In our contribution to this
discussion, we argue for a probe metric that reflects the fundamental trade-off
between probe complexity and performance: the Pareto hypervolume. To measure
complexity, we present a number of parametric and non-parametric metrics. Our
experiments using Pareto hypervolume as an evaluation metric show that probes
often do not conform to our expectations -- e.g., why should the non-contextual
fastText representations encode more morpho-syntactic information than the
contextual BERT representations? These results suggest that common, simplistic
probing tasks, such as part-of-speech labeling and dependency arc labeling, are
inadequate to evaluate the linguistic structure encoded in contextual word
representations. This leads us to propose full dependency parsing as a probing
task. In support of our suggestion that harder probing tasks are necessary, our
experiments with dependency parsing reveal a wide gap in syntactic knowledge
between contextual and non-contextual representations.
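As a concrete illustration of the proposed metric, here is a minimal sketch (not the authors' code) of a 2-D Pareto hypervolume for probes scored by (complexity, accuracy). It assumes complexity has been normalized to [0, 1] with lower values better (e.g., a scaled parameter count, one possible parametric metric), accuracy lies in [0, 1] with higher values better, and the reference point is (1, 0):

```python
def pareto_hypervolume(probes):
    """probes: iterable of (complexity, accuracy) pairs for one representation."""
    probes = list(probes)
    # Keep only non-dominated probes: drop any probe for which another
    # probe is at least as simple and at least as accurate.
    frontier = [
        (c, a)
        for (c, a) in probes
        if not any(
            c2 <= c and a2 >= a and (c2, a2) != (c, a)
            for (c2, a2) in probes
        )
    ]
    # Sweep the frontier by increasing complexity, accumulating the
    # area it dominates relative to the reference point (1, 0).
    frontier.sort()
    volume, prev_acc = 0.0, 0.0
    for complexity, accuracy in frontier:
        volume += (1.0 - complexity) * (accuracy - prev_acc)
        prev_acc = accuracy
    return volume

# Toy example: three probes trading complexity for accuracy.
print(pareto_hypervolume([(0.1, 0.70), (0.4, 0.85), (0.9, 0.90)]))  # 0.725
```

A representation whose probes reach higher accuracy at lower complexity dominates more of the unit square and so earns a larger hypervolume.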
Related papers
- Learning Disentangled Speech Representations
SynSpeech is a novel large-scale synthetic speech dataset designed to enable research on disentangled speech representations.
We present a framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics.
We find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity.
arXiv Detail & Related papers (2023-11-04T04:54:17Z)
- Improve Retrieval-based Dialogue System via Syntax-Informed Attention
We propose SIA, Syntax-Informed Attention, which considers both intra- and inter-sentence syntax information.
We evaluate our method on three widely used benchmarks and experimental results demonstrate the general superiority of our method on dialogue response selection.
arXiv Detail & Related papers (2023-03-12T08:14:16Z)
- Analysis of Joint Speech-Text Embeddings for Semantic Matching
We study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs.
We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios.
arXiv Detail & Related papers (2022-04-04T04:50:32Z)
- PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts
Fine-tuning continuous prompts for target tasks has emerged as a compact alternative to full model fine-tuning.
In practice, we observe a "wayward" disconnect between the task a continuous prompt solves and the task suggested by its nearest-neighbor discrete projection.
arXiv Detail & Related papers (2021-12-15T18:55:05Z)
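For illustration, the "nearest neighbor" of a continuous prompt is the vocabulary token whose embedding lies closest to each soft-prompt vector. Below is a minimal sketch of that projection; the toy embedding matrix and the use of cosine similarity are my assumptions, not the paper's setup:

```python
import numpy as np

def nearest_tokens(prompt_vecs, embedding_matrix, vocab):
    """Map each soft-prompt vector to its nearest vocabulary token
    by cosine similarity. prompt_vecs: (n, d); embedding_matrix: (V, d)."""
    p = prompt_vecs / np.linalg.norm(prompt_vecs, axis=1, keepdims=True)
    e = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1, keepdims=True)
    sims = p @ e.T  # (n, V) cosine similarities
    return [vocab[i] for i in sims.argmax(axis=1)]

# Toy example with a 4-word vocabulary in a 3-d embedding space.
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))
vocab = ["good", "bad", "movie", "review"]
soft_prompt = E[[2, 0]] + 0.05 * rng.normal(size=(2, 3))  # near "movie", "good"
print(nearest_tokens(soft_prompt, E, vocab))  # expected: ['movie', 'good']
```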
- Contextualized Semantic Distance between Highly Overlapped Texts
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper addresses the difficulty of measuring semantic distance between such highly overlapped texts with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the proposed neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
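A schematic sketch of the mask-and-predict idea described above (my illustration, not the paper's implementation): mask a shared word in each of two overlapped texts, read off the MLM's distribution at the masked position, and compare the two distributions. The model choice and the symmetrised KL comparison are assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed model choice; the paper's setup may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_distribution(text, word):
    """Mask the first occurrence of `word` and return the MLM's
    predicted distribution over the vocabulary at that position."""
    masked = text.replace(word, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos.item()]
    return torch.softmax(logits, dim=-1)

# Compare the MLM's distributions for a shared word in two highly
# overlapped texts, using a symmetrised KL divergence as the score.
p = masked_distribution("The movie was fantastic.", "movie")
q = masked_distribution("The movie was terrible.", "movie")
kl = lambda a, b: torch.sum(a * (torch.log(a.clamp_min(1e-12))
                                 - torch.log(b.clamp_min(1e-12))))
print(((kl(p, q) + kl(q, p)) / 2).item())
```

A full measure would aggregate such divergences over all shared positions; this sketch scores a single word.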
- Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers become sensitive to increasingly large parts of the sentence across their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z)
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition
We propose a solution, named DMUE, that addresses the problem of annotation ambiguity from two perspectives: latent distribution mining and pairwise uncertainty estimation.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the ambiguity extent in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z)
- Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
We investigate the degree to which a verb embedding encodes the verb's subject, a pronoun embedding encodes the pronoun's antecedent, and a full-sentence representation encodes the sentence's head word.
In all cases, we show that BERT's contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically salient controls.
arXiv Detail & Related papers (2020-11-24T13:19:06Z)
- Intrinsic Probing through Dimension Selection
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- A Tale of a Probe and a Parser
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z)
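For concreteness, the structural probe of Hewitt and Manning (2019) learns a linear map B such that the squared distance ||B(h_i - h_j)||^2 between transformed word vectors approximates the tree distance between words i and j in the dependency parse. Below is a minimal sketch of that formulation; the toy data and the L1 loss over word pairs are my assumptions:

```python
import torch

class StructuralProbe(torch.nn.Module):
    """Learns B so that ||B(h_i - h_j)||^2 approximates the tree
    distance between words i and j (after Hewitt & Manning, 2019)."""
    def __init__(self, rep_dim, probe_rank):
        super().__init__()
        self.B = torch.nn.Parameter(torch.randn(rep_dim, probe_rank) * 0.01)

    def forward(self, H):                       # H: (seq_len, rep_dim)
        diffs = H.unsqueeze(1) - H.unsqueeze(0)  # (seq, seq, rep_dim)
        proj = diffs @ self.B                    # (seq, seq, rank)
        return (proj ** 2).sum(dim=-1)           # predicted squared distances

# Toy training step on random "representations" and a hand-built
# tree-distance matrix; real training uses contextual embeddings
# and gold dependency parses.
torch.manual_seed(0)
H = torch.randn(5, 16)
tree_dist = torch.tensor([[0., 1, 2, 2, 3],
                          [1, 0, 1, 1, 2],
                          [2, 1, 0, 2, 3],
                          [2, 1, 2, 0, 1],
                          [3, 2, 3, 1, 0]])
probe = StructuralProbe(rep_dim=16, probe_rank=8)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = (probe(H) - tree_dist).abs().mean()  # L1 loss over word pairs
    loss.backward()
    opt.step()
print(loss.item())
```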
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.