On the Interplay Between Fine-tuning and Composition in Transformers
- URL: http://arxiv.org/abs/2105.14668v2
- Date: Tue, 1 Jun 2021 01:11:04 GMT
- Title: On the Interplay Between Fine-tuning and Composition in Transformers
- Authors: Lang Yu and Allyson Ettinger
- Abstract summary: We investigate the impact of fine-tuning on the capacity of contextualized embeddings to capture phrase meaning information.
Specifically, we fine-tune models on an adversarial paraphrase classification task with high lexical overlap, and on a sentiment classification task.
We find that fine-tuning largely fails to benefit compositionality in these representations, though training on sentiment yields a small, localized benefit for certain models.
- Score: 7.513100214864645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained transformer language models have shown remarkable performance on
a variety of NLP tasks. However, recent research has suggested that
phrase-level representations in these models reflect heavy influences of
lexical content, but lack evidence of sophisticated, compositional phrase
information. Here we investigate the impact of fine-tuning on the capacity of
contextualized embeddings to capture phrase meaning information beyond lexical
content. Specifically, we fine-tune models on an adversarial paraphrase
classification task with high lexical overlap, and on a sentiment
classification task. After fine-tuning, we analyze phrasal representations in
controlled settings following prior work. We find that fine-tuning largely
fails to benefit compositionality in these representations, though training on
sentiment yields a small, localized benefit for certain models. In follow-up
analyses, we identify confounding cues in the paraphrase dataset that may
explain the lack of composition benefits from that task, and we discuss
potential factors underlying the localized benefits from sentiment training.
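As a concrete illustration of the pipeline the abstract describes, the sketch below fine-tunes a pre-trained transformer on a sentence-pair paraphrase task and then extracts a phrase representation from the fine-tuned encoder. This is a minimal sketch assuming the HuggingFace transformers and datasets libraries; the PAWS dataset stands in for the adversarial high-lexical-overlap paraphrase task, and the mean-pooled token span is one common choice of phrase embedding, not necessarily the authors' exact analysis setup.

```python
# Minimal sketch (not the authors' exact setup): fine-tune a pre-trained
# transformer on a high-lexical-overlap paraphrase task, then pull phrase
# representations from the fine-tuned encoder for compositionality analysis.
import torch
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# PAWS is assumed here as the adversarial paraphrase dataset with high
# lexical overlap between sentence pairs.
data = load_dataset("paws", "labeled_final")

def encode(batch):
    return tok(batch["sentence1"], batch["sentence2"],
               truncation=True, padding="max_length", max_length=128)

data = data.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="paws-ft", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
)
trainer.train()

# Probe the fine-tuned encoder: embed a sentence and mean-pool the hidden
# states over a two-word phrase span (an illustrative choice of phrase
# representation; prior work also probes CLS tokens or head-word vectors).
encoder = model.bert  # the encoder without the classification head
inputs = tok("the new law passed quickly", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
phrase_vec = hidden[0, 1:3].mean(dim=0)  # span for "the new", after [CLS]
```

The controlled analyses following prior work (e.g., correlating phrase similarity with human judgments) would then operate on vectors like phrase_vec, comparing pre- and post-fine-tuning encoders.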
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- How Truncating Weights Improves Reasoning in Language Models [49.80959223722325]
We study how certain global associations tend to be stored in specific weight components or Transformer blocks.
We analyze how this arises during training, both empirically and theoretically.
arXiv Detail & Related papers (2024-06-05T08:51:08Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Visual Referential Games Further the Emergence of Disentangled Representations [0.12891210250935145]
This paper investigates how compositionality at the level of emergent languages, disentanglement at the level of the learned representations, and systematicity relate to one another in the context of visual referential games.
arXiv Detail & Related papers (2023-04-27T20:00:51Z)
- Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings [14.244787327283335]
We find that the performance of Transformer models as sentence encoders can be improved by training with multi-modal multi-task losses.
The reliance of our framework on unpaired non-linguistic data makes it language-agnostic, enabling it to be widely applicable beyond English NLP.
arXiv Detail & Related papers (2022-09-20T03:01:45Z)
- UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining [27.808028645942827]
UCTopic is a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining.
It is pretrained at a large scale to distinguish whether the contexts of two phrase mentions have the same semantics.
It outperforms the state-of-the-art phrase representation model by 38.2% NMI on average across four entity clustering tasks.
arXiv Detail & Related papers (2022-02-27T22:43:06Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by the Structured Distributional Model (SDM).
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- "Let's Eat Grandma": When Punctuation Matters in Sentence Representation for Sentiment Analysis [13.873803872380229]
We argue that punctuation could play a significant role in sentiment analysis and propose a novel representation model to improve syntactic and contextual performance.
We conduct experiments on publicly available datasets and verify that our model identifies sentiment more accurately than other state-of-the-art baseline methods.
arXiv Detail & Related papers (2020-12-10T19:07:31Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Assessing Phrasal Representation and Composition in Transformers [13.460125148455143]
Deep transformer models have pushed performance on NLP tasks to new limits.
We present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers.
We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition.
arXiv Detail & Related papers (2020-10-08T04:59:39Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
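For reference, the influence-functions entry above builds on a standard formulation (Koh & Liang, 2017): the influence of a training example z on the model's loss at a test point z_test is estimated as

```latex
\mathcal{I}(z, z_{\mathrm{test}})
  = -\,\nabla_{\theta} L(z_{\mathrm{test}}, \hat{\theta})^{\top}
     H_{\hat{\theta}}^{-1}\,
     \nabla_{\theta} L(z, \hat{\theta})
```

where \hat{\theta} are the fitted parameters and H_{\hat{\theta}} is the Hessian of the training loss; the sign indicates whether upweighting z would increase or decrease the test loss. The paper's artifact-detection measure builds on this quantity, though the abstract does not specify its exact form.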
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.