Probing for Bridging Inference in Transformer Language Models
- URL: http://arxiv.org/abs/2104.09400v1
- Date: Mon, 19 Apr 2021 15:42:24 GMT
- Title: Probing for Bridging Inference in Transformer Language Models
- Authors: Onkar Pandit and Yufang Hou
- Abstract summary: We first investigate individual attention heads in BERT and observe that attention heads at higher layers prominently focus on bridging relations.
We consider language models as a whole in our approach where bridging anaphora resolution is formulated as a masked token prediction task.
Our formulation produces optimistic results without any fine-tuning, which indicates that pre-trained language models substantially capture bridging inference.
- Score: 15.216901057561428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We probe pre-trained transformer language models for bridging inference. We
first investigate individual attention heads in BERT and observe that attention
heads at higher layers prominently focus on bridging relations in comparison
with the lower and middle layers; moreover, a few specific attention heads
concentrate consistently on bridging. More importantly, we consider language
models as a whole in our second approach where bridging anaphora resolution is
formulated as a masked token prediction task (Of-Cloze test). Our formulation
produces optimistic results without any fine-tuning, which indicates that
pre-trained language models substantially capture bridging inference. Our
further investigation shows that both the distance between the anaphor and its
antecedent and the amount of context provided to the language model play an
important role in the inference.
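As a rough illustration of both probing setups, the hedged sketch below uses Hugging Face Transformers: it first reads off BERT attention weights from a bridging anaphor to a candidate antecedent, then scores candidate antecedents with a fill-in-the-mask "of"-phrase query in the spirit of the Of-Cloze formulation. The example sentence, the anaphor-antecedent pair, and the candidate list are illustrative assumptions, not the paper's data or exact templates.

```python
# Hedged sketch of the two probing setups described above. The example sentence,
# the anaphor/antecedent pair, and the candidate list are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

context = "I walked into the house. The door was open."  # "the door" bridges to "house"

# --- Part 1: attention from the anaphor head noun to a candidate antecedent ---
enc = tokenizer(context, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(enc.input_ids[0])
anaphor_idx = tokens.index("door")
antecedent_idx = tokens.index("house")
with torch.no_grad():
    attentions = model(**enc, output_attentions=True).attentions  # one tensor per layer
for layer, att in enumerate(attentions):
    # att has shape (batch, heads, seq_len, seq_len); take attention anaphor -> antecedent
    per_head = att[0, :, anaphor_idx, antecedent_idx]
    print(f"layer {layer}: max head attention door->house = {per_head.max().item():.3f}")

# --- Part 2: Of-Cloze style masked-token query for the antecedent ---
candidates = ["house", "car", "garden"]
query = context + " the door of the " + tokenizer.mask_token + " ."
inputs = tokenizer(query, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos].squeeze(0)
probs = torch.softmax(logits, dim=-1)
ranking = sorted(candidates, key=lambda c: -probs[tokenizer.convert_tokens_to_ids(c)].item())
print("antecedent ranking:", ranking)
```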
Related papers
- Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks [12.7259425362286]
We investigate how multilingual models might leverage key-value memories.
For autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages?
Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.
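As a loose sketch of this kind of analysis (not the paper's method), the snippet below hooks the feed-forward "key-value memory" activations of a multilingual encoder, xlm-roberta-base, used here purely as a stand-in for the autoregressive models studied, and compares per-layer activation strength for an English and a German sentence. The sentences and the layer-level summary statistic are assumptions.

```python
# Minimal sketch: compare how strongly feed-forward neurons fire per language,
# layer by layer, using forward hooks. A crude proxy for language specificity.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

activations = {}  # layer index -> mean absolute FFN activation for the current input

def make_hook(layer_idx):
    def hook(_module, _inputs, output):
        # output: (batch, seq_len, intermediate_size) after the FFN non-linearity
        activations[layer_idx] = output.abs().mean().item()
    return hook

for i, layer in enumerate(model.encoder.layer):
    layer.intermediate.register_forward_hook(make_hook(i))

texts = {"en": "The weather is nice today.", "de": "Das Wetter ist heute schoen."}
profiles = {}
for lang, text in texts.items():
    with torch.no_grad():
        model(**tokenizer(text, return_tensors="pt"))
    profiles[lang] = dict(activations)

# Larger per-layer gaps suggest more language-specific FFN behaviour in that layer.
for i in range(len(model.encoder.layer)):
    gap = abs(profiles["en"][i] - profiles["de"][i])
    print(f"layer {i:2d}: |en - de| mean activation gap = {gap:.4f}")
```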
arXiv Detail & Related papers (2023-10-24T06:45:00Z)
- POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models [62.23255433487586]
We propose an unsupervised fine-tuning framework to fine-tune the model or prompt on the unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data.
arXiv Detail & Related papers (2023-04-29T22:05:22Z)
- MiQA: A Benchmark for Inference on Metaphorical Questions [5.32836690371986]
We propose a benchmark to assess the capability of large language models to reason with conventional metaphors.
We examine the performance of state-of-the-art pre-trained models on binary-choice tasks.
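A hedged sketch of a generic binary-choice evaluation of this kind: score each candidate answer with a pre-trained autoregressive LM and pick the higher-probability option. The metaphorical prompt and the two options below are invented examples, not MiQA items.

```python
# Binary-choice evaluation sketch: compare summed token log-probabilities
# of the two candidate continuations under a pre-trained LM.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sequence_logprob(text: str) -> float:
    """Sum of token log-probabilities the LM assigns to the text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    target = ids[:, 1:]
    return log_probs.gather(2, target.unsqueeze(-1)).sum().item()

prompt = "She devoured the novel in one sitting, which means"
options = [" she read it very quickly.", " she literally ate the book."]
scores = {opt: sequence_logprob(prompt + opt) for opt in options}
print(max(scores, key=scores.get))
```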
arXiv Detail & Related papers (2022-10-14T17:46:05Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
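For illustration, the sketch below removes a placeholder set of attention heads from a fixed multilingual model using the Transformers prune_heads API; in the paper the heads to remove are chosen by Shapley-value attribution, whereas the layer/head indices here are arbitrary assumptions.

```python
# Sketch: prune specific attention heads from a fixed model, then evaluate it
# unchanged on the target-language task. Head indices below are placeholders.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

# Map of layer index -> list of head indices to prune (placeholder values; the
# real selection would come from an attribution method such as Shapley values).
heads_to_prune = {2: [0, 5], 7: [3], 10: [1, 9]}
model.prune_heads(heads_to_prune)

# Remaining weights are untouched; the pruned model can be evaluated directly.
print(model.config.pruned_heads)
```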
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment [63.0407314271459]
Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.
arXiv Detail & Related papers (2022-10-09T02:24:35Z)
- Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing, by formulating probing as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable or better at extracting information than diagnostic probes.
We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.
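A minimal sketch of the general "probing as prompting" idea under assumed details: cast a probing question (here, the part of speech of a target word) as a cloze prompt and compare the masked LM's scores for verbalizer tokens. The template, property, and verbalizers are illustrative, not the paper's setup.

```python
# Probing-as-prompting sketch: query a masked LM with a cloze template and read
# its preference over verbalizer tokens for a linguistic property.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The dog chased the ball."
target = "chased"
verbalizers = {"noun": "noun", "verb": "verb"}  # assumed label-to-token mapping

prompt = f'{sentence} In this sentence, the word "{target}" is a {tokenizer.mask_token}.'
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos].squeeze(0)
scores = {label: logits[tokenizer.convert_tokens_to_ids(tok)].item()
          for label, tok in verbalizers.items()}
print(max(scores, key=scores.get))
```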
arXiv Detail & Related papers (2022-07-04T22:14:40Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
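A hedged sketch of few-shot subgoal inference with a frozen LM: a prompt with in-context task-to-subgoal examples is continued by the model without any fine-tuning. The model (gpt2), the example tasks, and the decoding settings are assumptions, not the paper's benchmark or prompts.

```python
# Few-shot subgoal inference sketch: the frozen LM continues a pattern of
# task -> subgoal decompositions given in the prompt.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = (
    "Task: make a cup of tea\n"
    "Subgoals: 1) boil water 2) put a tea bag in a cup 3) pour water 4) let it steep\n\n"
    "Task: wash the dishes\n"
    "Subgoals: 1) clear the sink 2) scrub each dish 3) rinse 4) dry and put away\n\n"
    "Task: water the plants\n"
    "Subgoals:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=False,                       # greedy decoding for a deterministic sketch
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no pad token
)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```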
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study [17.338923885534193]
We present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation.
We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs.
Findings show that autoregressive models combined with stochastic decoding are the most promising.
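To illustrate the kind of comparison involved, the sketch below generates from one autoregressive LM under several decoding mechanisms (greedy, beam search, nucleus sampling); the model and prompt are generic placeholders rather than the CN models or data from the study.

```python
# Decoding-comparison sketch: same model and prompt, different decoding settings.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "A respectful, fact-based response to a hostile online comment could be:"
inputs = tokenizer(prompt, return_tensors="pt")

decoding_configs = {
    "greedy": dict(do_sample=False),
    "beam search": dict(num_beams=5, do_sample=False),
    "nucleus sampling": dict(do_sample=True, top_p=0.9, temperature=0.8),
}
for name, cfg in decoding_configs.items():
    out = model.generate(**inputs, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id, **cfg)
    print(f"--- {name} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```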
arXiv Detail & Related papers (2022-04-04T12:44:47Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
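A minimal sketch of such a probing setup under assumed details: the pre-trained encoder is kept frozen and only a small feed-forward probe is trained on its sentence representation, here with two toy dialogue-act labels standing in for the annotated task-oriented dialogue labels.

```python
# Probing-classifier sketch: frozen encoder, trainable feed-forward probe on
# the [CLS] representation. Toy data; labels 0 = request, 1 = inform.
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()
for p in encoder.parameters():          # freeze the language model
    p.requires_grad = False

probe = nn.Linear(encoder.config.hidden_size, 2)   # feed-forward probe, 2 toy classes
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

data = [("Can you book me a table for two?", 0),
        ("The restaurant is on Main Street.", 1)]

for epoch in range(10):
    for text, label in data:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            cls = encoder(**inputs).last_hidden_state[:, 0]   # [CLS] representation
        logits = probe(cls)
        loss = loss_fn(logits, torch.tensor([label]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
print("final loss:", loss.item())
```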
arXiv Detail & Related papers (2020-10-26T21:34:39Z)