Effect of Post-processing on Contextualized Word Representations
- URL: http://arxiv.org/abs/2104.07456v1
- Date: Thu, 15 Apr 2021 13:40:42 GMT
- Title: Effect of Post-processing on Contextualized Word Representations
- Authors: Hassan Sajjad and Firoj Alam and Fahim Dalvi and Nadir Durrani
- Abstract summary: Post-processing of static embeddings has been shown to improve their performance on both lexical and sequence-level tasks.
We question the usefulness of post-processing for contextualized embeddings obtained from different layers of pre-trained language models.
- Score: 20.856802441794162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-processing of static embeddings has been shown to improve their
performance on both lexical and sequence-level tasks. However, post-processing
for contextualized embeddings is an under-studied problem. In this work, we
question the usefulness of post-processing for contextualized embeddings
obtained from different layers of pre-trained language models. More
specifically, we standardize individual neuron activations using z-score,
min-max normalization, and by removing top principal components using the
all-but-the-top method. Additionally, we apply unit length normalization to
word representations. On a diverse set of pre-trained models, we show that
post-processing unwraps vital information present in the representations for
both lexical tasks (such as word similarity and analogy) and sequence
classification tasks. Our findings raise interesting points in relation to
the research studies that use contextualized representations, and suggest
z-score normalization as an essential step to consider when using them in an
application.
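To make the four post-processing methods named in the abstract concrete, here is a minimal numpy sketch applied to a matrix of word representations (one row per word). The function names, epsilon guards, and the choice of k for all-but-the-top are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def z_score(X):
    """Standardize each dimension (neuron) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def min_max(X):
    """Rescale each dimension to the [0, 1] range."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn + 1e-8)

def all_but_the_top(X, k=3):
    """Remove the mean and the projections onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:k]                      # (k, dim)
    return Xc - (Xc @ top.T) @ top    # subtract top-k components

def unit_length(X):
    """L2-normalize each word vector."""
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)

# Example: post-process 1000 hypothetical 768-d contextualized vectors.
X = np.random.randn(1000, 768)
X_post = z_score(X)  # the abstract singles out z-score as the essential step
```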
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z)
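A minimal sketch of the self-similarity metric the CAST entry above describes: the average pairwise cosine similarity of one word's contextualized embeddings across its contexts. The threshold value and the direction of the cut-off below are assumptions, not values from the paper.

```python
import numpy as np

def self_similarity(contextual_vecs):
    """Mean pairwise cosine similarity of one word's vectors across contexts."""
    V = np.asarray(contextual_vecs, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    sims = V @ V.T
    n = len(V)
    return (sims.sum() - n) / (n * (n - 1))  # average over off-diagonal pairs

def keep_as_candidate(contextual_vecs, threshold=0.5):
    # Assumption: CAST thresholds this score to keep functional words out of
    # the candidate topic-word set; the exact rule is not given in the summary.
    return self_similarity(contextual_vecs) >= threshold

# Example with 10 hypothetical 768-d contextualized vectors for one word.
vecs = np.random.randn(10, 768)
print(self_similarity(vecs), keep_as_candidate(vecs))
```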
- Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers [12.610445666406898]
We investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM).
To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs.
We also try to empirically localize the strength of contextualization information encoded in these sub-layer representations.
arXiv Detail & Related papers (2024-09-21T10:42:07Z)
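A hedged sketch of the probing setup above: compare the representation of the same polysemous word in a minimally different sentence pair, where lower cosine similarity suggests stronger contextualization. For brevity this uses whole-layer hidden states of bert-base-uncased rather than the paper's finer sub-layer representations; the model choice, layer index, and sub-word pooling are assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def word_vec(sentence, word, layer):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]   # (seq_len, dim)
    # Average the sub-word pieces belonging to the target word.
    ids = tok(word, add_special_tokens=False)["input_ids"]
    seq = enc["input_ids"][0].tolist()
    for i in range(len(seq) - len(ids) + 1):
        if seq[i:i + len(ids)] == ids:
            return hidden[i:i + len(ids)].mean(dim=0)
    raise ValueError("word not found in sentence")

a = word_vec("He sat by the river bank.", "bank", layer=8)
b = word_vec("He queued inside the bank.", "bank", layer=8)
sim = torch.cosine_similarity(a, b, dim=0)  # lower => more contextualized
```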
- Conjunct Resolution in the Face of Verbal Omissions [51.220650412095665]
We propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
We curate a large dataset, containing over 10K examples of naturally-occurring verbal omissions with crowd-sourced annotations.
We train various neural baselines for this task, and show that while our best method obtains decent performance, it leaves ample space for improvement.
arXiv Detail & Related papers (2023-05-26T08:44:02Z)
- A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well.
On the other hand, Neural-network-based approaches can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations.
arXiv Detail & Related papers (2023-03-13T15:34:19Z)
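To make the matrix-factorization family mentioned in the survey above concrete, here is a toy PPMI-plus-truncated-SVD sketch that produces dense word vectors from co-occurrence counts. The corpus, window size, and vector dimensionality are stand-ins.

```python
import numpy as np

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-1 token window.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Positive pointwise mutual information re-weighting.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total   # row marginals
pc = C.sum(axis=0, keepdims=True) / total   # column marginals
eps = 1e-12
ppmi = np.maximum(np.log((C / total + eps) / (pw * pc + eps)), 0)

# Truncated SVD: keep the top-2 singular directions as 2-d word vectors.
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :2] * S[:2]
print(dict(zip(vocab, embeddings.round(2))))
```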
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z)
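A hedged sketch of entailment-based factuality scoring in the spirit of the entry above: each summary sentence is scored by the probability that the source entails it. The checkpoint roberta-large-mnli is a common public NLI model used here for illustration, not necessarily the paper's model, and aggregating by the mean is likewise an assumption.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
nli = AutoModelForSequenceClassification.from_pretrained(name)

def entailment_score(premise, hypothesis):
    """Probability that the premise entails the hypothesis."""
    enc = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**enc).logits.softmax(dim=-1)[0]
    return probs[nli.config.label2id["ENTAILMENT"]].item()

source = "The company reported record revenue of $2B in 2021."
summary_sents = ["Revenue hit a record $2B.", "Profits doubled in 2021."]
# Factuality of the whole summary: mean of sentence-level scores (assumption).
score = sum(entailment_score(source, s) for s in summary_sents) / len(summary_sents)
```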
- Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models [12.0190584907439]
We propose a new method to exploit word structure and integrate lexical semantics into character representations of pre-trained models.
We show that our approach achieves superior performance over the basic pre-trained models BERT, BERT-wwm and ERNIE on different Chinese NLP tasks.
arXiv Detail & Related papers (2022-07-13T02:28:08Z)
- Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL).
RESAIL provides pixel-level fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that our RESAIL performs favorably against the state of the art in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z)
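RESAIL's retrieval pipeline is not detailed in the summary above, so the following is a generic spatially adaptive normalization block in the SPADE style it builds on: features are normalized without affine parameters, then modulated per pixel by scale and shift maps predicted from a guidance signal (retrieval-based in RESAIL). Layer sizes and the guidance input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatiallyAdaptiveNorm(nn.Module):
    def __init__(self, feat_channels, guide_channels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(guide_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, x, guide):
        # Resize the guidance to the feature resolution, then predict
        # per-pixel modulation parameters from it.
        guide = nn.functional.interpolate(guide, size=x.shape[2:], mode="nearest")
        h = self.shared(guide)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

x = torch.randn(1, 64, 32, 32)        # feature map inside the generator
guide = torch.randn(1, 3, 256, 256)   # e.g. a retrieved image patch
out = SpatiallyAdaptiveNorm(64, 3)(x, guide)
```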
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
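A minimal sketch of the idea in the entry above: propagate pretrained token representations along the edges of a semantic parse with a graph-convolution layer before the task head. The adjacency matrix, dimensions, and single-layer design are toy assumptions; the paper's encoder may differ.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One GCN layer: H' = ReLU(row-normalized adjacency @ H @ W)."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(-1))   # add self-loops
        deg = A_hat.sum(-1, keepdim=True)   # node degrees for normalization
        return torch.relu(self.lin((A_hat / deg) @ H))

# Toy example: 5 tokens with 768-d contextual embeddings (e.g. from BERT),
# semantic-parse edges encoded as a symmetric adjacency matrix.
H = torch.randn(5, 768)
A = torch.zeros(5, 5)
A[0, 2] = A[2, 0] = 1   # a semantic dependency edge
A[2, 4] = A[4, 2] = 1
H_sem = GraphConv(768)(H, A)   # semantics-infused token states
```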
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
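A hedged sketch of the ACE-style search loop described above: a controller holds one logit per candidate embedding, samples a binary concatenation mask, and is updated with REINFORCE using task accuracy as the reward. Here evaluate_task is a placeholder for training and evaluating the downstream structured-prediction model; the candidate list, baseline, and hyperparameters are assumptions.

```python
import torch

EMBEDDINGS = ["bert", "elmo", "glove", "char", "flair"]
logits = torch.zeros(len(EMBEDDINGS), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def evaluate_task(mask):
    # Placeholder reward: in ACE this is the accuracy of a task model
    # trained on the concatenation selected by the mask.
    return float(mask.sum()) / len(mask) if mask.any() else 0.0

baseline = 0.0
for step in range(50):
    probs = torch.sigmoid(logits)
    dist = torch.distributions.Bernoulli(probs)
    mask = dist.sample()                      # which embeddings to concatenate
    reward = evaluate_task(mask)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    loss = -(reward - baseline) * dist.log_prob(mask).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

chosen = [e for e, m in zip(EMBEDDINGS, torch.sigmoid(logits) > 0.5) if m]
```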
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.