Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling
- URL: http://arxiv.org/abs/2110.07198v1
- Date: Thu, 14 Oct 2021 07:44:14 GMT
- Title: Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling
- Authors: Prathyusha Jwalapuram, Shafiq Joty, and Xiang Lin
- Abstract summary: Coherence evaluation of machine generated text is one of the principal applications of coherence models that needs to be investigated.
We explore training data and self-supervision objectives that result in a model that generalizes well across tasks.
We show empirically that increasing the density of negative samples improves the basic model, and using a global negative queue further improves and stabilizes the model while training with hard negative samples.
- Score: 8.329870357145927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large-scale pre-trained neural models have shown impressive
performances in a variety of tasks, their ability to generate coherent text
that appropriately models discourse phenomena is harder to evaluate and less
understood. Given the claims of improved text generation quality across various
systems, we consider the coherence evaluation of machine generated text to be
one of the principal applications of coherence models that needs to be
investigated. We explore training data and self-supervision objectives that
result in a model that generalizes well across tasks and can be used
off-the-shelf to perform such evaluations. Prior work in neural coherence
modeling has primarily focused on devising new architectures, and trained the
model to distinguish coherent and incoherent text through pairwise
self-supervision on the permuted documents task. We instead use a basic model
architecture and show significant improvements over state of the art within the
same training regime. We then design a harder self-supervision objective by
increasing the ratio of negative samples within a contrastive learning setup,
and enhance the model further through automatic hard negative mining coupled
with a large global negative queue encoded by a momentum encoder. We show
empirically that increasing the density of negative samples improves the basic
model, and using a global negative queue further improves and stabilizes the
model while training with hard negative samples. We evaluate the coherence
model on task-independent test sets that resemble real-world use cases and show
significant improvements in coherence evaluations of downstream applications.
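The training objective described above (a contrastive setup with a positive sample, hard negatives, and a large global negative queue maintained by a momentum encoder) can be sketched with an InfoNCE-style loss. This is a minimal illustrative sketch, not the authors' implementation: the function names, shapes, and hyperparameters (`m`, `temperature`) are assumptions, and the encoders are abstracted away as parameter lists and precomputed embeddings.

```python
import numpy as np

def momentum_update(encoder_params, momentum_params, m=0.999):
    """EMA update of the momentum (key) encoder toward the query encoder.

    A high momentum m keeps the queued negatives' embedding space slowly
    moving, which is what makes a large global negative queue stable.
    """
    return [m * pk + (1.0 - m) * qk
            for qk, pk in zip(encoder_params, momentum_params)]

def info_nce_loss(query, positive, negative_queue, temperature=0.1):
    """InfoNCE loss: the query (coherent document encoding) should score
    higher against its positive than against every queued negative
    (e.g. encodings of permuted documents)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = normalize(query)                      # (d,)
    k_pos = normalize(positive)               # (d,)
    k_neg = normalize(negative_queue)         # (K, d) global negative queue

    l_pos = np.sum(q * k_pos, keepdims=True)  # (1,) positive logit
    l_neg = q @ k_neg.T                       # (K,) negative logits
    logits = np.concatenate([l_pos, l_neg]) / temperature

    # cross-entropy with the positive at index 0 (numerically stabilized)
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

Increasing the density of negatives corresponds to growing `K`, and hard negative mining corresponds to filling the queue with the negatives the current model scores closest to the query.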
Related papers
- LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging [10.33844295243509]
We propose LoRE-Merging, a unified framework for model merging based on low-rank estimation of task vectors that does not require access to the base model.
Our approach is motivated by the observation that task vectors from fine-tuned models frequently exhibit a limited number of dominant singular values, making low-rank estimations less prone to interference.
arXiv Detail & Related papers (2025-02-15T10:18:46Z) - A Robust Adversarial Ensemble with Causal (Feature Interaction) Interpretations for Image Classification [9.945272787814941]
We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness.
Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions.
arXiv Detail & Related papers (2024-12-28T05:06:20Z) - Self-Improvement in Language Models: The Sharpening Mechanism [70.9248553790022]
We offer a new perspective on the capabilities of self-improvement through a lens we refer to as sharpening.
Motivated by the observation that language models are often better at verifying response quality than they are at generating correct responses, we formalize self-improvement as using the model itself as a verifier during post-training.
We analyze two natural families of self-improvement algorithms based on SFT and RLHF.
arXiv Detail & Related papers (2024-12-02T20:24:17Z) - Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z) - Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
arXiv Detail & Related papers (2024-03-15T17:33:49Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations [8.336315962271396]
We look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA).
MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models.
We believe that extending the success of ensembles to inference of higher-quality representations is an important step that will open many new applications of ensembling.
arXiv Detail & Related papers (2021-06-15T10:49:46Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.