One Objective for All Models -- Self-supervised Learning for Topic
Models
- URL: http://arxiv.org/abs/2203.03539v1
- Date: Wed, 2 Feb 2022 06:20:59 GMT
- Title: One Objective for All Models -- Self-supervised Learning for Topic
Models
- Authors: Zeping Luo, Cindy Weng, Shiyou Wu, Mo Zhou, Rong Ge
- Abstract summary: We highlight a key advantage of self-supervised learning -- when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model.
In particular, we prove that commonly used self-supervised objectives based on reconstruction or contrastive samples can both recover useful posterior information for general topic models.
- Score: 11.67381769733002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has significantly improved the performance of many
NLP tasks. In this paper, we highlight a key advantage of self-supervised
learning -- when applied to data generated by topic models, self-supervised
learning can be oblivious to the specific model, and hence is less susceptible
to model misspecification. In particular, we prove that commonly used
self-supervised objectives based on reconstruction or contrastive samples can
both recover useful posterior information for general topic models.
Empirically, we show that the same objectives can perform competitively against
posterior inference using the correct model, while outperforming posterior
inference using a misspecified model.
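A minimal sketch of the two objectives the abstract refers to, applied to synthetic data from a toy LDA-style generator (an illustration under assumed sizes and architecture, not the authors' code): a reconstruction head predicts the held-out half of each document from the observed half, and an InfoNCE-style contrastive loss treats the two halves of the same document as positives.

```python
# Minimal sketch (not the authors' code): documents come from a toy LDA-style
# topic model; the encoder is trained with (i) a reconstruction objective on
# held-out words and (ii) a contrastive objective where the two halves of the
# same document are positives. Sizes and architecture are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

V, K, D, N, L = 1000, 10, 64, 2048, 50   # vocab, topics, embed dim, docs, words per doc

# --- synthetic data from a simple topic model ---
torch.manual_seed(0)
beta = torch.distributions.Dirichlet(torch.full((V,), 0.05)).sample((K,))  # topic-word dists
theta = torch.distributions.Dirichlet(torch.full((K,), 0.5)).sample((N,))  # doc-topic proportions
word_probs = theta @ beta                                                  # (N, V) mixture per doc
docs = torch.multinomial(word_probs, L, replacement=True)                  # word ids per doc

def bow(word_ids):
    """Bag-of-words counts for a batch of word-id tensors."""
    return torch.zeros(word_ids.shape[0], V).scatter_add_(
        1, word_ids, torch.ones_like(word_ids, dtype=torch.float))

encoder = nn.Sequential(nn.Linear(V, 256), nn.ReLU(), nn.Linear(256, D))
decoder = nn.Linear(D, V)   # reconstruction head: predicts held-out word distribution
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(200):
    idx = torch.randint(0, N, (128,))
    batch = docs[idx]
    # split each document's words into two disjoint halves
    perm = torch.rand(batch.shape).argsort(dim=1)
    half_a = batch.gather(1, perm[:, :L // 2])
    half_b = batch.gather(1, perm[:, L // 2:])
    za, zb = encoder(bow(half_a)), encoder(bow(half_b))

    # (i) reconstruction: predict the held-out half from the observed half
    rec = -(F.log_softmax(decoder(za), dim=-1)
            * F.normalize(bow(half_b), p=1, dim=-1)).sum(-1).mean()

    # (ii) contrastive (InfoNCE): halves of the same document are positives
    logits = F.normalize(za, dim=-1) @ F.normalize(zb, dim=-1).T / 0.1
    con = F.cross_entropy(logits, torch.arange(len(idx)))

    loss = rec + con
    opt.zero_grad(); loss.backward(); opt.step()
```

In the spirit of the paper's claim, the encoder output is meant to carry posterior information about each document's topic proportions even though training never uses the generating model itself.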
Related papers
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models [10.449015816015566]
Self-improvement is a mechanism used in Large Language Model (LLM) pre-training, post-training, and test-time inference.
We provide a mathematical formulation for self-improvement, which is largely governed by a quantity we formalize as the generation-verification gap.
We also examine when self-improvement is possible, an iterative self-improvement procedure, and ways to improve its performance.
arXiv Detail & Related papers (2024-12-03T18:47:26Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate whether it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Mitigate Domain Shift by Primary-Auxiliary Objectives Association for
- Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID [39.98444065846305]
ReID models struggle to learn a domain-invariant representation solely through training on an instance classification objective.
We introduce a method that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection.
Our model can be extended with the recent test-time paradigm to form PAOA+, which performs on-the-fly optimization against the auxiliary objective.
arXiv Detail & Related papers (2023-10-24T15:15:57Z) - ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning the target model with PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that there is no single model that works best in all cases.
By choosing an appropriate bias model, we can obtain better robustness than baselines with more sophisticated model designs.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - How robust are pre-trained models to distribution shift? [82.08946007821184]
- How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z) - General Greedy De-bias Learning [163.65789778416172]
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Effective dimension of machine learning models [4.721845865189576]
Making statements about the performance of trained models on tasks involving new data is one of the primary goals of machine learning.
Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice.
We propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets.
arXiv Detail & Related papers (2021-12-09T10:00:18Z) - Self-Supervised Models are Continual Learners [79.70541692930108]
- Self-Supervised Models are Continual Learners [79.70541692930108]
We show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for Continual Learning.
We devise a framework for Continual self-supervised visual representation Learning that significantly improves the quality of the learned representations.
arXiv Detail & Related papers (2021-12-08T10:39:13Z) - Distill on the Go: Online knowledge distillation in self-supervised
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.