Do You Have the Right Scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods
- URL: http://arxiv.org/abs/2007.06162v1
- Date: Mon, 13 Jul 2020 02:53:03 GMT
- Title: Do You Have the Right Scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods
- Authors: Ning Miao, Yuxuan Song, Hao Zhou, Lei Li
- Abstract summary: It has been a common approach to pre-train a language model on a large corpus and fine-tune it on task-specific data.
We propose MC-Tailor, a novel method to alleviate the issue in text generation tasks by truncating and transferring the probability mass from over-estimated regions to under-estimated ones.
- Score: 27.411569071211378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been a common approach to pre-train a language model on a large corpus
and fine-tune it on task-specific data. In practice, we observe that
fine-tuning a pre-trained model on a small dataset may lead to over- and/or
under-estimation problems. In this paper, we propose MC-Tailor, a novel method
to alleviate the above issue in text generation tasks by truncating and
transferring the probability mass from over-estimated regions to
under-estimated ones. Experiments on a variety of text generation datasets show
that MC-Tailor consistently and significantly outperforms the fine-tuning
approach. Our code is available at this URL.
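To make the mechanism concrete, here is a minimal sketch of the rejection-sampling shape the abstract suggests: draw from the fine-tuned model and accept each draw with probability proportional to an estimated true-to-model density ratio, truncating mass where the model over-estimates. `generate` and `log_ratio_estimate` are hypothetical callables standing in for the fine-tuned sampler and a learned ratio estimator; this is a sketch, not the authors' exact algorithm.

```python
import math
import random

def mc_tailor_sample(generate, log_ratio_estimate, max_tries=100):
    """Rejection-sampling sketch: draw sequences from the fine-tuned model
    and accept each draw with probability min(1, p_true/p_model), so mass
    is truncated in over-estimated regions and, after the implicit
    renormalization, shifted toward under-estimated ones."""
    x = None
    for _ in range(max_tries):
        x = generate()             # hypothetical: sample a sequence from p_model
        r = log_ratio_estimate(x)  # hypothetical: log p_true(x) - log p_model(x)
        if r >= 0.0 or random.random() < math.exp(r):
            return x               # accepted
    return x                       # fall back to the last draw
```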
Related papers
- Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques [5.735035463793008]
We show that for Argument Mining, data transfer obtains better results than model transfer.
For few-shot learning, the type of task (the length and complexity of the sequence spans) and the sampling method prove crucial.
arXiv Detail & Related papers (2024-07-04T08:59:17Z)
- Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models [37.39843935632105]
We propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples.
Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt.
arXiv Detail & Related papers (2023-07-13T12:11:36Z)
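A minimal sketch of the prior-adaptation idea above, assuming the adjustment is a simple reweighting of the model's label posteriors by a new class prior; how `adapted_prior` is estimated from unlabelled data (e.g. by iterating this reweighting, EM-style) is our assumption, not necessarily the paper's exact recipe.

```python
import numpy as np

def adapt_prior(label_log_probs, adapted_prior):
    """Reweight per-example class posteriors by a new class prior: divide
    out the model's implicit prior (estimated as the average posterior)
    and multiply in the adapted one, then renormalize."""
    z = label_log_probs - label_log_probs.max(axis=1, keepdims=True)
    posteriors = np.exp(z)
    posteriors /= posteriors.sum(axis=1, keepdims=True)
    model_prior = posteriors.mean(axis=0)          # model's implicit prior
    adjusted = posteriors * (np.asarray(adapted_prior) / model_prior)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Example: two classes, prompt-biased posteriors re-balanced to a 50/50 prior.
scores = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]))
print(adapt_prior(scores, [0.5, 0.5]))
```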
- TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework that uses comparison examples to teach LLMs translation.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning.
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
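The comparison-based training above can be made concrete with a generic preference loss over sequence log-likelihoods; the hinge form and the margin below are illustrative assumptions, not the paper's exact objective.

```python
import torch

def preference_loss(logp_good, logp_bad, margin=1.0):
    """Hinge-style preference loss on sequence log-likelihoods: the correct
    translation should out-score the incorrect one by at least `margin`."""
    return torch.clamp(margin - (logp_good - logp_bad), min=0.0).mean()

# Example: one correct/incorrect pair of sequence log-probabilities.
print(preference_loss(torch.tensor([-10.2]), torch.tensor([-12.5])))
```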
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as its optimization method.
We develop practical bounds that make total variation distance (TVD) applicable to language generation.
We introduce the TaiLr objective, which balances the tradeoff in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
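A sketch of a TVD-motivated token weighting in the spirit of TaiLr: the gold-token NLL is down-weighted by the model's own probability of that token, and `gamma` interpolates back toward plain MLE. Treat the exact weighting as our reconstruction, not the published formula.

```python
import torch
import torch.nn.functional as F

def tailr_style_loss(logits, targets, gamma=0.1):
    """Gold-token NLL reweighted by the model's own gold-token probability;
    gamma=0 recovers plain MLE, while larger gamma down-weights tokens the
    model already finds unlikely (a proxy for over-penalized targets)."""
    logp = F.log_softmax(logits, dim=-1)
    logp_gold = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    p_gold = logp_gold.exp().detach()     # no gradient through the weight
    weight = p_gold / (gamma + (1.0 - gamma) * p_gold)
    return -(weight * logp_gold).mean()
```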
- Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences [1.8592384822257952]
We propose a novel topic modeling and inference algorithm.
We leverage pre-trained sentence embeddings by combining generative process models and clustering.
The evaluation shows that our method yields state-of-the-art results with relatively little computational demand.
arXiv Detail & Related papers (2023-02-06T20:13:11Z)
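As a baseline illustration of the embed-and-cluster idea above, here is a generic sketch using off-the-shelf sentence embeddings and k-means; the encoder name and clustering choice are our assumptions, and the paper's generative-process component is not reproduced.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = [
    "The central bank raised interest rates again.",
    "Inflation came in above expectations this quarter.",
    "The striker scored twice in the final match.",
    "The league postponed the game because of snow.",
]

# Embed each sentence with a pre-trained encoder, then cluster the
# embeddings; each cluster plays the role of a topic.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary model choice
embeddings = encoder.encode(sentences)
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(list(zip(topics, sentences)))
```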
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A frequent problem when working with non-English languages is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine differences between the accurate answer and the other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
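The second, precision-focused stage above can be illustrated with a generic InfoNCE-style loss over answer candidates, treating the gold answer as the positive and the other top-k predictions as negatives; the temperature and exact form are illustrative assumptions, not the paper's mechanism verbatim.

```python
import torch
import torch.nn.functional as F

def answer_contrastive_loss(candidate_scores, gold_index, temperature=0.1):
    """InfoNCE-style loss over answer candidates: the gold answer is the
    positive, the remaining top-k predictions are negatives, sharpening
    the model's separation between near-identical candidates."""
    logits = (candidate_scores / temperature).unsqueeze(0)  # shape (1, k)
    return F.cross_entropy(logits, torch.tensor([gold_index]))

# Example: four candidate scores, gold answer at index 2.
print(answer_contrastive_loss(torch.tensor([0.3, 0.1, 0.4, 0.2]), 2))
```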
- Show Me How To Revise: Improving Lexically Constrained Sentence Generation with XLNet [27.567493727582736]
We propose a two-step approach, "Predict and Revise", for constrained sentence generation.
During the predict step, we leverage a classifier to compute the learned prior for the candidate sentence.
During the revise step, we use MCMC sampling to revise the candidate sentence by applying a sampled action at a sampled position drawn from the learned prior.
Experimental results demonstrate that our model performs much better than previous work in terms of sentence fluency and diversity.
arXiv Detail & Related papers (2021-09-13T09:21:07Z)
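A minimal sketch of the revise step above as a Metropolis-Hastings-style loop: `propose` stands in for drawing a sampled action at a sampled position from the learned prior, `score` for a sentence-level log score, and the acceptance rule omits the proposal-asymmetry correction of full MH. Both callables are hypothetical.

```python
import math
import random

def revise(sentence, propose, score, steps=50):
    """Metropolis-Hastings-flavoured revision: repeatedly propose a local
    edit and accept it with probability min(1, exp(new_score - old_score)).
    The proposal-asymmetry correction term is omitted for brevity."""
    current, current_score = sentence, score(sentence)
    for _ in range(steps):
        candidate = propose(current)   # hypothetical: sampled edit at a sampled position
        cand_score = score(candidate)  # hypothetical: sentence-level log score
        delta = cand_score - current_score
        if delta >= 0.0 or random.random() < math.exp(delta):
            current, current_score = candidate, cand_score
    return current
```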
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
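Dynamic Blocking, as we read the summary above, keeps the decoder from copying the source surface form. A minimal sketch of that reading: whenever the last generated token matches a source token, the source's following token is masked out at the next step. Details such as sampled block lists are not reproduced and may differ from the paper.

```python
def dynamic_blocking_mask(source_ids, generated_ids, vocab_size):
    """If the last generated token matches a token in the source, forbid the
    source's *following* token at the next decoding step, nudging the
    decoder away from copying the input verbatim."""
    blocked = set()
    if generated_ids:
        last = generated_ids[-1]
        for i, tok in enumerate(source_ids[:-1]):
            if tok == last:
                blocked.add(source_ids[i + 1])
    # Additive mask over the vocabulary: -inf on blocked token ids.
    return [float("-inf") if t in blocked else 0.0 for t in range(vocab_size)]

# Example: source "a b c" = [1, 2, 3]; having just emitted 2, id 3 is blocked.
print(dynamic_blocking_mask([1, 2, 3], [5, 2], vocab_size=6))
```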
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.