Advancing State of the Art in Language Modeling
- URL: http://arxiv.org/abs/2312.03735v1
- Date: Tue, 28 Nov 2023 12:30:43 GMT
- Title: Advancing State of the Art in Language Modeling
- Authors: David Herel and Tomas Mikolov
- Abstract summary: Generalization is arguably the most important goal of statistical language modeling research.
Publicly available benchmarks and papers published with open-source code have been critical to advancing the field.
We propose a simple framework that should help advance the state of the art in language modeling in terms of generalization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalization is arguably the most important goal of statistical language
modeling research. Publicly available benchmarks and papers published with
open-source code have been critical to advancing the field. However, it is
often very difficult, and sometimes even impossible, to reproduce the results
fully as reported in publications. In this paper, we propose a simple framework
that should help advance the state of the art in language modeling in terms of
generalization. We propose to publish not just the code, but also probabilities
on dev and test sets with future publications so that one can easily add the
new model into an ensemble. This has crucial advantages: it is much easier to
determine whether a newly proposed model is actually complementary to the
current baseline. Therefore, instead of inventing new names for the old tricks,
the scientific community can advance faster. Finally, this approach promotes
diversity of ideas: one does not need to create an individual model that is the
new state of the art to attract attention; it will be sufficient to develop a
new model that learns patterns which other models do not. Thus, even a
suboptimal model can be found to have value. Remarkably, our approach has
yielded new state-of-the-art results across various language modeling
benchmarks, with improvements of up to 10%.
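The core proposal is concrete enough to sketch: if each publication ships the per-token probabilities its model assigns on the shared dev and test sets, anyone can interpolate a new model with an existing baseline and check whether the mixture's perplexity improves. Below is a minimal sketch of that check, assuming one probability per line in plain text files; the file names, the uniform default weights, and the ensemble_perplexity helper are illustrative assumptions, not artifacts released with the paper.

```python
import math

def load_probs(path):
    """Load per-token probabilities (one float per line) published alongside a paper."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def ensemble_perplexity(prob_files, weights=None):
    """Linearly interpolate per-token probabilities from several models
    and report the perplexity of the resulting ensemble."""
    models = [load_probs(p) for p in prob_files]
    n = len(models[0])
    assert all(len(m) == n for m in models), "files must cover the same tokens"
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # uniform mixture by default
    log_prob_sum = 0.0
    for token_probs in zip(*models):
        mixed = sum(w * p for w, p in zip(weights, token_probs))
        log_prob_sum += math.log(mixed)
    return math.exp(-log_prob_sum / n)

# Hypothetical usage: compare the baseline alone to baseline + new model.
# print(ensemble_perplexity(["baseline_dev.txt"]))
# print(ensemble_perplexity(["baseline_dev.txt", "new_model_dev.txt"]))
```

If adding the new model's probabilities lowers perplexity relative to the baseline alone, the new model is learning patterns the baseline misses, which is exactly the complementarity the abstract argues for, even when the new model is not state of the art on its own.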
Related papers
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models.
We find that there is a certain relationship between model kinship and the performance gains after model merging.
We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community.
There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z)
- GreekT5: A Series of Greek Sequence-to-Sequence Models for News Summarization [0.0]
This paper proposes a series of novel T5 models for Greek news articles.
The proposed models were thoroughly evaluated on the same dataset against GreekBART.
Our evaluation results reveal that most of the proposed models significantly outperform GreekBART on various evaluation metrics.
arXiv Detail & Related papers (2023-11-13T21:33:12Z)
- Large Language Models Are Also Good Prototypical Commonsense Reasoners [11.108562540123387]
Traditional fine-tuning approaches can be resource-intensive and potentially compromise a model's generalization capacity.
We draw inspiration from the outputs of large models for tailored tasks and semi-automatically develop a set of novel prompts.
With better-designed prompts we can achieve a new state of the art (SOTA) on the ProtoQA leaderboard.
arXiv Detail & Related papers (2023-09-22T20:07:24Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- An Application of Pseudo-Log-Likelihoods to Natural Language Scoring [5.382454613390483]
A language model with relatively few parameters and training steps can outperform a much larger model on a recent large data set.
We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks.
We argue that robustness of the smaller model ought to be understood in terms of compositionality.
arXiv Detail & Related papers (2022-01-23T22:00:54Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- On the comparability of Pre-trained Language Models [0.0]
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP.
More elaborated architectures are making better use of contextual information.
Larger corpora are used as resources for pre-training large language models in a self-supervised fashion.
Advances in parallel computing as well as in cloud computing have made it possible to train models of growing capacity in the same or even shorter time than previously established models.
arXiv Detail & Related papers (2020-01-03T10:53:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.