Exploring Generalization Ability of Pretrained Language Models on
Arithmetic and Logical Reasoning
- URL: http://arxiv.org/abs/2108.06743v1
- Date: Sun, 15 Aug 2021 13:42:10 GMT
- Title: Exploring Generalization Ability of Pretrained Language Models on
Arithmetic and Logical Reasoning
- Authors: Cunxiang Wang, Boyuan Zheng, Yuchen Niu and Yue Zhang
- Abstract summary: We investigate the generalization ability of pre-trained language models (PLMs).
We conduct experiments on one of the most advanced and publicly released generative PLM - BART.
Our research finds that PLMs generalize easily when the distribution is the same; however, it remains difficult for them to generalize out of distribution.
- Score: 8.879537068017367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To quantitatively and intuitively explore the generalization ability of
pre-trained language models (PLMs), we have designed several tasks of
arithmetic and logical reasoning. We analyse how well PLMs generalize both when
the test data comes from the same distribution as the training data and when it
does not; for the latter analysis, we have also designed a cross-distribution
test set in addition to the in-distribution test set. We conduct experiments on
one of the most advanced publicly released generative PLMs, BART. Our research
finds that PLMs generalize easily when the test distribution matches the
training distribution; however, it remains difficult for them to generalize out
of distribution.
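To make the evaluation setup concrete, the following is a minimal sketch of how an in-distribution versus cross-distribution split for an arithmetic task could be built. The addition task, the digit-length cutoffs, and the helper names are illustrative assumptions, not the paper's exact task design; a generative PLM such as BART would be fine-tuned on the training split and scored with exact-match accuracy on both test splits.

  # Sketch: in-distribution vs. cross-distribution arithmetic data (assumed setup).
  import random

  def make_addition_example(max_digits):
      """Build one text-to-text addition example, e.g. ("23+7", "30")."""
      a = random.randint(0, 10 ** max_digits - 1)
      b = random.randint(0, 10 ** max_digits - 1)
      return f"{a}+{b}", str(a + b)

  def make_split(n, max_digits):
      return [make_addition_example(max_digits) for _ in range(n)]

  random.seed(0)
  train = make_split(10_000, max_digits=3)           # training distribution
  test_in_dist = make_split(1_000, max_digits=3)     # same distribution as training
  test_cross_dist = make_split(1_000, max_digits=5)  # shifted: longer operands

  def exact_match(predict, examples):
      """Score a model's predict(src) -> str function by exact-match accuracy."""
      return sum(predict(src) == tgt for src, tgt in examples) / len(examples)

Comparing exact_match on test_in_dist and test_cross_dist mirrors the in-distribution versus out-of-distribution comparison described in the abstract.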
Related papers
- Limitations of Using Identical Distributions for Training and Testing When Learning Boolean Functions [1.3537117504260623]
We study whether it is always optimal for the training distribution to be identical to the test distribution when the learner is allowed to be optimally adapted to the training distribution. We also show that when certain regularities are imposed on the target functions, the standard conclusion is recovered in the case of the uniform distribution.
arXiv Detail & Related papers (2025-11-30T09:06:07Z) - What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction [16.63148156570219]
We argue that different settings lead to three distinct intended output distributions. We demonstrate that NLP works often assume that these distributions should be similar, which leads to misinterpretations of their experimental findings.
arXiv Detail & Related papers (2025-05-04T11:46:48Z) - Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z) - What Are the Odds? Language Models Are Capable of Probabilistic Reasoning [23.487484744911995]
We focus on evaluating the probabilistic reasoning capabilities of language models (LMs) using idealized and real-world statistical distributions.
We perform a systematic evaluation of state-of-the-art LMs on three tasks: estimating percentiles, drawing samples, and calculating probabilities.
arXiv Detail & Related papers (2024-06-18T17:51:24Z) - Any-Shift Prompting for Generalization over Distributions [66.29237565901734]
We propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning.
Within this framework, the test prompt exploits the distribution relationships to guide the generalization of the CLIP image-language model from training to any test distribution.
The network generates the tailored test prompt with both training and test information in a feedforward pass, avoiding extra training costs at test time.
arXiv Detail & Related papers (2024-02-15T16:53:42Z) - Dive into the Chasm: Probing the Gap between In- and Cross-Topic
Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z) - Simple and effective data augmentation for compositional generalization [64.00420578048855]
We show that data augmentation methods that sample MRs and backtranslate them can be effective for compositional generalization.
Remarkably, sampling from a uniform distribution performs almost as well as sampling from the test distribution.
arXiv Detail & Related papers (2024-01-18T09:13:59Z) - Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z) - Characterizing Generalization under Out-Of-Distribution Shifts in Deep
Metric Learning [32.51394862932118]
We present the ooDML benchmark to characterize generalization under out-of-distribution shifts in DML.
ooDML is designed to probe the generalization performance on much more challenging, diverse train-to-test distribution shifts.
We find that while generalization tends to consistently degrade with difficulty, some methods are better at retaining performance as the distribution shift increases.
arXiv Detail & Related papers (2021-07-20T15:26:09Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.