Dive into the Chasm: Probing the Gap between In- and Cross-Topic
Generalization
- URL: http://arxiv.org/abs/2402.01375v1
- Date: Fri, 2 Feb 2024 12:59:27 GMT
- Title: Dive into the Chasm: Probing the Gap between In- and Cross-Topic
Generalization
- Authors: Andreas Waldis, Yufang Hou, Iryna Gurevych
- Abstract summary: This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
- Score: 66.4659448305396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (LMs) perform well in In-Topic setups, where
training and testing data come from the same topics. However, they face
challenges in Cross-Topic scenarios where testing data is derived from distinct
topics -- such as Gun Control. This study analyzes various LMs with three
probing-based experiments to shed light on the reasons behind the In- vs.
Cross-Topic generalization gap. Thereby, we demonstrate, for the first time,
that generalization gaps and the robustness of the embedding space vary
significantly across LMs. Additionally, we assess larger LMs and underscore the
relevance of our analysis for recent models. Overall, diverse pre-training
objectives, architectural regularization, or data deduplication contribute to
more robust LMs and diminish generalization gaps. Our research contributes to a
deeper understanding and comparison of language models across different
generalization scenarios.
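To make the probing idea concrete, below is a minimal sketch, not the authors' code, of how a probing-based comparison of In-Topic and Cross-Topic generalization is commonly set up: a frozen LM supplies sentence embeddings, a lightweight linear probe is trained on them, and the evaluation split either shares topics with training (In-Topic) or holds out entire topics (Cross-Topic). The model name, the toy sentences, labels, and the topic split are illustrative assumptions.

```python
# Hedged sketch of an In- vs. Cross-Topic probing setup (illustrative only).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Toy argument-stance data; topics and labels are made up for illustration.
texts = [
    "Stricter gun laws reduce violence.",          # topic: gun control
    "Background checks are ineffective.",          # topic: gun control
    "Nuclear power is a clean energy source.",     # topic: nuclear energy
    "Reactor accidents make nuclear power risky.", # topic: nuclear energy
    "School uniforms improve discipline.",         # topic: school uniforms
    "Uniforms suppress student expression.",       # topic: school uniforms
]
labels = [1, 0, 1, 0, 1, 0]  # pro = 1, con = 0
topics = ["guns", "guns", "nuclear", "nuclear", "uniforms", "uniforms"]

# Frozen LM as a feature extractor (no fine-tuning).
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
lm = AutoModel.from_pretrained("bert-base-uncased").eval()

with torch.no_grad():
    enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = lm(**enc).last_hidden_state              # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)        # mean-pool over real tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)

X = emb.numpy()

# Cross-Topic split: hold out every example from one topic.
# An In-Topic split would instead sample test examples from all topics.
held_out = "uniforms"
train = [i for i, t in enumerate(topics) if t != held_out]
test = [i for i, t in enumerate(topics) if t == held_out]

probe = LogisticRegression(max_iter=1000).fit(X[train], [labels[i] for i in train])
print("Cross-Topic probe accuracy:", probe.score(X[test], [labels[i] for i in test]))
```

The gap between the probe's accuracy under the random (In-Topic) split and the topic-held-out (Cross-Topic) split is the kind of quantity the paper analyzes across different LMs.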
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We investigate the interplay between generalization and memorization in large language models at scale.
With various sizes of open-source LLMs and their pretraining corpora, we observe that as the model size increases, the task-relevant $n$-gram pair data becomes increasingly important.
Our results support the hypothesis that LLMs' capabilities emerge from a delicate balance of memorization and generalization with sufficient task-related pretraining data.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - On the Universal Truthfulness Hyperplane Inside LLMs [27.007142483859162]
We investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model.
Our results indicate that increasing the diversity of the training datasets significantly enhances the performance in all scenarios.
arXiv Detail & Related papers (2024-07-11T15:07:26Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - Machine Learning vs Deep Learning: The Generalization Problem [0.0]
This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z) - On the Out-Of-Distribution Generalization of Multimodal Large Language
Models [24.431960338495184]
We investigate the generalization boundaries of current Multimodal Large Language Models (MLLMs).
We evaluate their zero-shot generalization across synthetic images, real-world distributional shifts, and specialized datasets like medical and molecular imagery.
We show that in-context learning can significantly enhance MLLMs' generalization, opening new avenues for overcoming generalization barriers.
arXiv Detail & Related papers (2024-02-09T18:21:51Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Understanding Attention in Machine Reading Comprehension [56.72165932439117]
This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final performance.
We perform quantitative analyses on SQuAD (English) and CMRC 2018 (Chinese), two span-extraction MRC datasets, on top of BERT, ALBERT, and ELECTRA.
We discover that passage-to-question and passage understanding attentions are the most important ones, showing strong correlations with the final performance.
arXiv Detail & Related papers (2021-08-26T04:23:57Z) - Revisiting Training Strategies and Generalization Performance in Deep
Metric Learning [28.54755295856929]
We revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices.
Under consistent comparison, DML objectives show much higher saturation than indicated by the literature.
Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models.
arXiv Detail & Related papers (2020-02-19T22:16:12Z)