Machine Learning vs Deep Learning: The Generalization Problem
- URL: http://arxiv.org/abs/2403.01621v1
- Date: Sun, 3 Mar 2024 21:42:55 GMT
- Title: Machine Learning vs Deep Learning: The Generalization Problem
- Authors: Yong Yi Bay and Kathleen A. Yearick
- Abstract summary: This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The capacity to generalize beyond the range of training data is a pivotal
challenge, often synonymous with a model's utility and robustness. This study
investigates the comparative abilities of traditional machine learning (ML)
models and deep learning (DL) algorithms in terms of extrapolation -- a more
challenging aspect of generalization because it requires the model to make
inferences about data points that lie outside the domain it has been trained
on. We present an empirical analysis where both ML and DL models are trained on
an exponentially growing function and then tested on values outside the
training domain. The choice of this function allows us to distinctly showcase
the divergence in performance when models are required to predict beyond the
scope of their training data. Our findings suggest that deep learning models
possess inherent capabilities to generalize beyond the training scope, an
essential feature for real-world applications where data is often incomplete or
extends beyond the observed range. This paper argues for a nuanced
understanding of the structural differences between ML and DL models, with an
emphasis on the implications for both theoretical research and practical
deployment.
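The experimental setup lends itself to a compact illustration. Below is a minimal sketch of the kind of experiment the abstract describes: a traditional ML model and a small neural network are both fit to y = e^x on a bounded interval and then scored strictly outside it. The specific model classes, interval endpoints, and hyperparameters here are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of the extrapolation experiment (models, ranges, and
# hyperparameters are assumptions, not the paper's configuration).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Train on an exponentially growing function over [0, 5] ...
x_train = rng.uniform(0.0, 5.0, size=(500, 1))
y_train = np.exp(x_train).ravel()

# ... then test strictly outside the training domain, on [5, 7].
x_test = rng.uniform(5.0, 7.0, size=(200, 1))
y_test = np.exp(x_test).ravel()

models = {
    "random forest (traditional ML)": RandomForestRegressor(
        n_estimators=200, random_state=0),
    "MLP (deep learning)": MLPRegressor(
        hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0),
}

for name, model in models.items():
    model.fit(x_train, y_train)
    mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: extrapolation MSE = {mse:.3g}")
```

One structural point the sketch makes concrete: a tree ensemble can only output averages of training targets, so its predictions are bounded by the training range and its extrapolation error must grow; how far a network's inductive bias carries it beyond the data is exactly what the paper probes.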
Related papers
- Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning [37.745896674964186]
Multi-task learning (MTL) aims to improve the generalization performance of a model on multiple related tasks by training it simultaneously on those tasks.
Continual learning (CL) involves adapting to new sequentially arriving tasks over time without forgetting the previously acquired knowledge.
We develop theoretical results describing the effect of various system parameters on the model's performance in an MTL setup.
Our results reveal the impact of buffer size and model capacity on the forgetting rate in a CL setup and help shed light on some of the state-of-the-art CL methods.
arXiv Detail & Related papers (2024-08-29T23:22:40Z) - Complementary Learning for Real-World Model Failure Detection [15.779651238128562]
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors.
We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner.
We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
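One generic way to operationalize complementary learning (an assumption on my part; the paper's actual pipeline works on lidar point clouds with semantic and motion labels) is to flag inputs where predictors trained under different paradigms disagree:

```python
# Hedged sketch: flag likely model errors where per-point labels from two
# training paradigms (e.g. supervised vs. self-supervised) conflict.
# `preds_supervised` / `preds_selfsup` are hypothetical label arrays.
import numpy as np

def disagreement_mask(preds_supervised: np.ndarray,
                      preds_selfsup: np.ndarray) -> np.ndarray:
    """Boolean mask of points whose two label sources disagree."""
    return preds_supervised != preds_selfsup

# Toy usage: points 2 and 5 are flagged for anomaly review.
a = np.array([0, 1, 1, 0, 2, 2])
b = np.array([0, 1, 0, 0, 2, 1])
print(disagreement_mask(a, b))  # [False False  True False False  True]
```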
arXiv Detail & Related papers (2024-07-19T13:36:35Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem of generalization with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
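For context, a classical McAllester-style PAC-Bayes bound (a standard form from the literature, not necessarily the exact bound derived in this paper) reads:

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q over hypotheses, given a data-free prior P:
\mathbb{E}_{h \sim Q}\!\left[L_{\mathcal{D}}(h)\right]
  \;\le\;
\mathbb{E}_{h \sim Q}\!\left[\hat{L}_{S}(h)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

In the interpolating regime the empirical term is near zero, so the KL term, i.e. how far the learned posterior moves from the prior, carries the whole bound; that is one route by which the implicit regularization of model and initialization enters.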
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Interpreting and generalizing deep learning in physics-based problems with functional linear models [1.1440052544554358]
Interpretability is crucial and often desired in modeling physical systems.
We present test cases in solid mechanics, fluid mechanics, and transport.
Our study underscores the significance of interpretable representation in scientific machine learning.
arXiv Detail & Related papers (2023-07-10T14:01:29Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
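A minimal sketch of that evaluation protocol (the encoder, head, and data here are stand-ins, not the paper's exact setup): freeze the pretrained encoder, fit only a linear head on OOD data, and score it, so the head cannot compensate for biases in the representation.

```python
# Hedged sketch of a linear-probe evaluation with an OOD-trained head.
# `encode` is a hypothetical frozen feature extractor.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_ood(encode, x_ood_train, y_ood_train, x_ood_test, y_ood_test):
    """Fit a linear head on frozen features of OOD data; report accuracy."""
    head = LogisticRegression(max_iter=1000)
    head.fit(encode(x_ood_train), y_ood_train)          # only the head is trained
    return head.score(encode(x_ood_test), y_ood_test)   # isolates the representation

# Toy usage with an identity "encoder" and random labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16))
y = rng.integers(0, 2, 200)
print(linear_probe_ood(lambda v: v, x[:150], y[:150], x[150:], y[150:]))
```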
arXiv Detail & Related papers (2022-06-17T16:18:28Z) - Effective dimension of machine learning models [4.721845865189576]
Making statements about the performance of trained models on tasks involving new data is one of the primary goals of machine learning.
Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice.
We propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets.
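Effective-dimension measures of this kind are built on the Fisher information spectrum. The following is only a crude eigenvalue-counting proxy under assumed inputs, not the paper's local effective dimension estimator: it counts parameter directions in which the empirical Fisher matrix has non-negligible curvature.

```python
# Hedged sketch: a Fisher-spectrum proxy for model capacity; NOT the
# paper's estimator. `grads` holds per-example log-likelihood gradients.
import numpy as np

def fisher_proxy_dimension(grads: np.ndarray, threshold: float = 1e-3) -> int:
    """grads: (n_samples, n_params). Count 'active' curvature directions."""
    fisher = grads.T @ grads / grads.shape[0]   # empirical Fisher matrix
    eigvals = np.linalg.eigvalsh(fisher)
    return int(np.sum(eigvals > threshold))

# Toy usage: 3 strong directions out of 10 nominal parameters.
rng = np.random.default_rng(0)
g = rng.normal(size=(100, 10)) @ np.diag([1.0] * 3 + [1e-4] * 7)
print(fisher_proxy_dimension(g))  # ~3
```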
arXiv Detail & Related papers (2021-12-09T10:00:18Z) - Modeling Generalization in Machine Learning: A Methodological and Computational Study [0.8057006406834467]
We use the concept of the convex hull of the training data in assessing machine learning generalization.
We observe unexpectedly weak associations between the generalization ability of machine learning models and all metrics related to dimensionality.
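Whether a test point lies inside the convex hull of the training set can be checked with a small linear program (a standard construction, sketched here under toy data; the paper's own metrics are not reproduced): the point is in the hull iff it is a convex combination of training points. This also connects directly to the main paper's framing of extrapolation as prediction outside the training domain.

```python
# Hedged sketch: convex-hull membership test via linear programming.
# x is in hull(points) iff there exist weights w >= 0 with sum(w) = 1
# and points.T @ w = x; we solve the feasibility LP with a zero objective.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points: np.ndarray, x: np.ndarray) -> bool:
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])  # combination + sum-to-one
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(train, np.array([0.2, 0.2])))  # True: interpolation
print(in_convex_hull(train, np.array([1.0, 1.0])))  # False: extrapolation
```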
arXiv Detail & Related papers (2020-06-28T19:06:16Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
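Prediction averaging over augmentations can be sketched as follows (a generic test-time-augmentation scheme with assumed callables, not the paper's exact construction). The convex-loss claim follows from Jensen's inequality: for a convex loss, the loss of the averaged prediction is at most the average loss of the individual augmented predictions.

```python
# Hedged sketch: averaging model outputs over an augmentation orbit.
# `model` and `augment` are hypothetical callables supplied by the user.
import numpy as np

def averaged_prediction(model, augment, x: np.ndarray, n_augs: int = 8):
    """Average predictions over n_augs random augmentations of x."""
    return np.mean([model(augment(x)) for _ in range(n_augs)], axis=0)

# Toy usage: a noisy "model" with jitter augmentation; averaging
# reduces the variance of the final prediction.
rng = np.random.default_rng(0)
model = lambda v: v.sum() + rng.normal(scale=0.1)
augment = lambda v: v + rng.normal(scale=0.01, size=v.shape)
print(averaged_prediction(model, augment, np.ones(4)))
```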
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.