Machine Learning vs Deep Learning: The Generalization Problem
- URL: http://arxiv.org/abs/2403.01621v1
- Date: Sun, 3 Mar 2024 21:42:55 GMT
- Title: Machine Learning vs Deep Learning: The Generalization Problem
- Authors: Yong Yi Bay and Kathleen A. Yearick
- Abstract summary: This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The capacity to generalize beyond the range of training data is a pivotal
challenge, often synonymous with a model's utility and robustness. This study
investigates the comparative abilities of traditional machine learning (ML)
models and deep learning (DL) algorithms in terms of extrapolation -- a more
challenging aspect of generalization because it requires the model to make
inferences about data points that lie outside the domain it has been trained
on. We present an empirical analysis where both ML and DL models are trained on
an exponentially growing function and then tested on values outside the
training domain. The choice of this function allows us to distinctly showcase
the divergence in performance when models are required to predict beyond the
scope of their training data. Our findings suggest that deep learning models
possess inherent capabilities to generalize beyond the training scope, an
essential feature for real-world applications where data is often incomplete or
extends beyond the observed range. This paper argues for a nuanced
understanding of the structural differences between ML and DL models, with an
emphasis on the implications for both theoretical research and practical
deployment.
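The experimental setup lends itself to a compact illustration. Below is a minimal sketch of the kind of experiment the abstract describes: a traditional ML model and a small neural network are both fit to y = e^x on a bounded interval and then scored strictly outside it. The specific model classes, interval endpoints, and hyperparameters here are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of the extrapolation experiment (models, ranges, and
# hyperparameters are assumptions, not the paper's configuration).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Train on an exponentially growing function over [0, 5] ...
x_train = rng.uniform(0.0, 5.0, size=(500, 1))
y_train = np.exp(x_train).ravel()

# ... then test strictly outside the training domain, on [5, 7].
x_test = rng.uniform(5.0, 7.0, size=(200, 1))
y_test = np.exp(x_test).ravel()

models = {
    "random forest (traditional ML)": RandomForestRegressor(
        n_estimators=200, random_state=0),
    "MLP (deep learning)": MLPRegressor(
        hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0),
}

for name, model in models.items():
    model.fit(x_train, y_train)
    mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"{name}: extrapolation MSE = {mse:.3g}")
```

One structural point the sketch makes concrete: a tree ensemble can only output averages of training targets, so its predictions are bounded by the training range and its extrapolation error must grow; how far a network's inductive bias carries it beyond the data is exactly what the paper probes.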
Related papers
- Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning [37.745896674964186]
Multi-task learning (MTL) aims to improve the generalization performance of a model on multiple related tasks by training it simultaneously on those tasks.
Continual learning (CL) involves adapting to new sequentially arriving tasks over time without forgetting the previously acquired knowledge.
We develop theoretical results describing the effect of various system parameters on the model's performance in an MTL setup.
Our results reveal the impact of buffer size and model capacity on the forgetting rate in a CL setup and help shed light on some of the state-of-the-art CL methods.
arXiv Detail & Related papers (2024-08-29T23:22:40Z) - Complementary Learning for Real-World Model Failure Detection [15.779651238128562]
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors.
We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner.
We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
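One generic way to operationalize complementary learning (an assumption on my part; the paper's actual pipeline works on lidar point clouds with semantic and motion labels) is to flag inputs where predictors trained under different paradigms disagree:

```python
# Hedged sketch: flag likely model errors where per-point labels from two
# training paradigms (e.g. supervised vs. self-supervised) conflict.
# `preds_supervised` / `preds_selfsup` are hypothetical label arrays.
import numpy as np

def disagreement_mask(preds_supervised: np.ndarray,
                      preds_selfsup: np.ndarray) -> np.ndarray:
    """Boolean mask of points whose two label sources disagree."""
    return preds_supervised != preds_selfsup

# Toy usage: points 2 and 5 are flagged for anomaly review.
a = np.array([0, 1, 1, 0, 2, 2])
b = np.array([0, 1, 0, 0, 2, 1])
print(disagreement_mask(a, b))  # [False False  True False False  True]
```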
arXiv Detail & Related papers (2024-07-19T13:36:35Z) - Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem of generalization with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
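For context, a classical McAllester-style PAC-Bayes bound (a standard form from the literature, not necessarily the exact bound derived in this paper) reads:

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q over hypotheses, given a data-free prior P:
\mathbb{E}_{h \sim Q}\!\left[L_{\mathcal{D}}(h)\right]
  \;\le\;
\mathbb{E}_{h \sim Q}\!\left[\hat{L}_{S}(h)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

In the interpolating regime the empirical term is near zero, so the KL term, i.e. how far the learned posterior moves from the prior, carries the whole bound; that is one route by which the implicit regularization of model and initialization enters.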
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Interpreting and generalizing deep learning in physics-based problems with functional linear models [1.1440052544554358]
Interpretability is crucial and often desired in modeling physical systems.
We present test cases in solid mechanics, fluid mechanics, and transport.
Our study underscores the significance of interpretable representation in scientific machine learning.
arXiv Detail & Related papers (2023-07-10T14:01:29Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
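A minimal sketch of that evaluation protocol (the encoder, head, and data here are stand-ins, not the paper's exact setup): freeze the pretrained encoder, fit only a linear head on OOD data, and score it, so the head cannot compensate for biases in the representation.

```python
# Hedged sketch of a linear-probe evaluation with an OOD-trained head.
# `encode` is a hypothetical frozen feature extractor.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_ood(encode, x_ood_train, y_ood_train, x_ood_test, y_ood_test):
    """Fit a linear head on frozen features of OOD data; report accuracy."""
    head = LogisticRegression(max_iter=1000)
    head.fit(encode(x_ood_train), y_ood_train)          # only the head is trained
    return head.score(encode(x_ood_test), y_ood_test)   # isolates the representation

# Toy usage with an identity "encoder" and random labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16))
y = rng.integers(0, 2, 200)
print(linear_probe_ood(lambda v: v, x[:150], y[:150], x[150:], y[150:]))
```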
arXiv Detail & Related papers (2022-06-17T16:18:28Z) - Effective dimension of machine learning models [4.721845865189576]
Making statements about the performance of trained models on tasks involving new data is one of the primary goals of machine learning.
Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice.
We propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets.
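Effective-dimension measures of this kind are built on the Fisher information spectrum. The following is only a crude eigenvalue-counting proxy under assumed inputs, not the paper's local effective dimension estimator: it counts parameter directions in which the empirical Fisher matrix has non-negligible curvature.

```python
# Hedged sketch: a Fisher-spectrum proxy for model capacity; NOT the
# paper's estimator. `grads` holds per-example log-likelihood gradients.
import numpy as np

def fisher_proxy_dimension(grads: np.ndarray, threshold: float = 1e-3) -> int:
    """grads: (n_samples, n_params). Count 'active' curvature directions."""
    fisher = grads.T @ grads / grads.shape[0]   # empirical Fisher matrix
    eigvals = np.linalg.eigvalsh(fisher)
    return int(np.sum(eigvals > threshold))

# Toy usage: 3 strong directions out of 10 nominal parameters.
rng = np.random.default_rng(0)
g = rng.normal(size=(100, 10)) @ np.diag([1.0] * 3 + [1e-4] * 7)
print(fisher_proxy_dimension(g))  # ~3
```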
arXiv Detail & Related papers (2021-12-09T10:00:18Z) - Modeling Generalization in Machine Learning: A Methodological and Computational Study [0.8057006406834467]
We use the concept of the convex hull of the training data in assessing machine learning generalization.
We observe unexpectedly weak associations between the generalization ability of machine learning models and all metrics related to dimensionality.
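Whether a test point lies inside the convex hull of the training set can be checked with a small linear program (a standard construction, sketched here under toy data; the paper's own metrics are not reproduced): the point is in the hull iff it is a convex combination of training points. This also connects directly to the main paper's framing of extrapolation as prediction outside the training domain.

```python
# Hedged sketch: convex-hull membership test via linear programming.
# x is in hull(points) iff there exist weights w >= 0 with sum(w) = 1
# and points.T @ w = x; we solve the feasibility LP with a zero objective.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points: np.ndarray, x: np.ndarray) -> bool:
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])  # combination + sum-to-one
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(train, np.array([0.2, 0.2])))  # True: interpolation
print(in_convex_hull(train, np.array([1.0, 1.0])))  # False: extrapolation
```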
arXiv Detail & Related papers (2020-06-28T19:06:16Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
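Prediction averaging over augmentations can be sketched as follows (a generic test-time-augmentation scheme with assumed callables, not the paper's exact construction). The convex-loss claim follows from Jensen's inequality: for a convex loss, the loss of the averaged prediction is at most the average loss of the individual augmented predictions.

```python
# Hedged sketch: averaging model outputs over an augmentation orbit.
# `model` and `augment` are hypothetical callables supplied by the user.
import numpy as np

def averaged_prediction(model, augment, x: np.ndarray, n_augs: int = 8):
    """Average predictions over n_augs random augmentations of x."""
    return np.mean([model(augment(x)) for _ in range(n_augs)], axis=0)

# Toy usage: a noisy "model" with jitter augmentation; averaging
# reduces the variance of the final prediction.
rng = np.random.default_rng(0)
model = lambda v: v.sum() + rng.normal(scale=0.1)
augment = lambda v: v + rng.normal(scale=0.01, size=v.shape)
print(averaged_prediction(model, augment, np.ones(4)))
```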
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.