Beyond Scaleup: Knowledge-aware Parsimony Learning from Deep Networks
- URL: http://arxiv.org/abs/2407.00478v3
- Date: Tue, 17 Dec 2024 07:30:46 GMT
- Title: Beyond Scaleup: Knowledge-aware Parsimony Learning from Deep Networks
- Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang
- Abstract summary: Brute-force scaleup of training datasets, learnable parameters, and computation power has become a prevalent strategy for developing more robust learning models. In this paper, we attempt to address this issue in a parsimonious manner, achieving greater potential with simpler models. The key is to drive models using domain-specific knowledge, such as symbols, logic, and formulas, instead of purely relying on scaleup.
- Score: 47.6830995661091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The brute-force scaleup of training datasets, learnable parameters, and computation power has become a prevalent strategy for developing more robust learning models. However, due to bottlenecks in data, computation, and trust, the sustainability of this strategy is a serious concern. In this paper, we attempt to address this issue in a parsimonious manner (i.e., achieving greater potential with simpler models). The key is to drive models using domain-specific knowledge, such as symbols, logic, and formulas, instead of purely relying on scaleup. This approach allows us to build a framework that uses this knowledge as "building blocks" to achieve parsimony in model design, training, and interpretation. Empirical results show that our methods surpass those that typically follow the scaling law. We also demonstrate our framework in AI for science, specifically in the problem of drug-drug interaction prediction. We hope our research can foster more diverse technical roadmaps in the era of foundation models.
Related papers
- Looking beyond the next token [75.00751370502168]
We argue that rearranging and processing the training data sequences can allow models to more accurately imitate the true data-generating process.
Our method naturally enables the generation of long-term goals at no additional cost.
arXiv Detail & Related papers (2025-04-15T16:09:06Z)
- How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines [20.62274005080048]
Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies.
Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns.
Scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches.
arXiv Detail & Related papers (2025-02-17T17:20:41Z)
- Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm [0.195804735329484]
Reinforcement learning (RL) and Deep Reinforcement Learning (DRL) have the potential to disrupt and are already changing the way we interact with the world.
One of the key indicators of their applicability is their ability to scale and work in real-world scenarios.
arXiv Detail & Related papers (2024-08-19T14:50:48Z)
- A Survey of Deep Learning and Foundation Models for Time Series Forecasting [16.814826712022324]
Deep learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting.
Foundation models with extensive pre-training allow models to understand patterns and acquire knowledge that can be applied to new related problems.
There is ongoing research examining how to utilize or inject such knowledge into deep learning models.
arXiv Detail & Related papers (2024-01-25T03:14:07Z)
- Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations [1.9580473532948401]
This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process.
We ask what drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality.
Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models.
arXiv Detail & Related papers (2023-10-24T19:50:41Z)
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- From Actions to Events: A Transfer Learning Approach Using Improved Deep Belief Networks [1.0554048699217669]
This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model.
Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process.
arXiv Detail & Related papers (2022-11-30T14:47:10Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Understanding Scaling Laws for Recommendation Models [1.6283945233720964]
We study empirical scaling laws for DLRM-style recommendation models, in particular Click-Through Rate (CTR).
We characterize scaling efficiency along three different resource dimensions, namely data, parameters and compute.
We show that parameter scaling is out of steam for the model architecture under study, and until a higher-performing model architecture emerges, data scaling is the path forward.
arXiv Detail & Related papers (2022-08-17T19:13:17Z)
- Algebraic Learning: Towards Interpretable Information Modeling [0.0]
This thesis addresses the issue of interpretability in general information modeling and endeavors to ease the problem from two scopes.
Firstly, a problem-oriented perspective is applied to incorporate knowledge into modeling practice, where interesting mathematical properties emerge naturally.
Secondly, given a trained model, various methods could be applied to extract further insights about the underlying system.
arXiv Detail & Related papers (2022-03-13T15:53:39Z)
- Bayesian Deep Learning for Graphs [6.497816402045099]
The dissertation begins with a review of the principles on which most of the methods in the field are built, followed by a study of graph classification issues.
We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion.
This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks.
arXiv Detail & Related papers (2022-02-24T20:18:41Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Scaling Laws for Deep Learning [1.90365714903665]
In this thesis we take a systematic approach to address the algorithmic and methodological limitations at the root of these costs.
We first demonstrate that deep learning training and pruning are predictable and governed by scaling laws.
We then show through the exploration of a noiseless realizable case that DL is in fact dominated by error sources very far from the lower error limit.
arXiv Detail & Related papers (2021-08-17T15:37:05Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
- Towards Interpretable Deep Learning Models for Knowledge Tracing [62.75876617721375]
We propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models.
Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret an RNN-based DLKT model.
Experimental results show the feasibility of using the LRP method to interpret the DLKT model's predictions.
arXiv Detail & Related papers (2020-05-13T04:03:21Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multiple objectives are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.