Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs
- URL: http://arxiv.org/abs/2407.00478v2
- Date: Thu, 10 Oct 2024 15:41:11 GMT
- Title: Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs
- Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang
- Abstract summary: We develop next-generation models in a parsimonious manner, achieving greater potential with simpler models.
The key is to drive models using domain-specific knowledge, such as symbols, logic, and formulas, instead of relying on the scaling law.
This approach allows us to build a framework that uses this knowledge as "building blocks" to achieve parsimony in model design, training, and interpretation.
- Score: 47.6830995661091
- Abstract: The scaling law, which involves the brute-force expansion of training datasets and learnable parameters, has become a prevalent strategy for developing more robust learning models. However, due to bottlenecks in data, computation, and trust, the sustainability of the scaling law is a serious concern for the future of deep learning. In this paper, we address this issue by developing next-generation models in a parsimonious manner (i.e., achieving greater potential with simpler models). The key is to drive models using domain-specific knowledge, such as symbols, logic, and formulas, instead of relying on the scaling law. This approach allows us to build a framework that uses this knowledge as "building blocks" to achieve parsimony in model design, training, and interpretation. Empirical results show that our methods surpass those that typically follow the scaling law. We also demonstrate the application of our framework in AI for science, specifically in the problem of drug-drug interaction prediction. We hope our research can foster more diverse technical roadmaps in the era of foundation models.
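To make the relational-graph framing concrete, below is a minimal, hypothetical sketch of drug-drug interaction (DDI) prediction cast as link prediction over (drug, interaction type, drug) triples. It is not the paper's model: the DistMult scorer, class name, and dimensions are illustrative assumptions only.

```python
# Hypothetical sketch only: DDI prediction as link prediction on a relational
# graph, scored with a generic DistMult model. This is NOT the paper's method;
# the names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class DistMultDDI(nn.Module):
    """Scores (drug, interaction type, drug) triples."""

    def __init__(self, num_drugs: int, num_relations: int, dim: int = 64):
        super().__init__()
        self.drug = nn.Embedding(num_drugs, dim)
        self.rel = nn.Embedding(num_relations, dim)

    def forward(self, h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # DistMult: score(h, r, t) = sum_k h_k * r_k * t_k
        return (self.drug(h) * self.rel(r) * self.drug(t)).sum(dim=-1)

# Toy usage: score two candidate interaction triples for drug 0.
model = DistMultDDI(num_drugs=100, num_relations=10)
heads = torch.tensor([0, 0])
rels = torch.tensor([2, 2])
tails = torch.tensor([5, 7])
print(model(heads, rels, tails))  # higher score = more plausible interaction
```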
Related papers
- A Survey of Deep Learning and Foundation Models for Time Series Forecasting [16.814826712022324]
Deep learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting.
Foundation models with extensive pre-training allow models to understand patterns and acquire knowledge that can be applied to new related problems.
There is ongoing research examining how to utilize or inject such knowledge into deep learning models.
arXiv Detail & Related papers (2024-01-25T03:14:07Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Understanding Scaling Laws for Recommendation Models [1.6283945233720964]
We study empirical scaling laws for DLRM-style recommendation models, in particular for Click-Through Rate (CTR) prediction.
We characterize scaling efficiency along three different resource dimensions, namely data, parameters and compute.
We show that parameter scaling has run out of steam for the model architecture under study; until a higher-performing architecture emerges, data scaling is the path forward.
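The kind of scaling-law fit such studies perform can be summarized in a few lines. The sketch below uses synthetic numbers (not values from the paper) and assumes a simple power-law form loss(N) ~ a * N^(-b).

```python
# Illustrative sketch (synthetic data, not results from any paper) of how an
# empirical scaling law is typically fit: assume loss(N) ~ a * N^(-b) and
# estimate a, b by least squares in log-log space.
import numpy as np

n = np.array([1e4, 1e5, 1e6, 1e7])          # hypothetical dataset sizes
loss = np.array([0.52, 0.33, 0.21, 0.135])  # hypothetical validation losses

# log(loss) = log(a) - b * log(n): a straight line in log-log coordinates.
slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"fitted law: loss ~= {a:.3f} * N^(-{b:.3f})")

# Extrapolating the fit shows the diminishing returns behind such findings:
# each 10x of data buys only a constant multiplicative reduction in loss.
print(f"predicted loss at N=1e8: {a * 1e8 ** (-b):.4f}")
```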
arXiv Detail & Related papers (2022-08-17T19:13:17Z)
- Algebraic Learning: Towards Interpretable Information Modeling [0.0]
This thesis addresses the issue of interpretability in general information modeling and seeks to ease the problem from two angles.
Firstly, a problem-oriented perspective is applied to incorporate knowledge into modeling practice, where interesting mathematical properties emerge naturally.
Secondly, given a trained model, various methods could be applied to extract further insights about the underlying system.
arXiv Detail & Related papers (2022-03-13T15:53:39Z)
- Bayesian Deep Learning for Graphs [6.497816402045099]
The dissertation begins with a review of the principles on which most methods in the field are built, followed by a study of graph classification issues.
We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion.
This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks.
arXiv Detail & Related papers (2022-02-24T20:18:41Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on a huge volume of multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Scaling Laws for Deep Learning [1.90365714903665]
In this thesis, we take a systematic approach to addressing the algorithmic and methodological limitations at the root of deep learning's computational costs.
We first demonstrate that deep learning training and pruning are predictable and governed by scaling laws.
We then show through the exploration of a noiseless realizable case that DL is in fact dominated by error sources very far from the lower error limit.
arXiv Detail & Related papers (2021-08-17T15:37:05Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has raised unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.