Related papers: Beyond Scaleup: Knowledge-aware Parsimony Learning from Deep Networks

Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining. This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z)
Looking beyond the next token [75.00751370502168]
We argue that rearranging and processing the training data sequences can allow models to more accurately imitate the true data-generating process. Our method naturally enables the generation of long-term goals at no additional cost.
arXiv Detail & Related papers (2025-04-15T16:09:06Z)
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines [20.62274005080048]
Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches.
arXiv Detail & Related papers (2025-02-17T17:20:41Z)
Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm [0.195804735329484]
Reinforcement learning (RL) and Deep Reinforcement Learning (DRL) have the potential to disrupt and are already changing the way we interact with the world. One of the key indicators of their applicability is their ability to scale and work in real-world scenarios.
arXiv Detail & Related papers (2024-08-19T14:50:48Z)
A Survey of Deep Learning and Foundation Models for Time Series Forecasting [16.814826712022324]
Deep learning has been successfully applied to many application domains, yet its advantages have been slow to emerge for time series forecasting. Foundation models with extensive pre-training allow models to understand patterns and acquire knowledge that can be applied to new related problems. There is ongoing research examining how to utilize or inject such knowledge into deep learning models.
arXiv Detail & Related papers (2024-01-25T03:14:07Z)
Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations [1.9580473532948401]
This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process. We ask what drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality. Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models.
arXiv Detail & Related papers (2023-10-24T19:50:41Z)
Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations. We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase. We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output. Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
From Actions to Events: A Transfer Learning Approach Using Improved Deep Belief Networks [1.0554048699217669]
This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model. Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process.
arXiv Detail & Related papers (2022-11-30T14:47:10Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
Understanding Scaling Laws for Recommendation Models [1.6283945233720964]
We study empirical scaling laws for DLRM style recommendation models, in particular Click-Through Rate (CTR). We characterize scaling efficiency along three different resource dimensions, namely data, parameters and compute. We show that parameter scaling is out of steam for the model architecture under study, and until a higher-performing model architecture emerges, data scaling is the path forward.
arXiv Detail & Related papers (2022-08-17T19:13:17Z)
Algebraic Learning: Towards Interpretable Information Modeling [0.0]
This thesis addresses the issue of interpretability in general information modeling and endeavors to ease the problem from two scopes. Firstly, a problem-oriented perspective is applied to incorporate knowledge into modeling practice, where interesting mathematical properties emerge naturally. Secondly, given a trained model, various methods could be applied to extract further insights about the underlying system.
arXiv Detail & Related papers (2022-03-13T15:53:39Z)
Bayesian Deep Learning for Graphs [6.497816402045099]
The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks.
arXiv Detail & Related papers (2022-02-24T20:18:41Z)
WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data. We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
Scaling Laws for Deep Learning [1.90365714903665]
In this thesis we take a systematic approach to address the algorithmic and methodological limitations at the root of these costs. We first demonstrate that deep learning training and pruning are predictable and governed by scaling laws. We then show through the exploration of a noiseless realizable case that DL is in fact dominated by error sources very far from the lower error limit.
arXiv Detail & Related papers (2021-08-17T15:37:05Z)
Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
Towards Interpretable Deep Learning Models for Knowledge Tracing [62.75876617721375]
We propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models. Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret an RNN-based DLKT model. Experiment results show the feasibility of using the LRP method for interpreting the DLKT model's predictions.
arXiv Detail & Related papers (2020-05-13T04:03:21Z)
Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data. Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model. Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm. Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function. We develop an approach for representation learning in RL that sits in between these two extremes. This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.