An Empirical Investigation into Deep and Shallow Rule Learning
- URL: http://arxiv.org/abs/2106.10254v1
- Date: Fri, 18 Jun 2021 17:43:17 GMT
- Title: An Empirical Investigation into Deep and Shallow Rule Learning
- Authors: Florian Beck and Johannes Fürnkranz
- Abstract summary: In this paper, we empirically compare deep and shallow rule learning with a uniform general algorithm.
Our experiments on both artificial and real-world benchmark data indicate that deep rule networks outperform shallow networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inductive rule learning is arguably among the most traditional paradigms in
machine learning. Although we have seen considerable progress over the years in
learning rule-based theories, all state-of-the-art learners still learn
descriptions that directly relate the input features to the target concept. In
the simplest case, concept learning, this is a disjunctive normal form (DNF)
description of the positive class. While it is clear that this is sufficient
from a logical point of view, because every logical expression can be reduced to
an equivalent DNF expression, it may nevertheless be the case that more
structured representations, which build deep theories by forming intermediate
concepts, are easier to learn, in much the same way that deep neural
networks can outperform shallow networks even though the latter are
also universal function approximators. In this paper, we empirically compare
deep and shallow rule learning with a uniform general algorithm, which relies
on greedy mini-batch based optimization. Our experiments on both artificial and
real-world benchmark data indicate that deep rule networks outperform shallow
networks.
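The abstract's point that a flat DNF is always logically sufficient, yet not necessarily the most convenient representation, can be illustrated with n-bit parity. The sketch below is an illustration only, not the authors' algorithm, rule-network architecture, or benchmark; all function names are hypothetical. It builds the flat DNF for parity (one full conjunction per odd-weight input, i.e. 2^(n-1) terms, which is also the minimum, since flipping any single input flips the output, so no term can omit a variable) and compares it with a deep theory that chains intermediate concepts h_i = XOR(h_{i-1}, x_i), each a 2-term DNF, for 2(n-1) terms in total.

```python
from itertools import product


def parity(bits):
    """Target concept: True iff an odd number of inputs are 1."""
    return sum(bits) % 2 == 1


def shallow_dnf(n):
    """Flat (shallow) theory: one full conjunction per positive example of
    n-bit parity. Each term is a tuple giving the required value of every input."""
    return [bits for bits in product((0, 1), repeat=n) if parity(bits)]


def eval_dnf(terms, x):
    """A DNF fires if at least one conjunction matches the input."""
    return any(all(xi == ti for xi, ti in zip(x, t)) for t in terms)


def xor2(a, b):
    """Intermediate concept as a 2-term DNF: (a AND NOT b) OR (NOT a AND b)."""
    return (a and not b) or (not a and b)


def deep_parity(x):
    """Deep theory (hypothetical illustration): a chain of intermediate
    concepts h_i = XOR(h_{i-1}, x_i), each defined over the previous
    concept and one input. Assumes len(x) >= 1."""
    h = bool(x[0])
    for xi in x[1:]:
        h = xor2(h, bool(xi))
    return h


if __name__ == "__main__":
    for n in range(2, 9):
        terms = shallow_dnf(n)
        # Sanity check: flat DNF, deep chain and the target agree on all inputs.
        assert all(eval_dnf(terms, x) == parity(x) == deep_parity(x)
                   for x in product((0, 1), repeat=n))
        deep_terms = 2 * (n - 1)  # n - 1 XOR nodes, 2 conjunctions each
        print(f"n={n}: flat DNF terms={len(terms)}, deep-chain terms={deep_terms}")
```

The comparison concerns representation size only: the flat theory grows exponentially in n while the deep chain grows linearly. It says nothing by itself about how easily either theory is learned with the greedy mini-batch optimization used in the paper.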
Related papers
- Understanding Deep Learning via Notions of Rank [5.439020425819001]
This thesis puts forth notions of rank as key for developing a theory of deep learning.
In particular, we establish that gradient-based training can induce an implicit regularization towards low rank for several neural network architectures.
Practical implications of our theory for designing explicit regularization schemes and data preprocessing algorithms are presented.
(2024-08-04T18:47:55Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
(2022-11-21T15:27:22Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
(2022-06-13T12:03:32Z)
- Information Flow in Deep Neural Networks [0.6922389632860545]
There is no comprehensive theoretical understanding of how deep neural networks work or are structured.
Deep networks are often seen as black boxes with unclear interpretations and reliability.
This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms.
(2022-02-10T23:32:26Z)
- The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
(2021-06-18T15:00:00Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely wide neural networks, neural tangent kernel (NTK) theory fully characterizes generalization.
We show that such linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
(2021-06-12T13:05:11Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
(2021-05-27T12:27:24Z)
- Deep Learning is Singular, and That's Good [31.985399645173022]
In singular models, the optimal set of parameters forms an analytic set with singularities, and classical statistical inference cannot be applied.
This is significant for deep learning because neural networks are singular, so "dividing" by the determinant of the Hessian or employing the Laplace approximation is not appropriate.
Despite its potential for addressing fundamental issues in deep learning, singular learning theory appears to have made few inroads into the developing canon of deep learning theory.
(2020-10-22T09:33:59Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
(2020-08-19T04:53:31Z)
- A Chain Graph Interpretation of Real-World Neural Networks [58.78692706974121]
We propose an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure.
The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models.
We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques.
(2020-06-30T14:46:08Z)