Optimized classification with neural ODEs via separability
- URL: http://arxiv.org/abs/2312.13807v1
- Date: Thu, 21 Dec 2023 12:56:40 GMT
- Title: Optimized classification with neural ODEs via separability
- Authors: Antonio Álvarez-López, Rafael Orive-Illera, Enrique Zuazua
- Abstract summary: Classification of $N$ points becomes a simultaneous control problem when viewed through the lens of neural ordinary differential equations (neural ODEs).
In this study, we focus on estimating the number of neurons required for efficient cluster-based classification.
We propose a new constructive algorithm that simultaneously classifies clusters of $d$ points from any initial configuration.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification of $N$ points becomes a simultaneous control problem when
viewed through the lens of neural ordinary differential equations (neural
ODEs), which represent the time-continuous limit of residual networks. For the
narrow model, with one neuron per hidden layer, it has been shown that the task
can be achieved using $O(N)$ neurons. In this study, we focus on estimating the
number of neurons required for efficient cluster-based classification,
particularly in the worst-case scenario where points are independently and
uniformly distributed in $[0,1]^d$. Our analysis provides a novel method for
quantifying the probability of requiring fewer than $O(N)$ neurons, emphasizing
the asymptotic behavior as both $d$ and $N$ increase. Additionally, under the
sole assumption that the data are in general position, we propose a new
constructive algorithm that simultaneously classifies clusters of $d$ points
from any initial configuration, effectively reducing the maximal complexity to
$O(N/d)$ neurons.
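The abstract does not spell out the dynamics, but a minimal sketch helps fix ideas. Assuming the standard narrow neural ODE formulation $\dot{x}(t) = w(t)\,\sigma(\langle a(t), x(t)\rangle + b(t))$ with piecewise-constant controls, each constant-control interval plays the role of one hidden neuron; the control values, horizon, and step size below are hypothetical:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def narrow_node_flow(X, controls, dt=1e-2):
    """Integrate x'(t) = w * relu(<a, x(t)> + b) with piecewise-constant
    controls by forward Euler; each (w, a, b) triple acts on its own unit
    time interval and plays the role of one hidden neuron."""
    X = X.copy()
    for w, a, b in controls:                # one control triple == one "neuron"
        for _ in range(int(1.0 / dt)):      # unit time horizon per neuron
            X += dt * np.outer(relu(X @ a + b), w)
    return X

# Toy usage: N points drawn uniformly in [0,1]^d, pushed by two hypothetical neurons.
rng = np.random.default_rng(0)
d, N = 2, 8
X0 = rng.uniform(0.0, 1.0, size=(N, d))
controls = [
    (np.array([1.0, 0.0]), np.array([1.0, 1.0]), -1.0),   # hypothetical control values
    (np.array([0.0, -1.0]), np.array([0.0, 1.0]), -0.5),
]
X1 = narrow_node_flow(X0, controls)
print(X1.round(3))   # final positions; classification thresholds one coordinate
```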
Related papers
- Dimension-independent learning rates for high-dimensional classification
problems [53.622581586464634]
We show that every $RBV^2$ function can be approximated by a neural network with bounded weights.
We then prove the existence of a neural network with bounded weights approximating a classification function.
arXiv Detail & Related papers (2024-09-26T16:02:13Z) - Minimum number of neurons in fully connected layers of a given neural network (the first approximation) [0.0]
The paper presents an algorithm for searching for the minimum number of neurons in the fully connected layers of an arbitrary network solving a given problem.
The proposed algorithm gives only a first approximation of this minimum, since it does not guarantee that a neural network with the found number of neurons can be trained to the required quality.
arXiv Detail & Related papers (2024-05-23T03:46:07Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-sparse parity problem on the $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD solves the $k$-sparse parity problem with small statistical error.
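The summary leaves the target function implicit; for concreteness, a minimal sketch of $k$-sparse parity labels on the $d$-dimensional hypercube (the support $S$, dimensions, and sample size below are hypothetical):

```python
import numpy as np

def k_sparse_parity(X, S):
    """Label each row of X in {-1, +1}^d by the parity (product) of the
    coordinates indexed by the hidden support S, with |S| = k."""
    return np.prod(X[:, S], axis=1)

rng = np.random.default_rng(0)
d, k, n = 20, 3, 1000                      # hypothetical sizes
S = rng.choice(d, size=k, replace=False)   # hidden support of the parity
X = rng.choice([-1.0, 1.0], size=(n, d))   # uniform points on the hypercube
y = k_sparse_parity(X, S)                  # targets in {-1, +1}
print(S, y[:10])
```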
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - First Steps Towards a Runtime Analysis of Neuroevolution [2.07180164747172]
We consider a simple setting in neuroevolution where an evolutionary algorithm optimizes the weights and activation functions of a simple artificial neural network.
We then define simple example functions to be learned by the network and conduct rigorous runtime analyses for networks with a single neuron and for a more advanced structure with several neurons and two layers.
Our results show that the proposed algorithm is generally efficient on two example problems designed for one neuron and efficient with at least constant probability on the example problem for a two-layer network.
arXiv Detail & Related papers (2023-07-03T07:30:58Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can render two classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
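A minimal sketch of the random first layer described here, with standard Gaussian weights and uniformly distributed biases left untrained; the width, bias range, and data are assumptions, and only the linear output layer would be fit:

```python
import numpy as np

def random_relu_features(X, width, bias_range=1.0, seed=0):
    """Random first layer of a two-layer ReLU network: standard Gaussian
    weights and uniform biases, both left untrained."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, width))               # standard Gaussian weights
    b = rng.uniform(-bias_range, bias_range, width)   # uniformly distributed biases
    return np.maximum(X @ W + b, 0.0)

# Usage: lift the data, then fit only the linear output layer (e.g. by least squares).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = np.sign(X[:, 0])                       # hypothetical binary labels
Phi = random_relu_features(X, width=200)
w_out, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.mean(np.sign(Phi @ w_out) == y))  # separation accuracy on the lifted data
```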
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - The Efficacy of $L_1$ Regularization in Two-Layer Neural Networks [36.753907384994704]
A crucial problem in neural networks is to select the most appropriate number of hidden neurons and obtain tight statistical risk bounds.
We show that $L_1$ regularization can control the generalization error and sparsify the input dimension.
An excessively large number of neurons does not necessarily inflate the generalization error under suitable regularization.
arXiv Detail & Related papers (2020-10-02T15:23:22Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.