Related papers: Fidelity Isn't Accuracy: When Linearly Decodable Functions Fail to Match the Ground Truth

Related papers

NIMO: a Nonlinear Interpretable MOdel [1.4623202528810306]
We introduce NIMO (Nonlinear Interpretable MOdel) to create a model where the NN is designed to learn nonlinear corrections to the linear model predictions. We show empirically that we can recover the underlying linear coefficients while significantly improving the predictive accuracy. Compared to other hybrid interpretable approaches, our model is the only one that actually maintains the same interpretability of linear coefficients as in linear models.
arXiv Detail & Related papers (2025-06-05T14:02:55Z)
Pretrained transformer efficiently learns low-dimensional target functions in-context [40.77319247558742]
We show that a nonlinear transformer optimized by gradient descent learns $f_*$ in-context with a prompt length that only depends on the dimension of the distribution of target functions $r$. Our result highlights the adaptivity of the pretrained transformer to low-dimensional structures of the function class, which enables sample-efficient ICL.
arXiv Detail & Related papers (2024-11-04T19:24:39Z)
Graph Structure Learning with Interpretable Bayesian Neural Networks [10.957528713294874]
We introduce novel iterations with independently interpretable parameters. These parameters influence characteristics of the estimated graph, such as edge sparsity. After unrolling these iterations, prior knowledge over such graph characteristics shapes prior distributions. Fast execution and parameter efficiency allow for high-fidelity posterior approximation.
arXiv Detail & Related papers (2024-06-20T23:27:41Z)
Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability. We show how the result can be used to get approximations of required accuracy in a model reference adaptive control application.
arXiv Detail & Related papers (2024-03-25T19:39:17Z)
A Novel Explanation Against Linear Neural Networks [1.223779595809275]
Linear Regression and neural networks are widely used to model data. We show that neural networks without activation functions, or linear neural networks, actually reduce both training and testing performance. We prove this hypothesis through an analysis of the optimization of an LNN and rigorous testing comparing the performance between both LNNs and linear regression on noisy datasets.
arXiv Detail & Related papers (2023-12-30T09:44:51Z)
From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition [64.59093444558549]
We propose a simple, easy-to-implement, two-step training pipeline that we call From Fake to Real. By training on real and synthetic data separately, FFR does not expose the model to the statistical differences between real and synthetic data. Our experiments show that FFR improves worst group accuracy over the state-of-the-art by up to 20% over three datasets.
arXiv Detail & Related papers (2023-08-08T19:52:28Z)
Approximating Positive Homogeneous Functions with Scale Invariant Neural Networks [28.2446416597989]
We first consider recovery of sparse vectors from few linear measurements. We then extend our results to a wider class of recovery problems including low-rank matrix recovery and phase retrieval. Our results shed some light on the seeming contradiction between previous works showing that neural networks for inverse problems typically have very large Lipschitz constants.
arXiv Detail & Related papers (2023-08-05T10:17:04Z)
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features. We find new and interesting properties that do not exist in single-task linear regression. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features. Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$^2$ improves performance by 5-15% when given limited target data.
arXiv Detail & Related papers (2023-02-10T18:58:03Z)
Interpreting Bias in the Neural Networks: A Peek Into Representational Similarity [0.0]
We investigate the performance and internal representational structure of convolution-based neural networks trained on biased data. We specifically study similarities in representations, using Centered Kernel Alignment (CKA) for different objective functions. We note that without progressive representational similarities among the layers of a neural network, the performance is less likely to be robust.
arXiv Detail & Related papers (2022-11-14T22:17:14Z)
Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks [66.76034024335833]
We investigate and find that diverse/complex features are learned by the backbone, and that their brittleness is due to the linear classification head relying primarily on the simplest features. We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits. We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
arXiv Detail & Related papers (2022-10-04T04:01:15Z)
Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) is to find a small subset of the input graph's features. We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
Utilizing XAI technique to improve autoencoder based model for computer network anomaly detection with shapley additive explanation(SHAP) [0.0]
Machine learning (ML) and Deep Learning (DL) methods are being adopted rapidly, especially in computer network security. The lack of transparency of ML- and DL-based models is a major obstacle to their implementation, and they are criticized for their black-box nature. XAI is a promising area that can improve the trustworthiness of these models by giving explanations and interpreting their output.
arXiv Detail & Related papers (2021-12-14T09:42:04Z)
Distributed Sparse Feature Selection in Communication-Restricted Networks [6.9257380648471765]
We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection. In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
arXiv Detail & Related papers (2021-11-02T05:02:24Z)
Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss. We examine how these benign overfitting phenomena occur in a two-layer neural network setting. We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption. They can suffer from ill-posedness and convergence instability. This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning. LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning. We propose a novel method of using data augmentations when training autoencoders. We train a Variational Autoencoder in such a way that it makes the transformation outcome predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data [48.4779912667317]
Self-training algorithms have been very successful for learning with unlabeled data using neural networks. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning.
arXiv Detail & Related papers (2020-10-07T19:43:55Z)
Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data. We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
arXiv Detail & Related papers (2020-07-05T18:58:47Z)
A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)
Linear predictor on linearly-generated data with missing values: non consistency and solutions [0.0]
We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data. We show that, in the presence of missing values, the optimal predictor may not be linear.
arXiv Detail & Related papers (2020-02-03T11:49:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.