Outlier-Robust Training of Machine Learning Models
- URL: http://arxiv.org/abs/2501.00265v1
- Date: Tue, 31 Dec 2024 04:19:53 GMT
- Title: Outlier-Robust Training of Machine Learning Models
- Authors: Rajat Talak, Charis Georgiou, Jingnan Shi, Luca Carlone
- Abstract summary: We propose an Adaptive Alternation Algorithm for training machine learning models with outliers.
The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each iteration.
Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels $\sigma$ increases the region of convergence.
- Score: 21.352210662488112
- Abstract: Robust training of machine learning models in the presence of outliers has garnered attention across various domains. The use of robust losses is a popular approach and is known to mitigate the impact of outliers. We bring to light two lines of literature that have diverged in their ways of designing robust losses: one using M-estimation, which is popular in robotics and computer vision, and another using a risk-minimization framework, which is popular in deep learning. We first show that a simple modification of the Black-Rangarajan duality provides a unifying view. The modified duality brings out a definition of a robust loss kernel $\sigma$ that is satisfied by robust losses in both lines of literature. Secondly, using the modified duality, we propose an Adaptive Alternation Algorithm (AAA) for training machine learning models with outliers. The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each iteration. The algorithm is augmented with a novel parameter update rule by interpreting the weights as inlier probabilities, and obviates the need for complex parameter tuning. Thirdly, we investigate convergence of the adaptive alternation algorithm to outlier-free optima. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels $\sigma$ increases the region of convergence. We experimentally show the efficacy of our algorithm on regression, classification, and neural scene reconstruction problems. We release our implementation code: https://github.com/MIT-SPARK/ORT.
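To make the alternation concrete, here is a minimal illustrative sketch of the weighted-loss / weight-update loop described above, applied to linear regression with a Geman-McClure kernel. The kernel choice, the fixed scale parameter `c`, and the fixed iteration count are assumptions made for this sketch; the paper's AAA additionally adapts its parameter by treating the weights as inlier probabilities. This is not the released implementation at the repository linked above.

```python
import numpy as np

def geman_mcclure_weight(residual, c=1.0):
    """Per-sample weight implied by the Geman-McClure robust kernel (an assumed example kernel)."""
    return (c**2 / (c**2 + residual**2)) ** 2

def weighted_alternation_fit(X, y, c=1.0, iters=20):
    """Alternate between weighted least squares and weight updates (illustrative sketch only)."""
    n, d = X.shape
    w = np.ones(n)                       # start by treating every sample as an inlier
    theta = np.zeros(d)
    for _ in range(iters):
        # (i) train on the weighted version of the non-robust (squared) loss
        W = np.diag(w)
        theta = np.linalg.solve(X.T @ W @ X + 1e-8 * np.eye(d), X.T @ W @ y)
        # (ii) update the per-sample weights from the current residuals
        r = y - X @ theta
        w = geman_mcclure_weight(r, c)
    return theta, w

# toy data with a few gross outliers
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=100)
y[:5] += 20.0                            # corrupt 5 samples
theta, w = weighted_alternation_fit(X, y)
print(theta)                             # close to [1, 2]; corrupted samples receive tiny weights
```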
Related papers
- MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
We propose a unified training framework for deep neural networks.
We introduce three instances of MARS that leverage preconditioned gradient optimization.
Results indicate that the implementation of MARS consistently outperforms Adam.
arXiv Detail & Related papers (2024-11-15T18:57:39Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the influence of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - The Persian Rug: solving toy models of superposition using large-scale symmetries [0.0]
We present a complete mechanistic description of the algorithm learned by a minimal non-linear sparse data autoencoder in the limit of large input dimension.
Our work contributes to neural network interpretability by introducing techniques for understanding the structure of autoencoders.
arXiv Detail & Related papers (2024-10-15T22:52:45Z) - A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
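As a rough illustration of the mirror descent primitive this entry builds on (and only that; the paper's corruption-tolerant distributed scheme is not reproduced here), below is a standard entropic mirror descent step on the probability simplex, i.e. a multiplicative-weights update. The step size is an arbitrary placeholder.

```python
import numpy as np

def entropic_mirror_descent_step(x, grad, eta=0.1):
    """One mirror descent step on the probability simplex with the entropy mirror map."""
    y = x * np.exp(-eta * grad)   # multiplicative update driven by the gradient
    return y / y.sum()            # Bregman projection back onto the simplex

x = np.ones(3) / 3
print(entropic_mirror_descent_step(x, np.array([1.0, 0.0, -1.0])))
```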
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Robust Capped lp-Norm Support Vector Ordinal Regression [85.84718111830752]
Ordinal regression is a specialized supervised problem where the labels show an inherent order.
Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks.
We introduce a new model, Capped $\ell_p$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.
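For intuition, a capped $\ell_p$ penalty simply stops growing once a residual exceeds a cap, so gross outliers cannot dominate the objective. The snippet below shows only this generic penalty with assumed values of p and the cap, not the full CSVOR ordinal regression formulation.

```python
import numpy as np

def capped_lp_loss(residuals, p=1.0, cap=1.0):
    """Capped l_p loss: grows like |r|**p for small residuals, saturates at `cap` for outliers."""
    return np.minimum(np.abs(residuals) ** p, cap)

print(capped_lp_loss(np.array([0.1, 0.5, 5.0])))  # the gross residual contributes only the cap
```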
arXiv Detail & Related papers (2024-04-25T13:56:05Z) - Outlier-Robust Neural Network Training: Efficient Optimization of Transformed Trimmed Loss with Variation Regularization [2.5628953713168685]
We consider outlier-robust predictive modeling using highly-expressive neural networks.
We employ (1) a transformed trimmed loss (TTL), which is a computationally feasible variant of the classical trimmed loss, and (2) a higher-order variation regularization (HOVR) of the prediction model.
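The TTL above is described as a computationally feasible variant of the classical trimmed loss; the sketch below shows only that classical trimmed loss (average of the smallest per-sample losses, discarding the presumed-outlier tail), with an assumed keep fraction.

```python
import numpy as np

def trimmed_squared_loss(residuals, keep_fraction=0.75):
    """Classical trimmed squared loss: average only the smallest squared residuals."""
    losses = np.sort(residuals ** 2)
    k = int(np.ceil(keep_fraction * losses.size))
    return losses[:k].mean()

print(trimmed_squared_loss(np.array([0.1, -0.2, 0.15, 8.0])))  # the 8.0 outlier is dropped
```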
arXiv Detail & Related papers (2023-08-04T12:57:13Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Outlier-Robust Estimation: Hardness, Minimally Tuned Algorithms, and Applications [25.222024234900445]
This paper introduces two unifying formulations for outlier-robust estimation, Generalized Maximum Consensus (G-MC) and Generalized Truncated Least Squares (G-TLS).
Our first contribution is a proof that outlier-robust estimation is inapproximable: in the worst case, it is impossible to (even approximately) find the set of outliers.
We propose the first minimally tuned algorithms for outlier rejection, which dynamically decide how to separate inliers from outliers.
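As background for the G-TLS formulation named above, here is the standard (non-generalized) truncated least-squares cost: quadratic up to a threshold, constant beyond it. The threshold value is an assumed example; the paper's generalized formulation is not reproduced here.

```python
import numpy as np

def truncated_least_squares(residuals, c=1.0):
    """Truncated least-squares cost: quadratic up to c, constant beyond it, so gross outliers cannot dominate."""
    return np.minimum(residuals ** 2, c ** 2).sum()

print(truncated_least_squares(np.array([0.2, -0.4, 10.0])))  # 0.04 + 0.16 + 1.0
```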
arXiv Detail & Related papers (2020-07-29T21:06:13Z) - TAdam: A Robust Stochastic Gradient Optimizer [6.973803123972298]
Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain.
To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed.
We propose a new gradient optimization method whose robustness is built directly into the algorithm, using the robust Student-t distribution as its core idea.
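To illustrate the general idea of a Student-t style robust moment estimate (this is a simplified sketch with assumed parameter values, not the exact TAdam update rule), the snippet below down-weights gradients that deviate strongly from the running mean, so a single outlier gradient barely moves the moving average.

```python
import numpy as np

def robust_first_moment(m, v, g, nu=3.0, beta=0.9, eps=1e-8):
    """Simplified Student-t style robust update of a running gradient mean (illustrative only)."""
    d = g.size
    dev = np.sum((g - m) ** 2 / (v + eps)) / d   # average squared deviation, scaled by the variance estimate
    w = min((nu + 1.0) / (nu + dev), 1.0)        # heavy-tailed weight in (0, 1]; small for outlier gradients
    step = (1.0 - beta) * w                      # outlier gradients contribute a smaller update
    return (1.0 - step) * m + step * g

# toy demo with hypothetical values: a gross outlier gradient barely moves the running mean
m = np.zeros(2); v = np.ones(2)
print(robust_first_moment(m, v, np.array([0.5, -0.5])))    # typical gradient: normal-sized update
print(robust_first_moment(m, v, np.array([50.0, -50.0])))  # outlier gradient: tiny update
```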
arXiv Detail & Related papers (2020-02-29T04:32:36Z)