LABO: Towards Learning Optimal Label Regularization via Bi-level
Optimization
- URL: http://arxiv.org/abs/2305.04971v1
- Date: Mon, 8 May 2023 18:04:18 GMT
- Title: LABO: Towards Learning Optimal Label Regularization via Bi-level
Optimization
- Authors: Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe
Langlais
- Abstract summary: Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks.
We present a general framework for training with label regularization, which includes conventional LS but can also model instance-specific variants.
We propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem.
- Score: 25.188067240126422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regularization techniques are crucial to improving the generalization
performance and training efficiency of deep neural networks. Many deep learning
algorithms rely on weight decay, dropout, and batch/layer normalization to converge
faster and generalize. Label Smoothing (LS) is another simple, versatile and
efficient regularization which can be applied to various supervised
classification tasks. Conventional LS, however, assumes that each non-target class is
equally likely, regardless of the training instance. In this work, we
present a general framework for training with label regularization, which
includes conventional LS but can also model instance-specific variants. Based
on this formulation, we propose an efficient way of learning LAbel
regularization by devising a Bi-level Optimization (LABO) problem. We derive a
deterministic and interpretable solution of the inner loop as the optimal label
smoothing without the need to store the parameters or the output of a trained
model. Finally, we conduct extensive experiments and demonstrate our LABO
consistently yields improvement over conventional label regularization on
various fields, including seven machine translation and three image
classification tasks across various neural network architectures.
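The contrast between the uniform assumption of conventional LS and the instance-specific regularization the framework generalizes can be made concrete with a short sketch. The snippet below is illustrative only and is not the authors' implementation; it assumes a PyTorch setting, and the function names and the `smoothing_dist` argument are hypothetical.

```python
# Minimal sketch (assumed PyTorch setting, not the paper's code): conventional
# uniform label smoothing vs. a generalized, instance-specific smoothing
# distribution of the kind LABO learns.
import torch
import torch.nn.functional as F

def conventional_ls_loss(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target: the true class gets 1 - eps and
    every non-target class gets the same mass eps / (K - 1), regardless of the
    training instance (one common LS convention)."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smoothed = torch.full_like(log_probs, eps / (num_classes - 1))
    smoothed.scatter_(-1, target.unsqueeze(-1), 1.0 - eps)
    return -(smoothed * log_probs).sum(dim=-1).mean()

def instance_specific_ls_loss(logits, smoothing_dist):
    """Generalized label regularization: `smoothing_dist` is a per-instance
    target distribution (shape [batch, K]) instead of the fixed uniform one
    above."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(smoothing_dist * log_probs).sum(dim=-1).mean()
```

Per the abstract, LABO obtains such an instance-specific distribution as a deterministic, interpretable solution of the inner loop of a bi-level optimization problem, without storing the parameters or outputs of a separately trained model.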
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - L-TUNING: Synchronized Label Tuning for Prompt and Prefix in LLMs [0.0]
This paper introduces L-Tuning, an efficient fine-tuning approach for classification tasks within the Natural Language Inference (NLI) framework.
L-Tuning focuses on fine-tuning label tokens processed through a pre-trained Large Language Model (LLM).
Our experimental results indicate a significant improvement in training efficiency and classification accuracy with L-Tuning compared to traditional approaches.
arXiv Detail & Related papers (2023-12-21T01:47:49Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Class Adaptive Network Calibration [19.80805957502909]
We propose Class Adaptive Label Smoothing (CALS) for calibrating deep networks.
Our method builds on a general Augmented Lagrangian approach, a well-established technique in constrained optimization.
arXiv Detail & Related papers (2022-11-28T06:05:31Z) - Evolving Multi-Label Fuzzy Classifier [5.53329677986653]
Multi-label classification, which assigns a single sample to more than one class at the same time, has attracted much attention in the machine learning community.
We propose an evolving multi-label fuzzy classifier (EFC-ML) which is able to self-adapt and self-evolve its structure with new incoming multi-label samples in an incremental, single-pass manner.
arXiv Detail & Related papers (2022-03-29T08:01:03Z) - A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z) - Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose Dash, a semi-supervised learning (SSL) approach that selects which unlabeled examples to train on using a dynamically adjusted threshold, making the selection of unlabeled data adaptive over the course of training (a minimal sketch of this thresholding idea appears after this list).
arXiv Detail & Related papers (2021-09-01T23:52:29Z) - PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which stochastically masks labels during training to balance the ratio of positive to negative labels for each class.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z) - Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm that incorporates adaptive regularization and is applicable to machine learning problems in the deep learning framework.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z) - Exemplar Normalization for Learning Deep Representation [34.42934843556172]
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN).
EN is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
arXiv Detail & Related papers (2020-03-19T13:23:40Z)
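As a concrete illustration of the dynamic thresholding mentioned in the Dash entry above, the sketch below keeps only unlabeled examples whose pseudo-label loss falls below a threshold that shrinks as training progresses. It is a simplified sketch in the spirit of the idea, not the Dash authors' algorithm; the decay schedule, default values, and names (`select_unlabeled`, `rho_0`, `gamma`) are assumptions for illustration.

```python
# Hedged sketch of dynamic thresholding for unlabeled-data selection: the
# threshold decays over training, so fewer, more confidently fit unlabeled
# examples are kept as the step count increases.
import torch
import torch.nn.functional as F

def select_unlabeled(model, unlabeled_batch, step, rho_0=2.0, gamma=1.01):
    """Keep unlabeled examples whose pseudo-label loss is below a threshold
    rho_0 * gamma ** (-step) that decreases with the training step."""
    threshold = rho_0 * gamma ** (-step)
    with torch.no_grad():
        logits = model(unlabeled_batch)
        pseudo_labels = logits.argmax(dim=-1)
        losses = F.cross_entropy(logits, pseudo_labels, reduction="none")
    keep = losses < threshold
    return unlabeled_batch[keep], pseudo_labels[keep]
```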