LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
- URL: http://arxiv.org/abs/2505.18884v1
- Date: Sat, 24 May 2025 21:54:52 GMT
- Title: LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
- Authors: Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli
- Abstract summary: We propose Lagrangian-Optimized Robust Embeddings (LORE), a novel unsupervised adversarial fine-tuning framework. LORE significantly improves zero-shot adversarial robustness with minimal degradation in clean data accuracy.
- Score: 11.01163097340578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual encoders have become fundamental components in modern computer vision pipelines. However, ensuring robustness against adversarial perturbations remains a critical challenge. Recent efforts have explored both supervised and unsupervised adversarial fine-tuning strategies. We identify two key limitations in these approaches: (i) they often suffer from instability, especially during the early stages of fine-tuning, resulting in suboptimal convergence and degraded performance on clean data, and (ii) they exhibit a suboptimal trade-off between robustness and clean data accuracy, hindering the simultaneous optimization of both objectives. To overcome these challenges, we propose Lagrangian-Optimized Robust Embeddings (LORE), a novel unsupervised adversarial fine-tuning framework. LORE utilizes constrained optimization, which offers a principled approach to balancing competing goals, such as improving robustness while preserving nominal performance. By enforcing embedding-space proximity constraints, LORE effectively maintains clean data performance throughout adversarial fine-tuning. Extensive experiments show that LORE significantly improves zero-shot adversarial robustness with minimal degradation in clean data accuracy. Furthermore, we demonstrate the effectiveness of the adversarially fine-tuned CLIP image encoder in out-of-distribution generalization and enhancing the interpretability of image embeddings.
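The constrained formulation the abstract describes, minimizing an adversarial objective subject to an embedding-space proximity constraint, can be sketched with a toy linear "encoder" and a single Lagrange multiplier updated by dual ascent. Everything below (the linear model, the fixed perturbations, and all hyperparameters) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

# Toy sketch: minimise a robustness loss subject to an embedding-proximity
# constraint drift(W) <= eps, using a Lagrange multiplier with dual ascent.
# The linear encoder, fixed perturbations, and hyperparameters are all
# illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))              # toy clean inputs
W0 = rng.normal(size=(8, 4))              # frozen pretrained encoder weights
W = W0.copy()                             # weights being fine-tuned
delta = 0.1 * rng.normal(size=X.shape)    # fixed toy "adversarial" perturbations
A, B = X + delta, X @ W0                  # perturbed inputs, clean embeddings
n_out = B.size

def robust_loss(W):
    # pull embeddings of perturbed inputs toward the clean pretrained ones
    return np.sum((A @ W - B) ** 2) / n_out

def drift(W):
    # how far the clean embeddings have moved from the pretrained encoder's
    return np.sum((X @ W - B) ** 2) / n_out

eps, lam = 0.02, 0.0                      # constraint budget, Lagrange multiplier
lr, dual_lr = 1e-2, 0.5
for _ in range(2000):
    # gradient of the Lagrangian robust_loss(W) + lam * (drift(W) - eps)
    g = 2 * (A.T @ (A @ W - B) + lam * X.T @ (X @ W - B)) / n_out
    W -= lr * g                                       # primal descent
    lam = max(0.0, lam + dual_lr * (drift(W) - eps))  # dual ascent, lam >= 0
```

The multiplier grows only while the proximity constraint is violated, so clean-embedding drift is pushed back toward the budget eps rather than being traded off with a fixed weight; that is the qualitative behaviour the abstract attributes to constrained optimization.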
Related papers
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper develops a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - Comparing and correcting robustness metrics for quantum optimal control [1.6927349660459692]
We present a novel, systematic study demonstrating important numerical differences between adjoint end-point and toggling-frame approaches. We also introduce a critical discretization correction to a widely used robustness estimator. Our approach uniquely handles control and fidelity constraints while cleanly isolating robustness for dedicated optimization.
arXiv Detail & Related papers (2026-02-10T22:44:16Z) - Leveraging Second-Order Curvature for Efficient Learned Image Compression: Theory and Empirical Evidence [13.56541419560425]
We show that a second-order quasi-Newton optimizer, SOAP, dramatically improves both training efficiency and final performance across diverse LICs. We also uncover a critical deployability benefit: second-order-trained models exhibit significantly fewer activation and latent outliers.
arXiv Detail & Related papers (2026-01-28T16:56:52Z) - Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding [54.05243949024302]
Existing robust MLLMs rely on implicit training/adaptation that focuses solely on visual encoder generalization. We propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity.
arXiv Detail & Related papers (2025-12-19T12:56:17Z) - ROOT: Robust Orthogonalized Optimizer for Neural Network Training [47.05662448082334]
Training large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to imprecision and training instability. First, we develop a dimension-robust orthogonalization scheme that enhances robustness through iterations tailored to specific matrix sizes. Second, we introduce an optimization-level robustization framework that suppresses outlier noise while preserving meaningful directions.
arXiv Detail & Related papers (2025-11-25T18:48:05Z) - OFMU: Optimization-Driven Framework for Machine Unlearning [5.100622189286672]
Large language models increasingly require the ability to unlearn specific knowledge, such as content covered by user requests, copyrighted materials, or outdated information. We propose OFMU, a penalty-based bi-level optimization framework that explicitly prioritizes forgetting while preserving retention. We show that OFMU consistently outperforms existing unlearning methods in both efficacy and retained utility.
arXiv Detail & Related papers (2025-09-26T15:31:32Z) - Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models [7.566515311806724]
Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss.
arXiv Detail & Related papers (2025-06-05T17:55:23Z) - Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning [12.927397451723719]
Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data. It faces two key challenges: the lack of performance guarantees for the transferred policy, and the risk of negative transfer when multiple source domains are involved. We propose a novel framework based on the pessimism principle, which constructs and optimizes a conservative estimate of the target domain's performance.
arXiv Detail & Related papers (2025-05-24T01:09:10Z) - Adversarial Robustness for Unified Multi-Modal Encoders via Efficient Calibration [12.763688592842717]
We present the first comprehensive study of adversarial vulnerability in unified multi-modal encoders. Non-visual inputs, such as audio and point clouds, are especially fragile. Our method improves adversarial robustness by up to 47.3 percent at epsilon = 4/255.
arXiv Detail & Related papers (2025-05-17T08:26:04Z) - E2ED^2: Direct Mapping from Noise to Data for Enhanced Diffusion Models [15.270657838960114]
Diffusion models have established themselves as the de facto primary paradigm in visual generative modeling. We present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples to initial noises. Our method achieves substantial performance gains in terms of Fréchet Inception Distance (FID) and CLIP score, even with fewer sampling steps.
arXiv Detail & Related papers (2024-12-30T16:06:31Z) - Source-Free Domain Adaptive Object Detection with Semantics Compensation [54.00183496587841]
We introduce Weak-to-strong Semantics Compensation (WSCo) for strong data augmentation. WSCo compensates on the fly for the class-relevant semantics that may be lost during strong augmentation. WSCo can be implemented as a generic plug-in, easily integrable with any existing SFOD pipeline.
arXiv Detail & Related papers (2024-10-07T23:32:06Z) - Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition [52.89441679581216]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. We present an innovative video decomposition strategy that incorporates view-independent and view-dependent components. Our framework consistently outperforms existing methods, establishing a new SOTA performance.
arXiv Detail & Related papers (2024-05-24T15:56:40Z) - Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
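As a rough illustration of the augmented Lagrangian idea (not this paper's stochastic algorithm), the following toy solves an equality-constrained least-squares problem with inner gradient descent and outer multiplier updates; the objective, step sizes, and loop counts are assumptions chosen for clarity:

```python
import numpy as np

# Augmented Lagrangian toy: minimise f(w) = ||w - a||^2 subject to
# c(w) = sum(w) - 1 = 0, via
#   L_A(w, lam) = f(w) + lam * c(w) + (mu / 2) * c(w)**2
# Inner loop: gradient descent on L_A; outer loop: lam += mu * c(w).

a = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
lam, mu, lr = 0.0, 1.0, 0.05

def c(w):
    return w.sum() - 1.0

for _ in range(20):                      # outer multiplier updates
    for _ in range(200):                 # inner minimisation of L_A over w
        grad = 2 * (w - a) + (lam + mu * c(w)) * np.ones(3)
        w -= lr * grad
    lam += mu * c(w)                     # dual (multiplier) update

# Analytic answer: projection of a onto the plane sum(w) = 1
w_star = a - (a.sum() - 1.0) / 3.0
```

The quadratic penalty term keeps the inner problem well conditioned, so the multiplier converges without driving mu to infinity; this is the usual advantage of augmented Lagrangian over a pure penalty method.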
arXiv Detail & Related papers (2023-10-25T13:55:35Z) - Gradient constrained sharpness-aware prompt learning for vision-language
models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates to both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
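The SAM step that this line of work builds on (ascend to a worst-case point within a small ball, then descend using the gradient taken there) can be sketched on a toy quadratic; the loss, radius rho, and learning rate below are illustrative assumptions, not the GCSCoOp procedure:

```python
import numpy as np

# Minimal Sharpness-Aware Minimization (SAM) sketch on a toy quadratic loss
# (a hypothetical stand-in for a prompt-learning objective).

rng = np.random.default_rng(1)
w = rng.normal(size=5)
target = np.zeros(5)

def loss(w):
    return 0.5 * np.sum((w - target) ** 2)

def grad(w):
    return w - target

rho, lr = 0.05, 0.1
for _ in range(100):
    g = grad(w)
    # ascent step: move to the (approximate) worst-case point in a rho-ball
    w_adv = w + rho * g / (np.linalg.norm(g) + 1e-12)
    # descent step: update w with the gradient evaluated at the perturbed point
    w = w - lr * grad(w_adv)
```

Evaluating the gradient at the perturbed point biases the update toward flat minima, which is the sharpness-aware behaviour the summary above refers to.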
arXiv Detail & Related papers (2023-09-14T17:13:54Z) - Enhancing Infrared Small Target Detection Robustness with Bi-Level
Adversarial Framework [61.34862133870934]
We propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions.
Our scheme remarkably improves IoU by 21.96% across a wide array of corruptions and by 4.97% on the general benchmark.
arXiv Detail & Related papers (2023-09-03T06:35:07Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have well-known limitations: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - When Does Contrastive Learning Preserve Adversarial Robustness from
Pretraining to Finetuning? [99.4914671654374]
We propose AdvCL, a novel adversarial contrastive pretraining framework.
We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy and finetuning efficiency.
arXiv Detail & Related papers (2021-11-01T17:59:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.