An Investigation of Visual Foundation Models Robustness
- URL: http://arxiv.org/abs/2508.16225v1
- Date: Fri, 22 Aug 2025 08:54:13 GMT
- Title: An Investigation of Visual Foundation Models Robustness
- Authors: Sandeep Gupta, Roberto Passerone
- Abstract summary: Visual Foundation Models (VFMs) are becoming ubiquitous in computer vision, powering systems for diverse tasks such as object detection, image classification, segmentation, pose estimation, and motion tracking. This article investigates network robustness requirements crucial in computer vision systems to adapt to dynamic environments influenced by factors such as lighting, weather conditions, and sensor characteristics. We examine the prevalent empirical defenses and robust training employed to enhance vision network robustness against real-world challenges such as distributional shifts, noisy and spatially distorted inputs, and adversarial attacks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Foundation Models (VFMs) are becoming ubiquitous in computer vision, powering systems for diverse tasks such as object detection, image classification, segmentation, pose estimation, and motion tracking. VFMs are capitalizing on seminal innovations in deep learning models, such as LeNet-5, AlexNet, ResNet, VGGNet, InceptionNet, DenseNet, YOLO, and ViT, to deliver superior performance across a range of critical computer vision applications. These include security-sensitive domains like biometric verification, autonomous vehicle perception, and medical image analysis, where robustness is essential to fostering trust between technology and the end-users. This article investigates network robustness requirements crucial in computer vision systems to adapt effectively to dynamic environments influenced by factors such as lighting, weather conditions, and sensor characteristics. We examine the prevalent empirical defenses and robust training employed to enhance vision network robustness against real-world challenges such as distributional shifts, noisy and spatially distorted inputs, and adversarial attacks. Subsequently, we provide a comprehensive analysis of the challenges associated with these defense mechanisms, including network properties and components to guide ablation studies and benchmarking metrics to evaluate network robustness.
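The adversarial attacks mentioned in the abstract can be illustrated with the fast gradient sign method (FGSM), a standard one-step attack. The sketch below is a toy illustration only, not code from the paper: it applies FGSM to a hypothetical two-weight logistic model whose input gradient is computed analytically, and shows the perturbation flipping a correct prediction.

```python
import math

def fgsm_perturb(x, w, b, y, eps):
    """One-step FGSM, x' = x + eps * sign(dL/dx), for a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))          # sigmoid forward pass
    grad = [(p - y) * wi for wi in w]       # dL/dx_i for cross-entropy loss
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

w, b = [2.0, -1.0], 0.0                     # toy model parameters
x, y = [0.3, 0.1], 1                        # input, correctly classified as 1
x_adv = fgsm_perturb(x, w, b, y, eps=0.4)
print(predict(x, w, b), predict(x_adv, w, b))   # prints "1 0": attack flips the label
```

For deep networks the input gradient is obtained by backpropagation rather than by hand, but the attack step is the same sign-of-gradient update; robust training defends against exactly this kind of worst-case perturbation.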
Related papers
- Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges. Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments. We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - Rethinking Spatio-Temporal Anomaly Detection: A Vision for Causality-Driven Cybersecurity [22.491097360752903]
We advocate for a causal learning perspective to advance anomaly detection in spatially distributed infrastructures. We identify and formalize three key directions: causal graph profiling, multi-view fusion, and continual causal graph learning. Our objective is to lay a new research trajectory toward scalable, adaptive, explainable, and spatially grounded anomaly detection systems.
arXiv Detail & Related papers (2025-07-10T21:19:28Z) - A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments [55.60375624503877]
Model Extraction Attacks (MEAs) threaten modern machine learning systems by enabling adversaries to steal models, exposing intellectual property and training data. This survey is motivated by the urgent need to understand how the unique characteristics of cloud, edge, and federated deployments shape attack vectors and defense requirements. We systematically examine the evolution of attack methodologies and defense mechanisms across these environments, demonstrating how environmental factors influence security strategies in critical sectors such as autonomous vehicles, healthcare, and financial services.
arXiv Detail & Related papers (2025-02-22T03:46:50Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Towards Evaluating the Robustness of Visual State Space Models [63.14954591606638]
Vision State Space Models (VSSMs) have demonstrated remarkable performance in visual perception tasks.
However, their robustness under natural and adversarial perturbations remains a critical concern.
We present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios.
arXiv Detail & Related papers (2024-06-13T17:59:44Z) - A Comprehensive Study of Real-Time Object Detection Networks Across Multiple Domains: A Survey [9.861721674777877]
Deep neural network based object detectors are continuously evolving and are used in a multitude of applications.
While safety-critical applications need high accuracy and reliability, low-latency tasks need resource and energy-efficient networks.
A reference benchmark for existing networks does not exist, nor does a standard evaluation guideline for designing new networks.
arXiv Detail & Related papers (2022-08-23T12:01:16Z) - Robustness in Deep Learning for Computer Vision: Mind the gap? [13.576376492050185]
We identify, analyze, and summarize current definitions and progress towards non-adversarial robustness in deep learning for computer vision.
We find that this area of research has received disproportionately little attention relative to adversarial machine learning.
arXiv Detail & Related papers (2021-12-01T16:42:38Z) - SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size [95.00667357120442]
Changing the object location, rotation and size may affect the predictions in non-trivial ways.
We perform a fine-grained analysis of robustness with respect to these factors of variation using SI-Score, a synthetic dataset.
arXiv Detail & Related papers (2021-04-09T05:00:49Z) - Unadversarial Examples: Designing Objects for Robust Vision [100.4627585672469]
We develop a framework that exploits the sensitivity of modern machine learning algorithms to input perturbations in order to design "robust objects"
We demonstrate the efficacy of the framework on a wide variety of vision-based tasks ranging from standard benchmarks to (in-simulation) robotics.
arXiv Detail & Related papers (2020-12-22T18:26:07Z) - DEEVA: A Deep Learning and IoT Based Computer Vision System to Address Safety and Security of Production Sites in Energy Industry [0.0]
This paper tackles computer vision problems such as scene classification, object detection, semantic segmentation, and scene captioning.
We developed the Deep ExxonMobil Eye for Video Analysis (DEEVA) package to handle scene classification, object detection, semantic segmentation and captioning of scenes.
The results reveal that transfer learning with the RetinaNet object detector is able to detect the presence of workers, different types of vehicles and construction equipment, and safety-related objects with a high level of accuracy (above 90%).
arXiv Detail & Related papers (2020-03-02T21:26:00Z)
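Several of the papers above (e.g., the VSSM robustness evaluation) measure how accuracy degrades under natural perturbations. The sketch below is a minimal, self-contained version of that evaluation protocol, using a hypothetical nearest-centroid classifier and Gaussian input noise rather than any model or dataset from these papers.

```python
import random

def nearest_centroid(x, centroids):
    """Toy stand-in classifier: label of the nearest class centroid."""
    return min(centroids, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(x, centroids[c])))

def accuracy_under_noise(samples, centroids, sigma, trials=200, seed=0):
    """Clean-vs-perturbed evaluation: accuracy under Gaussian input noise."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        x, y = rng.choice(samples)
        x_noisy = [a + rng.gauss(0, sigma) for a in x]   # perturb each feature
        correct += nearest_centroid(x_noisy, centroids) == y
    return correct / trials

centroids = {"a": [0.0, 0.0], "b": [4.0, 4.0]}
samples = [([0.2, -0.1], "a"), ([3.9, 4.2], "b")]
for sigma in (0.0, 1.0, 3.0):
    acc = accuracy_under_noise(samples, centroids, sigma)
    print(f"sigma={sigma}: accuracy={acc:.2f}")
```

Sweeping the perturbation severity and reporting accuracy at each level is the same protocol used by corruption benchmarks, just applied here to a two-point toy problem.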
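The SI-Score entry above analyzes robustness to single factors of variation such as object location. The toy sketch below mimics that idea with a hypothetical synthetic renderer and a deliberately location-biased detector; it is illustrative only, not the SI-Score pipeline, and shows how sweeping one factor exposes non-trivial changes in predictions.

```python
def render(object_size, row, col, canvas=16):
    """Synthetic image: a square of `object_size` at (row, col) on a black canvas."""
    img = [[0] * canvas for _ in range(canvas)]
    for r in range(row, min(row + object_size, canvas)):
        for c in range(col, min(col + object_size, canvas)):
            img[r][c] = 1
    return img

def toy_detector(img):
    """Hypothetical detector with a location bias: fires only when the
    central 4x4 region contains object pixels."""
    n = len(img)
    lo, hi = n // 2 - 2, n // 2 + 2
    return any(img[r][c] for r in range(lo, hi) for c in range(lo, hi))

# Sweep the location factor of variation, SI-Score style
hits = {(r, c): toy_detector(render(4, r, c))
        for r in range(0, 13, 4) for c in range(0, 13, 4)}
print(sum(hits.values()), "of", len(hits), "locations detected")   # 4 of 16
```

A robust detector would fire at every location; the sharp drop away from the canvas center is exactly the kind of sensitivity a fine-grained factor sweep is designed to reveal.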
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.