CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems
- URL: http://arxiv.org/abs/2601.09613v1
- Date: Wed, 14 Jan 2026 16:36:26 GMT
- Title: CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems
- Authors: Yonglin Tian, Qiyao Zhang, Wei Xu, Yutong Wang, Yihao Wu, Xinyi Li, Xingyuan Dai, Hui Zhang, Zhiyong Cui, Baoqing Guo, Zujun Yu, Yisheng Lv,
- Abstract summary: We introduce a novel benchmark, CogRail, which integrates curated datasets with cognitively driven question-answer annotations.<n>Building upon this benchmark, we conduct a systematic evaluation of state-of-the-art visual-language models.<n>We propose a joint fine-tuning framework that integrates three core tasks, position perception, movement prediction, and threat analysis.
- Score: 29.385460126069386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and early perception of potential intrusion targets is essential for ensuring the safety of railway transportation systems. However, most existing systems focus narrowly on object classification within fixed visual scopes and apply rule-based heuristics to determine intrusion status, often overlooking targets that pose latent intrusion risks. Anticipating such risks requires the cognition of spatial context and temporal dynamics for the object of interest (OOI), which presents challenges for conventional visual models. To facilitate deep intrusion perception, we introduce a novel benchmark, CogRail, which integrates curated open-source datasets with cognitively driven question-answer annotations to support spatio-temporal reasoning and prediction. Building upon this benchmark, we conduct a systematic evaluation of state-of-the-art visual-language models (VLMs) using multimodal prompts to identify their strengths and limitations in this domain. Furthermore, we fine-tune VLMs for better performance and propose a joint fine-tuning framework that integrates three core tasks, position perception, movement prediction, and threat analysis, facilitating effective adaptation of general-purpose foundation models into specialized models tailored for cognitive intrusion perception. Extensive experiments reveal that current large-scale multimodal models struggle with the complex spatial-temporal reasoning required by the cognitive intrusion perception task, underscoring the limitations of existing foundation models in this safety-critical domain. In contrast, our proposed joint fine-tuning framework significantly enhances model performance by enabling targeted adaptation to domain-specific reasoning demands, highlighting the advantages of structured multi-task learning in improving both accuracy and interpretability. Code will be available at https://github.com/Hub-Tian/CogRail.
Related papers
- OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation.<n>To bridge this gap, this paper targets to develop a unified framework for omnibus vision-language forgery detection and grounding.<n>We propose textbf OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges.<n>Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments.<n>We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior.<n>We demonstrate how uncertainty is leveraged as an active control signal across three frontiers.<n>This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z) - Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data.<n>This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
arXiv Detail & Related papers (2025-12-30T17:58:01Z) - Attention Augmented GNN RNN-Attention Models for Advanced Cybersecurity Intrusion Detection [0.4369550829556577]
We propose a novel hybrid deep learning architecture that synergistically combines Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs) and multi-head attention mechanisms.<n>Our approach effectively captures both spatial dependencies through graph structural relationships and sequential analysis of network events.<n>The integrated attention mechanism provides dual benefits of improved model interpretability and enhanced feature selection, enabling cybersecurity analysts to focus computational resources on high-impact security events.
arXiv Detail & Related papers (2025-10-29T03:47:02Z) - DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios [57.327907850766785]
characterization of deception across realistic real-world scenarios remains underexplored.<n>We establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different domains.<n>On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement.<n>We incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics.
arXiv Detail & Related papers (2025-10-17T10:14:26Z) - Foundation Models for Autonomous Driving Perception: A Survey Through Core Capabilities [0.6445605125467574]
Foundation models are revolutionizing autonomous driving perception, transitioning the field from narrow, task-specific deep learning models to versatile, general-purpose architectures trained on vast, diverse datasets.<n>This survey examines how these models address critical challenges in autonomous perception, including limitations in generalization, scalability, and robustness to distributional shifts.
arXiv Detail & Related papers (2025-09-10T05:45:49Z) - Bayesian and Multi-Objective Decision Support for Real-Time Cyber-Physical Incident Mitigation [6.852472228194646]
This research proposes a real-time, adaptive decision-support framework for mitigating cyber incidents in cyber-physical systems.<n>It is developed in response to an increasing reliance on these systems within critical infrastructure and evolving adversarial tactics.
arXiv Detail & Related papers (2025-08-31T09:47:38Z) - Rethinking Spatio-Temporal Anomaly Detection: A Vision for Causality-Driven Cybersecurity [22.491097360752903]
We advocate for a causal learning perspective to advance anomaly detection in spatially distributed infrastructures.<n>We identify and formalize three key directions: causal graph profiling, multi-view fusion, and continual causal graph learning.<n>Our objective is to lay a new research trajectory toward scalable, adaptive, explainable, and spatially grounded anomaly detection systems.
arXiv Detail & Related papers (2025-07-10T21:19:28Z) - Offline Model-Based Optimization: Comprehensive Review [61.91350077539443]
offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets.<n>Recent advances in model-based optimization have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models.<n>Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review.
arXiv Detail & Related papers (2025-03-21T16:35:02Z) - FACADE: A Framework for Adversarial Circuit Anomaly Detection and
Evaluation [9.025997629442896]
FACADE is designed for unsupervised mechanistic anomaly detection in deep neural networks.
Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.
arXiv Detail & Related papers (2023-07-20T04:00:37Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.