A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications
- URL: http://arxiv.org/abs/2511.03363v1
- Date: Wed, 05 Nov 2025 11:08:08 GMT
- Title: A Modular, Data-Free Pipeline for Multi-Label Intention Recognition in Transportation Agentic AI Applications
- Authors: Xiaocai Zhang, Hur Lim, Ke Wang, Zhe Xiao, Jing Wang, Kelvin Lee, Xiuju Fu, Zheng Qin,
- Abstract summary: A modular, data-free pipeline for multi-label intention recognition is proposed for agentic AI applications in transportation. Unlike traditional intent recognition systems that depend on large, annotated corpora, our approach eliminates the need for costly data collection. Our system seamlessly routes user queries to task-specific modules, laying the groundwork for fully autonomous, intention-aware agents.
- Score: 12.25149118082394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, a modular, data-free pipeline for multi-label intention recognition is proposed for agentic AI applications in transportation. Unlike traditional intent recognition systems that depend on large, annotated corpora and often struggle with fine-grained, multi-label discrimination, our approach eliminates the need for costly data collection while enhancing the accuracy of multi-label intention understanding. Specifically, the overall pipeline, named DMTC, consists of three steps: 1) using prompt engineering to guide large language models (LLMs) to generate diverse synthetic queries in different transport scenarios; 2) encoding each textual query with a Sentence-T5 model to obtain compact semantic embeddings; 3) training a lightweight classifier using a novel online focal-contrastive (OFC) loss that emphasizes hard samples and maximizes inter-class separability. The applicability of the proposed pipeline is demonstrated in an agentic AI application in the maritime transportation context. Extensive experiments show that DMTC achieves a Hamming loss of 5.35% and an AUC of 95.92%, outperforming state-of-the-art multi-label classifiers and recent end-to-end SOTA LLM-based baselines. Further analysis reveals that Sentence-T5 embeddings improve subset accuracy by at least 3.29% over alternative encoders, and integrating the OFC loss yields an additional 0.98% gain compared to standard contrastive objectives. In conclusion, our system seamlessly routes user queries to task-specific modules (e.g., ETA information, traffic risk evaluation, and other typical scenarios in the transportation domain), laying the groundwork for fully autonomous, intention-aware agents without costly manual labelling.
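The abstract evaluates DMTC with Hamming loss and subset accuracy, and describes routing each query to every task-specific module whose intent is predicted. The pure-Python sketch below illustrates both ideas. It is not the authors' implementation. Hamming loss is the fraction of individual (query, label) decisions that are wrong. Subset accuracy credits a query only when its whole predicted label set is exact. The intent names in `INTENTS`, the helpers `binarize` and `route`, and the 0.5 threshold are hypothetical.

```python
from typing import List, Sequence

# Hypothetical intent labels, for illustration only (not the paper's label set).
INTENTS = ["eta_information", "traffic_risk_evaluation", "other"]


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map per-label scores in [0, 1] to a 0/1 multi-label vector."""
    return [1 if s >= threshold else 0 for s in scores]


def route(scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return every (hypothetical) module whose intent is predicted active."""
    return [name for name, bit in zip(INTENTS, binarize(scores, threshold)) if bit]


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) slots where prediction and truth differ."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need the same, non-zero number of samples")
    wrong = total = 0
    for t, p in zip(y_true, y_pred):
        if len(t) != len(p):
            raise ValueError("label vectors must have equal length")
        wrong += sum(a != b for a, b in zip(t, p))
        total += len(t)
    return wrong / total


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose whole predicted label vector is exactly right."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need the same, non-zero number of samples")
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


if __name__ == "__main__":
    y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
    y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
    print(hamming_loss(y_true, y_pred))     # 2 wrong slots out of 12
    print(subset_accuracy(y_true, y_pred))  # 2 exact rows out of 4 -> 0.5
    print(route([0.91, 0.62, 0.08]))        # ['eta_information', 'traffic_risk_evaluation']
```

In the toy data, the second and fourth queries each miss a single label. Each costs only 1/12 in Hamming loss but earns no credit in subset accuracy, which is why subset accuracy is the stricter of the two metrics.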
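The abstract names the online focal-contrastive (OFC) loss but does not define it. Purely as an assumption about its general shape, the sketch below combines two familiar ideas. The first is online (batch-hard) pair mining, similar in spirit to `OnlineContrastiveLoss` in the sentence-transformers library. It keeps only positive pairs that are farther apart than the closest negative, and negative pairs that are closer than the farthest positive. The second is a focal-style factor that up-weights pairs the encoder still confuses. Several details are hypothetical: the function names, the `exp(-d)` "same-intent" score, and the `margin`/`gamma` defaults. Treating two queries as a positive pair when their label sets are identical is likewise an assumption for the multi-label case. Distances are cosine distances between sentence embeddings.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two embedding vectors."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / norm


def focal_online_contrastive_loss(
    distances: Sequence[float],
    labels: Sequence[int],  # 1 = positive pair (assumed: identical label sets), 0 = negative
    margin: float = 0.5,    # hypothetical default
    gamma: float = 2.0,     # hypothetical focusing parameter
) -> float:
    """Hypothetical focal-weighted online contrastive loss over one batch of pairs."""
    pos = [d for d, y in zip(distances, labels) if y == 1]
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # mining needs both pair types in the batch

    # Online (batch-hard) mining: keep only pairs that break the ordering
    # "every positive is closer than every negative".
    hard_pos = [d for d in pos if d > min(neg)]
    hard_neg = [d for d in neg if d < max(pos)]

    loss = 0.0
    for d in hard_pos:
        p_same = math.exp(-d)  # assumed score that the pair is judged "same intent"
        loss += (1.0 - p_same) ** gamma * d ** 2  # far positives get a larger weight
    for d in hard_neg:
        p_same = math.exp(-d)
        loss += p_same ** gamma * max(0.0, margin - d) ** 2  # close negatives get a larger weight
    return loss


if __name__ == "__main__":
    a, b = [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]  # toy "embeddings"
    print(cosine_distance(a, b))
    print(focal_online_contrastive_loss([0.1, 0.6, 0.3, 0.9], [1, 1, 0, 0]))
```

Only pairs that violate the positive-before-negative ordering contribute, so a batch that is already well separated yields zero loss. The focal factor then concentrates the remaining penalty on the most confusable pairs.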
Related papers
- Beyond Quantity: Trajectory Diversity Scaling for Code Agents [51.71414642763219]
Trajectory Diversity Scaling is a data synthesis framework for code agents that scales performance through diversity rather than raw volume. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; and (3) an adaptive evolution mechanism that steers toward long-tail scenarios.
(2026-02-03T07:43:03Z)
- Generalizable IoT Traffic Representations for Cross-Network Device Identification [15.867734233278568]
We study the problem of learning generalizable traffic representations for IoT device identification. We design compact encoder architectures that learn per-flow embeddings from unlabeled IoT traffic. We show that these learned representations can be used effectively for IoT device-type classification.
(2026-01-27T07:56:31Z)
- SERM: Self-Evolving Relevance Model with Agent-Driven Learning from Massive Query Streams [53.78257200138774]
We propose a Self-Evolving Relevance Model approach (SERM), which comprises two complementary multi-agent modules. We evaluate SERM in a large-scale industrial setting, which serves billions of user requests daily.
(2026-01-14T14:31:16Z)
- Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation [8.704530773510411]
This study presents a multi-agent framework that automates data narration and energy insight generation. The framework coordinates three specialized agents, including a data narration agent, an LLM-as-a-judge agent, and an optional human-in-the-loop evaluator. The system is validated through a real-world case study on public bus transportation in Northern Jutland, Denmark.
(2025-11-17T15:14:17Z)
- Multi-Model Synthetic Training for Mission-Critical Small Language Models [0.0]
We present a novel approach that achieves a 261x cost reduction for maritime intelligence. Our method transforms 3.2 billion Automatic Identification System (AIS) vessel tracking records into 21,543 synthetic question and answer pairs. The resulting fine-tuned Qwen2.5-7B model achieves 75% accuracy on maritime tasks, while being substantially cheaper than using a larger model for inference.
(2025-09-16T13:04:48Z)
- D-CAT: Decoupled Cross-Attention Transfer between Sensor Modalities for Unimodal Inference [3.6344649347926326]
Cross-modal transfer learning is used to improve multi-modal classification models. Existing methods require paired sensor data at both training and inference. We propose Decoupled Cross-Attention Transfer (D-CAT), a framework that aligns modality-specific representations without requiring joint sensor modality during inference.
(2025-09-11T10:54:07Z)
- Scene-Agnostic Traversability Labeling and Estimation via a Multimodal Self-supervised Framework [9.925474085627275]
Traversability estimation is critical for enabling robots to navigate across diverse terrains and environments. We propose a multimodal self-supervised framework for traversability labeling and estimation. Our approach consistently achieves around 88% IoU across diverse datasets.
(2025-08-25T17:40:16Z)
- INFNet: A Task-aware Information Flow Network for Large-Scale Recommendation Systems [8.283354901677692]
Information Flow Network (INFNet) is a task-aware architecture designed for large-scale recommendation scenarios. INFNet groups features into three token types (categorical tokens, sequence tokens, and task tokens) and introduces a novel dual-flow design. INFNet has been successfully deployed in a commercial online advertising system, yielding significant gains of +1.587% in Revenue (REV) and +1.155% in Click-Through Rate (CTR).
(2025-08-15T16:18:32Z)
- MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection [84.75972919995398]
This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles. The proposed Agentic AI system combines four agents: (i) a machine learning agent (logistic regression), (ii) a Wikipedia knowledge check agent, and (iv) a web-scraped data analyzer. Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches.
(2025-08-13T19:14:48Z)
- Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning [71.3533541927459]
We propose a novel data selection paradigm termed Reasoning Activation Potential (RAP). RAP identifies cognitive samples by estimating each sample's potential to stimulate genuine multi-modal reasoning. Our RAP method consistently achieves superior performance using only 9.3% of the training data, while reducing computational costs by over 43%.
(2025-06-05T08:40:24Z)
- Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning [67.06363342414397]
Task-oriented semantic communication enhances transmission efficiency by conveying semantic information rather than exact messages. Deep learning (DL)-based semantic communication can effectively cultivate the essential semantic knowledge for semantic extraction, transmission, and interpretation. We propose a self-supervised learning-based semantic communication framework (SLSCom) to enhance task inference performance.
(2025-05-26T13:06:18Z)
- From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning [71.41062111470414]
Current object detectors excel at entity localization and classification, yet exhibit inherent limitations in event recognition capabilities. We present a novel framework that expands the capability of standard object detectors beyond mere object recognition to complex event understanding. Our key innovation lies in bridging the semantic gap between object detection and event understanding without requiring expensive task-specific training.
(2025-02-09T10:30:54Z)
- Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition [2.4364387374267427]
We propose a novel self-supervised learning (SSL) framework for wearable emotion recognition.
Our method achieved state-of-the-art results in various emotion classification tasks.
(2023-03-29T19:45:55Z)
- MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy [68.7563053122698]
We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet).
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
(2022-10-19T19:15:07Z)