Related papers: Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems

Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems

URL: http://arxiv.org/abs/2506.14096v1
Date: Tue, 17 Jun 2025 01:20:50 GMT
Title: Image Segmentation with Large Language Models: A Survey with Perspectives for Intelligent Transportation Systems
Authors: Sanjeda Akter, Ibne Farabi Shihab, Anuj Sharma,
Abstract summary: This survey systematically reviews the emerging field of LLM-augmented image segmentation.<n>We highlight how these innovations can enhance road scene understanding for autonomous driving, traffic monitoring, and infrastructure maintenance.<n>We identify key challenges, including real-time performance and safety-critical reliability.
Score: 2.1797343876622097
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The integration of Large Language Models (LLMs) with computer vision is profoundly transforming perception tasks like image segmentation. For intelligent transportation systems (ITS), where accurate scene understanding is critical for safety and efficiency, this new paradigm offers unprecedented capabilities. This survey systematically reviews the emerging field of LLM-augmented image segmentation, focusing on its applications, challenges, and future directions within ITS. We provide a taxonomy of current approaches based on their prompting mechanisms and core architectures, and we highlight how these innovations can enhance road scene understanding for autonomous driving, traffic monitoring, and infrastructure maintenance. Finally, we identify key challenges, including real-time performance and safety-critical reliability, and outline a perspective centered on explainable, human-centric AI as a prerequisite for the successful deployment of this technology in next-generation transportation systems.

Related papers

Large Language Models and Their Applications in Roadway Safety and Mobility Enhancement: A Comprehensive Review [14.611584622270405]
This paper reviews the application and customization of Large Language Models (LLMs) for enhancing roadway safety and mobility.<n>A key focus is how LLMs are adapted -- via architectural, training, prompting, and multimodal strategies -- to bridge the "modality gap" with transportation's unique-temporal and physical data.<n>Despite significant potential, challenges persist regarding inherent LLM limitations (hallucinations, reasoning deficits), data governance (privacy, bias complexity), complexities (sim-to-real, latency), and rigorous safety assurance.
arXiv Detail & Related papers (2025-05-19T21:51:18Z)
A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction.<n>Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms.<n>This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z)
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap [51.198001060683296]
Large Language Models (LLMs) offer transformative potential to address transportation challenges.<n>This survey first presents LLM4TR, a novel conceptual framework that systematically categorizes the roles of LLMs in transportation.<n>For each role, our review spans diverse applications, from traffic prediction and autonomous driving to safety analytics and urban mobility optimization.
arXiv Detail & Related papers (2025-03-27T11:56:27Z)
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey [61.39993881402787]
World models and video generation are pivotal technologies in the domain of autonomous driving. This paper investigates the relationship between these two technologies. By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions.
arXiv Detail & Related papers (2024-11-05T08:58:35Z)
GenAI-powered Multi-Agent Paradigm for Smart Urban Mobility: Opportunities and Challenges for Integrating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) with Intelligent Transportation Systems [10.310791311301962]
This paper explores the transformative potential of large language models (LLMs) and emerging Retrieval-Augmented Generation (RAG) technologies. We propose a conceptual framework aimed at developing multi-agent systems capable of intelligently and conversationally delivering smart mobility services.
arXiv Detail & Related papers (2024-08-31T16:14:42Z)
A Survey of Generative AI for Intelligent Transportation Systems: Road Transportation Perspective [7.770651543578893]
We introduce the principles of different generative AI techniques. We classify tasks in ITS into four types: traffic perception, traffic prediction, traffic simulation, and traffic decision-making. We illustrate how generative AI techniques addresses key issues in these four different types of tasks.
arXiv Detail & Related papers (2023-12-13T16:13:23Z)
Representation Engineering: A Top-Down Approach to AI Transparency [130.33981757928166]
We identify and characterize the emerging area of representation engineering (RepE)<n>RepE places population-level representations, rather than neurons or circuits, at the center of analysis.<n>We showcase how these methods can provide traction on a wide range of safety-relevant problems.
arXiv Detail & Related papers (2023-10-02T17:59:07Z)
Core Challenges in Embodied Vision-Language Planning [11.896110519868545]
Embodied Vision-Language Planning tasks leverage computer vision and natural language for interaction in physical environments. We propose a taxonomy to unify these tasks and provide an analysis and comparison of the current and new algorithmic approaches. We advocate for task construction that enables model generalisability and furthers real-world deployment.
arXiv Detail & Related papers (2023-04-05T20:37:13Z)
Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics [77.34726150561087]
This work aims to carry out a study on the current scenario of camera and radar-based perception for ADAS and autonomous vehicles. Concepts and characteristics related to both sensors, as well as to their fusion, are presented. We give an overview of the Deep Learning-based detection and segmentation tasks, and the main datasets, metrics, challenges, and open questions in vehicle perception.
arXiv Detail & Related papers (2023-03-08T00:48:32Z)
Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues. We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background. We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59)
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
DEEVA: A Deep Learning and IoT Based Computer Vision System to Address Safety and Security of Production Sites in Energy Industry [0.0]
This paper tackles various computer vision related problems such as scene classification, object detection in scenes, semantic segmentation, scene captioning etc. We developed Deep ExxonMobil Eye for Video Analysis (DEEVA) package to handle scene classification, object detection, semantic segmentation and captioning of scenes. The results reveal that transfer learning with the RetinaNet object detector is able to detect the presence of workers, different types of vehicles/construction equipment, safety related objects at a high level of accuracy (above 90%)
arXiv Detail & Related papers (2020-03-02T21:26:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.