Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
- URL: http://arxiv.org/abs/2407.08109v1
- Date: Thu, 11 Jul 2024 01:03:02 GMT
- Title: Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
- Authors: Suqi Song, Chenxu Zhang, Peng Zhang, Pengkun Li, Fenglong Song, Lei Zhang,
- Abstract summary: Urban waterlogging poses a major risk to public safety and infrastructure.
Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions.
We establish a challenging Urban Waterlogging Benchmark (UW-Bench) under diverse adverse conditions to advance real-world applications.
- Score: 10.001964627074704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors need high-maintenance to hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Benchmark (UW-Bench) under diverse adverse conditions to advance real-world applications. We propose a Large-Small Model co-adapter paradigm (LSM-adapter), which harnesses the substantial generic segmentation potential of large model and the specific task-directed guidance of small model. Specifically, a Triple-S Prompt Adapter module alongside a Dynamic Prompt Combiner are proposed to generate then merge multiple prompts for mask decoder adaptation. Meanwhile, a Histogram Equalization Adap-ter module is designed to infuse the image specific information for image encoder adaptation. Results and analysis show the challenge and superiority of our developed benchmark and algorithm. Project page: \url{https://github.com/zhang-chenxu/LSM-Adapter}
Related papers
- RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation [14.203806360052567]
Referring ImageHide (RIS) aims to segment specific objects based on natural language descriptions.<n>Existing datasets and methods are typically designed for high-altitude and static-view imagery.<n>We present RIS-LAD, the first fine-grained RIS benchmark tailored for Low-Altitude Drone (LAD) scenarios.
arXiv Detail & Related papers (2025-07-28T15:21:03Z) - AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization.<n>AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z) - SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations [27.391859339238906]
In this paper, we explore the potential of generative diffusion model in synthesizing hyperspectral images with pixel-level annotations.
To the best of our knowledge, it is the first work to generate high-dimensional HSIs with annotations.
We select two of the most widely used dense prediction tasks: semantic segmentation and change detection, and generate datasets suitable for these tasks.
arXiv Detail & Related papers (2025-02-24T11:13:37Z) - UrbanSAM: Learning Invariance-Inspired Adapters for Segment Anything Models in Urban Construction [51.54946346023673]
Urban morphology is inherently complex, with irregular objects of diverse shapes and varying scales.
The Segment Anything Model (SAM) has shown significant potential in segmenting complex scenes.
We propose UrbanSAM, a customized version of SAM specifically designed to analyze complex urban environments.
arXiv Detail & Related papers (2025-02-21T04:25:19Z) - Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach [11.770137653756697]
Integrated sensing and communication (ISAC) technology is essential for supporting vehicular networks.
The communication channel in this scenario exhibits time variations, and the potential targets may move rapidly, resulting in double dynamics.
We propose using constrained deep reinforcement learning to facilitate dynamic updates to the ISAC precoder.
arXiv Detail & Related papers (2024-05-23T09:19:14Z) - MiniMaxAD: A Lightweight Autoencoder for Feature-Rich Anomaly Detection [1.7234530131333607]
MiniMaxAD is a lightweight autoencoder designed to efficiently compress and memorize extensive information from normal images.
Our model employs a technique that enhances feature diversity, thereby increasing the effective capacity of the network.
In our methodology, any dataset can be unified under the framework of feature-rich anomaly detection.
arXiv Detail & Related papers (2024-05-16T09:37:54Z) - DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with
Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust even full-day object detection by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Adaptive Sparse Convolutional Networks with Global Context Enhancement
for Faster Object Detection on Drone Images [26.51970603200391]
This paper investigates optimizing the detection head based on the sparse convolution.
It suffers from inadequate integration of contextual information of tiny objects.
We propose a novel global context-enhanced adaptive sparse convolutional network.
arXiv Detail & Related papers (2023-03-25T14:42:50Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a textbfGenerative textbfDecoder for MAE (GD-MAE) to automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI, and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - Towards Robust Semantic Segmentation of Accident Scenes via Multi-Source
Mixed Sampling and Meta-Learning [29.74171323437029]
We propose a Multi-source Meta-learning Unsupervised Domain Adaptation framework, to improve the generalization of segmentation transformers to extreme accident scenes.
Our approach achieves a mIoU score of 46.97% on the DADA-seg benchmark, surpassing the previous state-of-the-art model by more than 7.50%.
arXiv Detail & Related papers (2022-03-19T21:18:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.