Related papers: RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents

URL: http://arxiv.org/abs/2410.13384v1
Date: Thu, 17 Oct 2024 09:36:52 GMT
Title: RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents
Authors: Zhuoran Liu, Danpei Zhao, Bo Yuan
Abstract summary: This paper introduces Adaptive Disaster Interpretation (ADI), a novel task designed to solve requests by planning and executing multiple correlative interpretation tasks. We present a new dataset named RescueADI, which contains high-resolution RSIs with annotations for three connected aspects: planning, perception, and recognition. We propose a new disaster interpretation method employing autonomous agents driven by large language models (LLMs) for task planning and execution.
Score: 11.08910129925713
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current methods for disaster scene interpretation in remote sensing images (RSIs) mostly focus on isolated tasks such as segmentation, detection, or visual question-answering (VQA). However, current interpretation methods often fail at tasks that require the combination of multiple perception methods and specialized tools. To fill this gap, this paper introduces Adaptive Disaster Interpretation (ADI), a novel task designed to solve requests by planning and executing multiple sequentially correlative interpretation tasks to provide a comprehensive analysis of disaster scenes. To facilitate research and application in this area, we present a new dataset named RescueADI, which contains high-resolution RSIs with annotations for three connected aspects: planning, perception, and recognition. The dataset includes 4,044 RSIs, 16,949 semantic masks, 14,483 object bounding boxes, and 13,424 interpretation requests across nine challenging request types. Moreover, we propose a new disaster interpretation method employing autonomous agents driven by large language models (LLMs) for task planning and execution, proving its efficacy in handling complex disaster interpretations. The proposed agent-based method solves various complex interpretation requests such as counting, area calculation, and path-finding without human intervention, which traditional single-task approaches cannot handle effectively. Experimental results on RescueADI demonstrate the feasibility of the proposed task and show that our method achieves an accuracy 9% higher than existing VQA methods, highlighting its advantages over conventional disaster interpretation approaches. The dataset will be publicly available.

Related papers

Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning [31.376336808244286]
This paper proposes a novel semantic extractor that can be adopted into any machine learning downstream task. The semantic extractor can be executed onboard, which reduces the amount of data that needs to be transmitted to ground stations. Our experimental results demonstrate the proposed method maintains high accuracy across different downstream tasks while significantly reducing the volume of transmitted data.
arXiv Detail & Related papers (2024-12-14T08:56:22Z)
SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints [66.85363924364628]
Image forgery localization (IFL) is a crucial technique for preventing tampered image misuse and protecting social safety. We introduce a novel information-theoretic IFL framework named SUMI-IFL that imposes sufficiency-view and minimality-view constraints on forgery feature representation.
arXiv Detail & Related papers (2024-12-13T09:08:02Z)
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant [59.2438504610849]
We introduce FFAA: Face Forgery Analysis Assistant, consisting of a fine-tuned Multimodal Large Language Model (MLLM) and Multi-answer Intelligent Decision System (MIDS). Our method not only provides user-friendly and explainable results but also significantly boosts accuracy and robustness compared to previous methods.
arXiv Detail & Related papers (2024-08-19T15:15:20Z)
SAFE: a SAR Feature Extractor based on self-supervised learning and masked Siamese ViTs [5.961207817077044]
We propose a novel self-supervised learning framework based on masked Siamese Vision Transformers to create a General SAR Feature Extractor coined SAFE. Our method leverages contrastive learning principles to train a model on unlabeled SAR data, extracting robust and generalizable features. We introduce tailored data augmentation techniques specific to SAR imagery, such as sub-aperture decomposition and despeckling. Our network competes with or surpasses other state-of-the-art methods in few-shot classification and segmentation tasks, even without being trained on the sensors used for the evaluation.
arXiv Detail & Related papers (2024-06-30T23:11:20Z)
RAG-based Crowdsourcing Task Decomposition via Masked Contrastive Learning with Prompts [21.69333828191263]
We propose a retrieval-augmented generation-based crowdsourcing framework that reimagines task decomposition (TD) as event detection from the perspective of natural language understanding. We present a Prompt-Based Contrastive learning framework for TD (PBCT), which incorporates a prompt-based trigger detector to overcome dependence. Experiment results demonstrate the competitiveness of our method in both supervised and zero-shot detection.
arXiv Detail & Related papers (2024-06-04T08:34:19Z)
Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning [50.88504784466931]
Multi-task dense prediction involves semantic segmentation, depth estimation, and surface normal estimation. Existing solutions typically rely on learning global image representations for global cross-task image matching. Our proposal involves modeling region-wise representations using Gaussian Distributions.
arXiv Detail & Related papers (2024-03-15T12:41:30Z)
Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training. At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk. At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints [56.283944756315066]
We propose an alternative TAMP approach that unifies task and motion planning into a single search. Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI search to yield physically feasible plans.
arXiv Detail & Related papers (2023-12-29T14:00:20Z)
Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems. We propose a task-agnostic method named 'planning as in-painting'. The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
Sequential Action-Induced Invariant Representation for Reinforcement Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning. We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z)
Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems. We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z)
Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services [18.57009530004948]
We present a novel method to classify relevant tweets during an ongoing crisis using the publicly available dataset of TREC incident streams. We use dedicated attention layers for each task to provide model interpretability, which is critical for real-world applications. We show a practical implication of our work by providing a use-case for the COVID-19 pandemic.
arXiv Detail & Related papers (2020-03-04T06:40:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.