Related papers: Think First, Assign Next (ThiFAN-VQA): A Two-stage Chain-of-Thought Framework for Post-Disaster Damage Assessment

Think First, Assign Next (ThiFAN-VQA): A Two-stage Chain-of-Thought Framework for Post-Disaster Damage Assessment

URL: http://arxiv.org/abs/2511.19557v1
Date: Mon, 24 Nov 2025 14:32:07 GMT
Title: Think First, Assign Next (ThiFAN-VQA): A Two-stage Chain-of-Thought Framework for Post-Disaster Damage Assessment
Authors: Ehsan Karimi, Nhut Le, Maryam Rahnemoonfar,
Abstract summary: We propose ThiFAN-VQA, a two-stage reasoning-based framework for visual question answering (VQA) in disaster scenarios.<n>By integrating a custom information retrieval system, domain-specific prompting, and reasoning-guided answer selection, ThiFAN-VQA bridges the gap between zero-shot and supervised methods.<n> Experiments on FloodNet and RescueNet-VQA, UAV-based datasets from flood- and hurricane-affected regions, demonstrate that ThiFAN-VQA achieves superior accuracy, interpretability, and adaptability.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Timely and accurate assessment of damages following natural disasters is essential for effective emergency response and recovery. Recent AI-based frameworks have been developed to analyze large volumes of aerial imagery collected by Unmanned Aerial Vehicles, providing actionable insights rapidly. However, creating and annotating data for training these models is costly and time-consuming, resulting in datasets that are limited in size and diversity. Furthermore, most existing approaches rely on traditional classification-based frameworks with fixed answer spaces, restricting their ability to provide new information without additional data collection or model retraining. Using pre-trained generative models built on in-context learning (ICL) allows for flexible and open-ended answer spaces. However, these models often generate hallucinated outputs or produce generic responses that lack domain-specific relevance. To address these limitations, we propose ThiFAN-VQA, a two-stage reasoning-based framework for visual question answering (VQA) in disaster scenarios. ThiFAN-VQA first generates structured reasoning traces using chain-of-thought (CoT) prompting and ICL to enable interpretable reasoning under limited supervision. A subsequent answer selection module evaluates the generated responses and assigns the most coherent and contextually accurate answer, effectively improve the model performance. By integrating a custom information retrieval system, domain-specific prompting, and reasoning-guided answer selection, ThiFAN-VQA bridges the gap between zero-shot and supervised methods, combining flexibility with consistency. Experiments on FloodNet and RescueNet-VQA, UAV-based datasets from flood- and hurricane-affected regions, demonstrate that ThiFAN-VQA achieves superior accuracy, interpretability, and adaptability for real-world post-disaster damage assessment tasks.

Related papers

Multi-hop Reasoning via Early Knowledge Alignment [68.28168992785896]
Early Knowledge Alignment (EKA) aims to align Large Language Models with contextually relevant retrieved knowledge.<n>EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency.<n>EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models.
arXiv Detail & Related papers (2025-12-23T08:14:44Z)
KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present textbfKBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning.<n>Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions.<n>Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z)
ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages.<n>ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z)
Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models [36.38859440184592]
Video anomaly detection (VAD) plays a vital role in real-world applications such as security surveillance, autonomous driving, and industrial monitoring.<n>Recent advances in large pre-trained models have opened new opportunities for training-free VAD by leveraging rich prior knowledge and general reasoning capabilities.<n>This raises a fundamental question: Is dense reasoning truly necessary when using powerful pre-trained models in VAD systems?<n>We propose ReCoVAD, a novel framework inspired by the dual reflex and conscious pathways of the human nervous system, enabling selective frame processing to reduce redundant computation.
arXiv Detail & Related papers (2025-11-21T09:50:21Z)
Did Models Sufficient Learn? Attribution-Guided Training via Subset-Selected Counterfactual Augmentation [61.248535801314375]
Subset-Selected Counterfactual Augmentation (SS-CA)<n>We develop Counterfactual LIMA to identify minimal spatial region sets whose removal can selectively alter model predictions.<n>Experiments show that SS-CA improves generalization on in-distribution (ID) test data and achieves superior performance on out-of-distribution (OOD) benchmarks.
arXiv Detail & Related papers (2025-11-15T08:39:22Z)
STRATA-TS: Selective Knowledge Transfer for Urban Time Series Forecasting with Retrieval-Guided Reasoning [27.775793400546345]
STRATA-TS is a framework that combines domain-adapted retrieval with reasoning-capable large models to improve forecasting in scarce data regimes.<n>To enable efficient deployment, we distill the reasoning process into a compact open model via supervised fine-tuning.<n>Experiments on three parking availability datasets across Singapore, Nottingham, and Glasgow demonstrate that STRATA-TS consistently outperforms strong forecasting and transfer baselines.
arXiv Detail & Related papers (2025-08-26T03:18:53Z)
ZeShot-VQA: Zero-Shot Visual Question Answering Framework with Answer Mapping for Natural Disaster Damage Assessment [0.0]
Recently published models do not possess the ability to answer open-ended questions.<n>ZeShot-VQA is able to process and generate answers that has been not seen during the training procedure.
arXiv Detail & Related papers (2025-05-30T21:15:11Z)
Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.<n>Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA. We first augment the existing data via deliberate perturbations on either the image or question. We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
Retrieval-guided Counterfactual Generation for QA [5.434621727606356]
We focus on the task of creating counterfactuals for question answering. We develop a Retrieve-Generate-Filter technique to create counterfactual evaluation and training data. We find that RGF data leads to significant improvements in a model's robustness to local perturbations.
arXiv Detail & Related papers (2021-10-14T17:56:37Z)
Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system. We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries. Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.