LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
- URL: http://arxiv.org/abs/2503.20252v1
- Date: Wed, 26 Mar 2025 05:38:45 GMT
- Title: LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions
- Authors: Yejin Kwon, Daeun Moon, Youngje Oh, Hyunsoo Yoon
- Abstract summary: We introduce LogicQA, a framework that enhances Anomaly Detection (AD) by providing explanations for logical anomalies. LogicQA compiles automatically generated questions into a checklist and collects responses to identify violations of logical constraints. We achieve state-of-the-art (SOTA) Logical AD performance on the public MVTec LOCO AD benchmark, with an AUROC of 87.6 percent and an F1-max of 87.0 percent, along with explanations of anomalies.
- Score: 4.63822109539229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly Detection (AD) focuses on detecting samples that differ from the standard pattern, making it a vital tool in process control. Logical anomalies may appear visually normal yet violate predefined constraints on object presence, arrangement, or quantity, so detecting them demands reasoning and explainability. We introduce LogicQA, a framework that enhances AD by providing industrial operators with explanations for logical anomalies. LogicQA compiles automatically generated questions into a checklist and collects responses to identify violations of logical constraints. LogicQA is training-free, annotation-free, and operates in a few-shot setting. We achieve state-of-the-art (SOTA) Logical AD performance on the public MVTec LOCO AD benchmark, with an AUROC of 87.6 percent and an F1-max of 87.0 percent, along with explanations of anomalies. Our approach also shows outstanding performance on corporate semiconductor SEM data, further validating its effectiveness in industrial applications.
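To make the described pipeline concrete, here is a minimal Python sketch of the checklist-style question answering the abstract outlines. The `query_vlm` helper, the prompt wording, and the yes/no decision rule are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (image_path, question) -> free-form answer string.
# The real LogicQA prompting, few-shot exemplars, and VLM choice are not reproduced here.
QueryFn = Callable[[str, str], str]


def build_checklist(constraints: List[str]) -> List[str]:
    """Turn natural-language logical constraints into yes/no checklist questions."""
    return [
        f"Looking at the image, is the following constraint satisfied: {c}?"
        for c in constraints
    ]


def detect_logical_anomaly(image_path: str, checklist: List[str], query_vlm: QueryFn) -> Dict:
    """Ask every checklist question; any negative answer marks a logical-constraint violation."""
    violations = [
        question
        for question in checklist
        if query_vlm(image_path, question).strip().lower().startswith("no")
    ]
    return {"is_anomalous": bool(violations), "violations": violations}


if __name__ == "__main__":
    # Illustrative constraints in the spirit of MVTec LOCO AD scenes.
    checklist = build_checklist([
        "the box contains exactly two oranges",
        "the cable is connected to both terminals",
    ])
    print(checklist)
    # detect_logical_anomaly("sample.png", checklist, query_vlm=your_vlm_client)
```

The violated questions themselves serve as the human-readable explanation of the anomaly, which is the behaviour the abstract emphasises for industrial operators.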
Related papers
- LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection [27.45348890285863]
We introduce Reasoning Logical Anomaly Detection (RLAD), which extends traditional anomaly detection by incorporating logical reasoning.
We propose a new framework, LAD-Reasoner, a customized tiny multimodal language model built on Qwen2.5-VL 3B.
Experiments on the MVTec LOCO AD dataset show that LAD-Reasoner, though significantly smaller, matches the performance of Qwen2.5-VL-72B in accuracy and F1 score.
arXiv Detail & Related papers (2025-04-17T08:41:23Z)
- AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models [86.83875864328984]
We propose an automated method for synthesizing open-ended logic puzzles, and use it to develop a bilingual benchmark, AutoLogi.
Our approach features program-based verification and controllable difficulty levels, enabling more reliable evaluation that better distinguishes models' reasoning abilities.
arXiv Detail & Related papers (2025-02-24T07:02:31Z)
- LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction [4.959108380494595]
Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning.
In this work, we investigate using AVLMs for logical anomaly detection and demonstrate that they are well-suited to the task.
We achieve SOTA performance on public benchmarks, MVTec LOCO AD, with an AUROC of 86.4% and F1-max of 83.7%, along with explanations of anomalies.
arXiv Detail & Related papers (2025-01-03T11:40:41Z)
- Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents [31.126001253902416]
We present the first study focused on identifying and detecting defects in LLM Agents.
We collected and analyzed 6,854 relevant posts from StackOverflow to define 8 types of agent defects.
Our results show that Agentable achieved an overall accuracy of 88.79% and a recall rate of 91.03%.
arXiv Detail & Related papers (2024-12-24T11:54:14Z)
- LogiCode: an LLM-Driven Framework for Logical Anomaly Detection [5.989778187635765]
LogiCode is a novel framework that leverages Large Language Models (LLMs) for identifying logical anomalies in industrial settings.
By harnessing LLMs for logical reasoning, LogiCode autonomously generates Python codes to pinpoint anomalies such as incorrect quantities or missing elements.
arXiv Detail & Related papers (2024-06-07T07:01:06Z)
- SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection [17.32019706857109]
Visual anomaly detection is vital in real-world applications, such as industrial defect detection and medical diagnosis.
We propose SAM-LAD, a zero-shot, plug-and-play framework for logical anomaly detection in any scene.
We validate our proposed SAM-LAD using various benchmarks, including industrial datasets.
arXiv Detail & Related papers (2024-06-02T06:08:26Z)
- LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks [54.306234256074255]
We identify the issue of tokenization inconsistency that is commonly neglected in training generative models.
This issue damages the extractive nature of these tasks when the input and output are tokenized inconsistently.
We show that, with consistent tokenization, the model performs better in both in-domain and out-of-domain datasets.
arXiv Detail & Related papers (2022-12-19T23:33:21Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Logically Consistent Loss for Visual Question Answering [66.83963844316561]
Current neural-network-based Visual Question Answering (VQA) models cannot ensure logically consistent answers across related questions because of the independent and identically distributed (i.i.d.) assumption.
We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in the multi-task learning framework.
Experiments confirm that the proposed loss formulae and the introduction of hybrid batches lead to more consistency as well as better performance.
arXiv Detail & Related papers (2020-11-19T20:31:05Z)
- Conditional Self-Attention for Query-based Summarization [49.616774159367516]
We propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling.
Experiments on Debatepedia and HotpotQA benchmark datasets show CSA consistently outperforms vanilla Transformer.
arXiv Detail & Related papers (2020-02-18T02:22:31Z)