Related papers: From Text to Multimodal: A Comprehensive Survey of Adversarial Example Generation in Question Answering Systems

Related papers

Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models [0.0]
This systematic literature review analyzes research published between January 2020 and early 2024 that focuses on the explainability of multimodal models.<n>We find that evaluation methods for XAI in multimodal settings are largely non-systematic, lacking consistency, robustness, and consideration for modality-specific cognitive and contextual factors.
arXiv Detail & Related papers (2025-08-06T13:14:20Z)
Deep Research Agents: A Systematic Examination And Roadmap [79.04813794804377]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks.<n>In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions [0.37865171120254354]
Stance detection is essential for understanding subjective content across various platforms such as social media, news articles, and online reviews.<n>Recent advances in Large Language Models (LLMs) have revolutionized stance detection by introducing novel capabilities.<n>We present a novel taxonomy for LLM-based stance detection approaches, structured along three key dimensions.<n>Key applications in stance detection, political analysis, public health monitoring, and social media moderation are discussed.
arXiv Detail & Related papers (2025-05-13T11:47:49Z)
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) [66.51642638034822]
Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks. Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains. This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs.
arXiv Detail & Related papers (2025-04-04T04:04:56Z)
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey [124.23247710880008]
multimodal CoT (MCoT) reasoning has recently garnered significant research attention. Existing MCoT studies design various methodologies to address the challenges of image, video, speech, audio, 3D, and structured data. We present the first systematic survey of MCoT reasoning, elucidating the relevant foundational concepts and definitions.
arXiv Detail & Related papers (2025-03-16T18:39:13Z)
A Survey on Image Quality Assessment: Insights, Analysis, and Future Outlook [6.925820483833189]
Image quality assessment (IQA) represents a pivotal challenge in image-focused technologies. IQA has witnessed a notable surge in innovative research efforts, driven by the emergence of novel architectural paradigms. This survey delivers an extensive analysis of contemporary IQA methodologies, organized according to their application scenarios.
arXiv Detail & Related papers (2025-02-12T16:24:22Z)
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM [51.91311158085973]
Methods for detecting AI-generated media have evolved rapidly. General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities. Ethical and security considerations have emerged as critical global concerns.
arXiv Detail & Related papers (2025-02-07T12:18:20Z)
Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity [0.0]
This paper presents a novel framework for domain-specific question difficulty estimation, leveraging a suite of NLP techniques and knowledge graph analysis. We introduce four key parameters: Topic Retrieval Cost, Topic Salience, Topic Coherence, and Topic Superficiality. A model trained on these features demonstrates the efficacy of our approach in predicting question difficulty.
arXiv Detail & Related papers (2024-08-23T05:40:35Z)
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models [0.0]
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. High-quality datasets are used to train models on realistic scenarios. Standardized metrics facilitate comparisons between different ODQA systems.
arXiv Detail & Related papers (2024-06-19T05:43:02Z)
A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions [1.1458366773578277]
Research interest in autonomous agents is on the rise as an emerging topic. The challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments. Context awareness emerges as a pivotal element in fortifying multi-agent systems.
arXiv Detail & Related papers (2024-02-03T00:27:22Z)
Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into the recent strides in HS moderation. We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs) We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z)
Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence. We present Multi-Modal Reasoning(COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions. We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
How Many Answers Should I Give? An Empirical Study of Multi-Answer Reading Comprehension [64.76737510530184]
We design a taxonomy to categorize commonly-seen multi-answer MRC instances. We analyze how well different paradigms of current multi-answer MRC models deal with different types of multi-answer instances.
arXiv Detail & Related papers (2023-06-01T08:22:21Z)
Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective [69.25513235556635]
Adversarial machine learning (AML) studies the adversarial phenomenon of machine learning, which may make inconsistent or unexpected predictions with humans. Some paradigms have been recently developed to explore this adversarial phenomenon occurring at different stages of a machine learning system. We propose a unified mathematical framework to covering existing attack paradigms.
arXiv Detail & Related papers (2023-02-19T02:12:21Z)
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions [68.6358773622615]
This paper provides an overview of the computational and theoretical foundations of multimodal machine learning. We propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches.
arXiv Detail & Related papers (2022-09-07T19:21:19Z)
Modeling Transformative AI Risks (MTAIR) Project -- Summary Report [0.0]
This report builds on an earlier diagram by Cottier and Shah which laid out some of the crucial disagreements ("cruxes") visually, with some explanation. The model starts with a discussion of reasoning via analogies and general prior beliefs about artificial intelligence. It lays out a model of different paths and enabling technologies for high-level machine intelligence, and a model of how advances in the capabilities of these systems might proceed. The model also looks specifically at the question of learned optimization, and whether machine learning systems will create mesa-optimizers.
arXiv Detail & Related papers (2022-06-19T09:11:23Z)
Conversational Question Answering: A Survey [18.447856993867788]
This survey is an effort to present a comprehensive review of the state-of-the-art research trends of Conversational Question Answering (CQA) Our findings show that there has been a trend shift from single-turn to multi-turn QA which empowers the field of Conversational AI from different perspectives.
arXiv Detail & Related papers (2021-06-02T01:06:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.