VeriMinder: Mitigating Analytical Vulnerabilities in NL2SQL
- URL: http://arxiv.org/abs/2507.17896v1
- Date: Wed, 23 Jul 2025 19:48:12 GMT
- Title: VeriMinder: Mitigating Analytical Vulnerabilities in NL2SQL
- Authors: Shubham Mohole, Sainyam Galhotra
- Abstract summary: Application systems using natural language interfaces to databases (NLIDBs) have democratized data analysis. This has also brought forth an urgent challenge to help users who might use these systems without a background in statistical analysis. We present VeriMinder, https://veriminder.ai, an interactive system for detecting and mitigating such analytical vulnerabilities.
- Score: 11.830097026198308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Application systems using natural language interfaces to databases (NLIDBs) have democratized data analysis. This positive development has also created an urgent challenge: helping users who lack a background in statistical analysis formulate bias-free analytical questions. Although significant research has focused on text-to-SQL generation accuracy, addressing cognitive biases in analytical questions remains underexplored. We present VeriMinder, https://veriminder.ai, an interactive system for detecting and mitigating such analytical vulnerabilities. Our approach introduces three key innovations: (1) a contextual semantic mapping framework for biases relevant to specific analysis contexts; (2) an analytical framework that operationalizes the Hard-to-Vary principle and guides users in systematic data analysis; and (3) an optimized LLM-powered system that generates high-quality, task-specific prompts through a structured process involving multiple candidates, critic feedback, and self-reflection. User testing confirms the merits of our approach: in a direct user experience evaluation, 82.5% of participants reported that the system positively impacted the quality of their analysis, and in a comparative evaluation, VeriMinder scored significantly higher than alternative approaches, at least 20% better on metrics of concreteness, comprehensiveness, and accuracy of the analysis. Our system, implemented as a web application, is designed to help users avoid the "wrong question" vulnerability during data analysis. The VeriMinder code base with prompts, https://reproducibility.link/veriminder, is available as MIT-licensed open-source software to facilitate further research and adoption within the community.
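The abstract's third innovation, multi-candidate prompt generation with critic feedback and self-reflection, can be pictured with a minimal sketch. The function names, prompt wording, scoring scheme, and the `llm` callable below are illustrative assumptions, not VeriMinder's actual implementation; the authors' real prompts are in the MIT-licensed code base linked above.

```python
from typing import Callable, List, Tuple

def generate_prompt_candidates(
    llm: Callable[[str], str],      # assumption: any text-in/text-out model call supplied by the user
    analysis_question: str,
    schema_summary: str,
    n_candidates: int = 3,
) -> List[str]:
    """Draft several task-specific prompts for the same analytical question."""
    drafts = []
    for i in range(n_candidates):
        drafts.append(llm(
            f"Draft #{i + 1}: write a prompt that helps an analyst probe the question "
            f"'{analysis_question}' against this schema:\n{schema_summary}\n"
            "List the cognitive biases (e.g., selection or survivorship bias) the analyst should check."
        ))
    return drafts

def critic_and_reflect(llm: Callable[[str], str], drafts: List[str]) -> str:
    """Score each candidate with a critic pass, then refine the best one via self-reflection."""
    scored: List[Tuple[int, str]] = []
    for draft in drafts:
        verdict = llm(
            "Rate this prompt from 1 to 10 for concreteness, comprehensiveness, and "
            f"bias coverage. Reply with the number only.\n\n{draft}"
        )
        try:
            score = int(verdict.strip().split()[0])
        except (ValueError, IndexError):
            score = 0  # unparsable critic output gets the lowest score
        scored.append((score, draft))
    best_score, best_draft = max(scored, key=lambda pair: pair[0])
    # Self-reflection pass: ask the model to tighten its own best draft,
    # in the spirit of the Hard-to-Vary principle.
    return llm(
        f"The prompt below scored {best_score}/10. Revise it so every instruction is "
        f"necessary to answer the question and none can be changed without losing value.\n\n{best_draft}"
    )
```

In use, `llm` would be wired to any chat-completion client, and the refined prompt would accompany the user's natural-language question into the downstream NL2SQL and analysis steps.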
Related papers
- Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System [0.0]
Large language models (LLMs) offer promising opportunities for organizational research. Their built-in moderation systems can create problems when researchers try to analyze harmful content. This paper introduces an Elo rating-based method that significantly improves LLM performance for harmful content analysis.
arXiv Detail & Related papers (2025-06-19T20:01:12Z)
- VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents [30.54944324418407]
VIDEE is a system that supports entry-level data analysts in conducting advanced text analytics with intelligent agents. We conduct two quantitative experiments to evaluate VIDEE's effectiveness and analyze common agent errors.
arXiv Detail & Related papers (2025-06-17T05:24:58Z)
- OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics [101.78963920333342]
We introduce OpenUnlearning, a standardized framework for benchmarking large language model (LLM) unlearning methods and metrics. OpenUnlearning integrates 9 unlearning algorithms and 16 diverse evaluations across 3 leading benchmarks. We also benchmark diverse unlearning methods and provide a comparative analysis against an extensive evaluation suite.
arXiv Detail & Related papers (2025-06-14T20:16:37Z)
- Towards Automated Situation Awareness: A RAG-Based Framework for Peacebuilding Reports [2.230742111425553]
This paper introduces a dynamic Retrieval-Augmented Generation (RAG) system that autonomously generates situation awareness reports. Our system constructs query-specific knowledge bases on demand, ensuring timely, relevant, and accurate insights. The system is tested across multiple real-world scenarios, demonstrating its effectiveness in producing coherent, insightful, and actionable reports.
arXiv Detail & Related papers (2025-05-14T16:36:30Z)
- LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights [12.424610893030353]
Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection. This paper provides a detailed survey of LLMs in vulnerability detection. We address challenges such as cross-language vulnerability detection, multimodal data integration, and repository-level analysis.
arXiv Detail & Related papers (2025-02-10T21:33:38Z)
- The Role of Accuracy and Validation Effectiveness in Conversational Business Analytics [0.0]
This study examines conversational business analytics, an approach that uses AI to address the technical competency gaps that hinder end users from effectively using traditional self-service analytics.
By facilitating natural language interactions, conversational business analytics aims to empower users to independently retrieve data and generate insights.
arXiv Detail & Related papers (2024-11-18T23:58:24Z)
- InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features. It consists of 100 datasets representing diverse business use cases such as finance and incident management. Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks [95.29345070102045]
In this paper, we focus our investigation on social bias detection of dialog safety problems.
We first propose a novel Dial-Bias Frame for analyzing the social bias in conversations pragmatically.
We introduce CDial-Bias, the first well-annotated Chinese social bias dialog dataset.
arXiv Detail & Related papers (2022-02-16T11:59:29Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)