Related papers: PASS-FC: Progressive and Adaptive Search Scheme for Fact Checking of Comprehensive Claims

LLM-Assisted Cheating Detection in Korean Language via Keystrokes [1.9344365651682767]
This paper presents a keystroke-based framework for detecting LLM-assisted cheating in Korean. Our dataset includes 69 participants who completed writing tasks under three conditions: bona fide writing, paraphrasing ChatGPT responses, and transcribing ChatGPT responses.
arXiv Detail & Related papers (2025-07-29T20:59:03Z)
The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora [6.594531626178451]
Cross-lingual retrieval-augmented generation (RAG) is a critical capability for retrieving and generating answers across languages. We study Arabic-English RAG in a domain-specific setting using benchmarks derived from real-world corporate datasets. We propose a simple retrieval strategy that addresses this source of failure by enforcing equal retrieval from both languages.
arXiv Detail & Related papers (2025-07-10T08:38:31Z)
Verifiable Natural Language to Linear Temporal Logic Translation: A Benchmark Dataset and Evaluation Suite [8.325455397285873]
Empirical evaluation of state-of-the-art natural-language (NL) to temporal-logic (TL) translation systems reveals near-perfect performance on existing benchmarks. We introduce the Verifiable Linear Temporal Logic Benchmark (VLTL-Bench), a unifying benchmark that measures verification and verifiability of automated NL-to-LTL translation.
arXiv Detail & Related papers (2025-07-01T15:41:57Z)
Search Arena: Analyzing Search-Augmented LLMs [61.28673331156436]
We introduce Search Arena, a crowd-sourced, large-scale, human-preference dataset of over 24,000 paired multi-turn user interactions. The dataset spans diverse intents and languages, and contains full system traces with around 12,000 human preference votes. Our analysis reveals that user preferences are influenced by the number of citations, even when the cited content does not directly support the attributed claims.
arXiv Detail & Related papers (2025-06-05T17:59:26Z)
Fine-Tuning Large Language Models and Evaluating Retrieval Methods for Improved Question Answering on Building Codes [0.0]
Building codes are regulations that establish standards for the design, construction, and safety of buildings to ensure structural integrity, fire protection, and accessibility. Key difficulties include navigating large volumes of text, interpreting technical language, and identifying relevant clauses across different sections. A potential solution is to build a Question-Answering (QA) system that answers user queries based on building codes. Among the various methods for building a QA system, Retrieval-Augmented Generation (RAG) stands out in performance.
arXiv Detail & Related papers (2025-05-07T05:04:30Z)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z)
Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models [10.663446796160567]
Hallucinations in generative AI, particularly in Large Language Models (LLMs), pose a significant challenge to the reliability of multilingual applications. Existing benchmarks for hallucination detection focus primarily on English and a few widely spoken languages. We introduce Poly-FEVER, a large-scale multilingual fact verification benchmark.
arXiv Detail & Related papers (2025-03-19T01:46:09Z)
AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification [25.27444694706659]
We present AskToAct, which exploits structural mapping between queries and their tool invocation solutions. Our key insight is that tool parameters naturally represent explicit user intents. By systematically removing key parameters from queries while retaining them as ground truth, we enable automated construction of high-quality training data.
arXiv Detail & Related papers (2025-03-03T12:55:49Z)
First Token Probability Guided RAG for Telecom Question Answering [15.854941373238226]
Retrieval-Augmented Generation (RAG) has shown a distinct advantage in incorporating domain-specific information into Large Language Models (LLMs). We propose a novel first token probability guided RAG framework to tackle the challenges of Multiple Choice Question Answering (MCQA) in telecommunications.
arXiv Detail & Related papers (2025-01-11T07:47:31Z)
Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability [19.722009684115434]
Retrieval-augmented generation (RAG) frameworks have emerged as a promising solution to multi-hop question answering (QA) tasks. Existing RAG frameworks, which usually follow the retrieve-then-read paradigm, often struggle with multi-hop QA with temporal information. This paper proposes a novel framework called review-then-refine, which aims to enhance LLM performance in multi-hop QA scenarios with temporal information.
arXiv Detail & Related papers (2024-12-19T17:48:23Z)
DeepNote: Note-Centric Deep Retrieval-Augmented Generation [72.70046559930555]
Retrieval-Augmented Generation (RAG) mitigates factual errors and hallucinations in Large Language Models (LLMs) for question-answering (QA). We develop DeepNote, an adaptive RAG framework that achieves in-depth and robust exploration of knowledge sources through note-centric adaptive retrieval.
arXiv Detail & Related papers (2024-10-11T14:03:29Z)
SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance. We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization. We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z)
FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. We introduce Truth-Triangulator that synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z)
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG). Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection. It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm. FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels. The lack of a clear validation protocol for DA has led to bad practices in the literature. We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models. Recent research has sought to leverage large language models (LLMs) to improve IR systems. We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called masked sentence model (MSM). MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document. To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language. To collect large-scale CLS data, existing datasets typically involve translation in their creation. In this paper, we first confirm that different approaches to constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)
On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks. We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments. We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation [100.69458267888962]
Face presentation attack detection (fPAD) plays a critical role in the modern face recognition pipeline. Due to legal and privacy issues, training data (real face images and spoof images) are not allowed to be directly shared between different data sources. We propose a Federated Test-Time Adaptive Face Presentation Attack Detection with Dual-Phase Privacy Preservation framework.
arXiv Detail & Related papers (2021-10-25T02:51:05Z)
Retrieval-guided Counterfactual Generation for QA [5.434621727606356]
We focus on the task of creating counterfactuals for question answering. We develop a Retrieve-Generate-Filter technique to create counterfactual evaluation and training data. We find that RGF data leads to significant improvements in a model's robustness to local perturbations.
arXiv Detail & Related papers (2021-10-14T17:56:37Z)
Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD. Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks. For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved. However, the peak performance is not met using the general-purpose multilingual text encoders 'off-the-shelf', but rather by relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.