Related papers: TrueGL: A Truthful, Reliable, and Unified Engine for Grounded Learning in Full-Stack Search

TrueGL: A Truthful, Reliable, and Unified Engine for Grounded Learning in Full-Stack Search

URL: http://arxiv.org/abs/2506.12072v2
Date: Fri, 29 Aug 2025 09:25:55 GMT
Title: TrueGL: A Truthful, Reliable, and Unified Engine for Grounded Learning in Full-Stack Search
Authors: Joydeep Chandra, Aleksandr Algazinov, Satyam Kumar Navneet, Rim El Filali, Matt Laing, Andrew Hanna,
Abstract summary: We present TrueGL, a model that makes trustworthy search results more accessible.<n>We evaluate the system using prompt engineering and assigning each statement a continuous reliability score from 0.1 to 1.<n>The model's high accuracy, broad content coverage, and ease of use make trustworthy information more accessible.
Score: 36.07973770472031
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the age of open and free information, a concerning trend of reliance on AI is emerging. However, existing AI tools struggle to evaluate the credibility of information and to justify their assessments. Hence, there is a growing need for systems that can help users evaluate the trustworthiness of online information. Although major search engines incorporate AI features, they often lack clear reliability indicators. We present TrueGL, a model that makes trustworthy search results more accessible. The model is a fine-tuned version of IBM's Granite-1B, trained on the custom dataset and integrated into a search engine with a reliability scoring system. We evaluate the system using prompt engineering and assigning each statement a continuous reliability score from 0.1 to 1, then instructing the model to return a textual explanation alongside the score. Each model's predicted scores are measured against real scores using standard evaluation metrics. TrueGL consistently outperforms other small-scale LLMs and rule-based approaches across all experiments on key evaluation metrics, including MAE, RMSE, and R2. The model's high accuracy, broad content coverage, and ease of use make trustworthy information more accessible and help reduce the spread of false or misleading content online. Our code is publicly available at https://github.com/AlgazinovAleksandr/TrueGL, and our model is publicly released at https://huggingface.co/JoydeepC/trueGL.

Related papers

Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework [0.5999777817331317]
This research proposes an innovative verifiable misinformation detection LLM agent.<n>The agent actively verifies claims through dynamic interaction with diverse web sources.<n>It assesses information source credibility, synthesizes evidence, and provides a complete verifiable reasoning process.
arXiv Detail & Related papers (2025-08-05T05:15:03Z)
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora [9.871701356351542]
Language Models (LMs) continue to advance, improving response quality and coherence.<n>A plethora of evaluation benchmarks have been constructed to assess model quality, response appropriateness, and reasoning capabilities.<n>We propose a methodology for automating the construction of fact-based synthetic data model evaluations grounded in document populations.
arXiv Detail & Related papers (2025-05-13T18:50:03Z)
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools [54.63478102768333]
Well-calibrated model confidences can be used to weigh the risk versus reward of potential actions.<n>We propose a novel class of model-internal confidence estimators (MICE) to better assess confidence when calling tools.
arXiv Detail & Related papers (2025-04-28T18:06:38Z)
DataSciBench: An LLM Agent Benchmark for Data Science [33.3811507234528]
DataSciBench is a benchmark for evaluating Large Language Model (LLM) capabilities in data science.<n>We develop a semi-automated pipeline for generating ground truth (GT) and validating evaluation metrics.<n>We propose an innovative Task - Function - Code framework to assess each code execution outcome.
arXiv Detail & Related papers (2025-02-19T17:31:51Z)
Bridging the Data Gap in AI Reliability Research and Establishing DR-AIR, a Comprehensive Data Repository for AI Reliability [4.769924694900377]
A major challenge in AI reliability research, particularly for those in academia, is the lack of readily available AI reliability data.<n>This paper conducts a comprehensive review of available AI reliability data and establishes DR-AIR: a data repository for AI reliability data.
arXiv Detail & Related papers (2025-02-17T23:50:36Z)
Privacy-Preserving Verifiable Neural Network Inference Service [4.131956503199438]
We develop a privacy-preserving and verifiable CNN inference scheme that preserves privacy for client data samples. vPIN achieves high efficiency in terms of proof size, while providing client data privacy guarantees and provable verifiability.
arXiv Detail & Related papers (2024-11-12T01:09:52Z)
MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions. We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users. Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z)
Fostering Trust and Quantifying Value of AI and ML [0.0]
Much has been discussed about trusting AI and ML inferences, but little has been done to define what that means. producing ever more trustworthy machine learning inferences is a path to increase the value of products.
arXiv Detail & Related papers (2024-07-08T13:25:28Z)
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering [26.34649731975005]
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for question answering (QA) While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics unreliable for accurately quantifying model performance. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness) and 2) whether they produce a response based on the provided knowledge (faithfulness)
arXiv Detail & Related papers (2023-07-31T17:41:00Z)
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [49.15931834209624]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.<n>We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.<n>By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
KGTrust: Evaluating Trustworthiness of SIoT via Knowledge Enhanced Graph Neural Networks [63.531790269009704]
Social Internet of Things (SIoT) is a promising and emerging paradigm that injects the notion of social networking into smart objects (i.e., things) Due to the risks and uncertainty, a crucial and urgent problem to be settled is establishing reliable relationships within SIoT, that is, trust evaluation. We propose a novel knowledge-enhanced graph neural network (KGTrust) for better trust evaluation in SIoT.
arXiv Detail & Related papers (2023-02-22T14:24:45Z)
Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV) NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones. We show that NPC-LV outperforms supervised methods on all three datasets on image classification in low data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
HYCEDIS: HYbrid Confidence Engine for Deep Document Intelligence System [16.542137414609602]
We propose a complete and novel architecture to measure confidence of current deep learning models in document information extraction task. Our architecture consists of a Multi-modal Conformal Predictor and a Variational Cluster-oriented Anomaly Detector. We evaluate our architecture on real-wold datasets, not only outperforming competing confidence estimators by a huge margin but also demonstrating generalization ability to out-of-distribution data.
arXiv Detail & Related papers (2022-06-01T09:57:34Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management [61.88858330222619]
We present an approach for predicting trust links between peers in social media. We propose a data-driven multi-faceted trust modeling which incorporates many distinct features for a comprehensive analysis. Illustrated in a trust-aware item recommendation task, we evaluate the proposed framework in the context of a large Yelp dataset.
arXiv Detail & Related papers (2021-11-11T19:40:51Z)
FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference [61.068947982746224]
FacTeR-Check enables retrieving fact-checked information, unchecked claims verification and tracking dangerous information over social media. The architecture is validated using a new dataset called NLI19-SP that is publicly released with COVID-19 related hoaxes and tweets from Spanish social media. Our results show state-of-the-art performance on the individual benchmarks, as well as producing useful analysis of the evolution over time of 61 different hoaxes.
arXiv Detail & Related papers (2021-10-27T15:44:54Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.