GRATR: Zero-Shot Evidence Graph Retrieval-Augmented Trustworthiness Reasoning
- URL: http://arxiv.org/abs/2408.12333v3
- Date: Mon, 27 Jan 2025 09:18:07 GMT
- Title: GRATR: Zero-Shot Evidence Graph Retrieval-Augmented Trustworthiness Reasoning
- Authors: Ying Zhu, Shengchang Li, Ziqian Kong, Qiang Yang, Peilan Xu
- Abstract summary: Trustworthiness reasoning aims to enable agents in multiplayer games with incomplete information to identify potential allies and adversaries.
We introduce the graph retrieval-augmented trustworthiness reasoning (GRATR) framework, which retrieves observable evidence from the game environment.
- Score: 7.3795957796342195
- Abstract: Trustworthiness reasoning aims to enable agents in multiplayer games with incomplete information to identify potential allies and adversaries, thereby enhancing decision-making. In this paper, we introduce the graph retrieval-augmented trustworthiness reasoning (GRATR) framework, which retrieves observable evidence from the game environment to inform decision-making by large language models (LLMs) without requiring additional training, making it a zero-shot approach. Within the GRATR framework, agents first observe the actions of other players and evaluate the resulting shifts in inter-player trust, constructing a corresponding trustworthiness graph. During decision-making, the agent performs multi-hop retrieval to evaluate trustworthiness toward a specific target, where evidence chains are retrieved from multiple trusted sources to form a comprehensive assessment. Experiments in the multiplayer game *Werewolf* demonstrate that GRATR outperforms the alternatives, improving reasoning accuracy by 50.5% and reducing hallucination by 30.6% compared to the baseline method. Additionally, when tested on a dataset of Twitter tweets during the U.S. election period, GRATR surpasses the baseline method by 10.4% in accuracy, highlighting its potential in real-world applications such as intent analysis.
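To make the two-step mechanism above concrete, here is a minimal sketch of an evidence graph with trust-weighted edges and multi-hop chain retrieval. The paper does not publish an implementation, so every class and method name below is hypothetical, and the scalar trust updates stand in for the LLM-driven evaluation GRATR actually performs.

```python
from collections import defaultdict

class TrustGraph:
    """Hypothetical sketch of an evidence graph in the spirit of GRATR.

    Nodes are players; each directed edge holds a running trust score and
    the observed actions (evidence) that produced it.
    """

    def __init__(self):
        self.edges = defaultdict(lambda: {"trust": 0.0, "evidence": []})

    def observe(self, src, dst, action, trust_shift):
        """Record an observed action and the trust shift it implies."""
        edge = self.edges[(src, dst)]
        edge["trust"] += trust_shift
        edge["evidence"].append(action)

    def retrieve_chains(self, src, target, max_hops=3, min_trust=0.0):
        """Multi-hop retrieval: walk trusted edges from src toward target,
        collecting the evidence chains that support a trust assessment."""
        chains, stack = [], [(src, [], 0.0)]
        while stack:
            node, path, score = stack.pop()
            for (u, v), e in self.edges.items():
                if u != node or e["trust"] <= min_trust or len(path) >= max_hops:
                    continue
                new_path = path + [(u, v, e["evidence"])]
                if v == target:
                    chains.append((score + e["trust"], new_path))
                else:
                    stack.append((v, new_path, score + e["trust"]))
        return sorted(chains, reverse=True, key=lambda c: c[0])

# Example: A sees B side with A, B sees C defend B; A then assesses C.
g = TrustGraph()
g.observe("A", "B", "B voted with A", +0.6)
g.observe("B", "C", "C defended B", +0.4)
for total, chain in g.retrieve_chains("A", "C"):
    print(total, chain)
```

In the full framework, the retrieved chains would be serialized into the LLM's prompt as evidence supporting the trust assessment of the target.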
Related papers
- Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media [12.479554210753664]
This study adopts a generative approach, where stance predictions include explicit, interpretable rationales.
We find that incorporating reasoning into stance detection enables the smaller model (FlanT5) to outperform GPT-3.5's zero-shot performance.
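As a concrete rendering of that generative formulation, the sketch below prompts FlanT5 to emit a stance label together with a one-sentence rationale. The prompt and output format here are assumptions for illustration, not the paper's actual template.

```python
# Minimal sketch of generative stance detection with a rationale
# (prompt and answer format are assumed, not taken from the paper).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

post = "Mail-in ballots are counted twice, everyone knows it."
target = "election integrity"
prompt = (
    f"Post: {post}\nTarget: {target}\n"
    "Classify the stance (favor, against, none) and explain why.\n"
    "Answer as: stance: <label>; rationale: <one sentence>"
)
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```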
arXiv Detail & Related papers (2024-12-13T16:34:39Z)
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
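That definition of true criticality lends itself to a direct Monte Carlo estimate. The sketch below uses a toy environment of my own (not from the paper), and for simplicity measures the deviation from the initial state, whereas the paper evaluates criticality at arbitrary states.

```python
import random

class ToyEnv:
    """Tiny chain world (an illustration, not from the paper): start at 0,
    move +1/-1, reward 1.0 for reaching position 5 within 8 steps."""
    action_space = [0, 1]

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.t += 1
        done = self.pos == 5 or self.t >= 8
        return self.pos, (1.0 if self.pos == 5 else 0.0), done

def true_criticality(env_factory, policy, n=3, rollouts=2000):
    """Monte Carlo estimate of 'true criticality' as summarized above:
    the expected drop in return when the agent takes n consecutive
    random actions instead of following its policy."""
    def rollout(deviate):
        env = env_factory()
        obs, done, total, steps = env.reset(), False, 0.0, 0
        while not done:
            if deviate and steps < n:
                action = random.choice(env.action_space)
            else:
                action = policy(obs)
            obs, reward, done = env.step(action)
            total += reward
            steps += 1
        return total
    on_policy = sum(rollout(False) for _ in range(rollouts)) / rollouts
    deviated = sum(rollout(True) for _ in range(rollouts)) / rollouts
    return on_policy - deviated  # larger drop => more critical

print(true_criticality(ToyEnv, lambda obs: 1))  # ~0.5 for this toy setup
```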
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE (Agent Robustness Evaluation) to rigorously evaluate how robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
- Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly with a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attacks (GIA), which aim to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), a novel attack method that can be launched from clients.
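The summary does not detail CGI's client-side mechanics, but the gradient-matching core of GIA that such attacks build on is well established. Below is a minimal sketch of that core only, with a placeholder model and data shapes; it is not the CGI attack itself.

```python
import torch

# Gradient-inversion by gradient matching: the attacker optimizes a dummy
# sample until its gradients reproduce the observed ones.
model = torch.nn.Linear(8, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# Gradients the attacker observed (here simulated from a secret sample).
x_true, y_true = torch.randn(1, 8), torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true),
                                 model.parameters())

x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = y_true  # label often recoverable separately; assumed known here
opt = torch.optim.Adam([x_dummy], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    grads = torch.autograd.grad(loss_fn(model(x_dummy), y_dummy),
                                model.parameters(), create_graph=True)
    match = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
    match.backward()
    opt.step()
print(torch.dist(x_dummy.detach(), x_true))  # should shrink toward 0
```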
arXiv Detail & Related papers (2023-09-14T03:48:27Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
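A rough illustration of the discounting idea follows. The discount used here is an assumption on my part: a standard R@1 hit is down-weighted by how far the predicted boundaries sit from the ground truth, normalized by video duration; the paper's exact formula may differ.

```python
def discounted_recall(preds, gts, durations, m=0.5):
    """Sketch of a 'dR@1,IoU@m'-style metric: recall hits are discounted
    by predicted-boundary error (assumed form, not the paper's exact one)."""
    total = 0.0
    for (ps, pe), (gs, ge), dur in zip(preds, gts, durations):
        inter = max(0.0, min(pe, ge) - max(ps, gs))
        union = max(pe, ge) - min(ps, gs)
        iou = inter / union if union > 0 else 0.0
        if iou >= m:                          # a standard R@1 hit ...
            a_s = 1.0 - abs(ps - gs) / dur    # ... discounted by how far
            a_e = 1.0 - abs(pe - ge) / dur    #     each boundary is off
            total += a_s * a_e
    return total / len(preds)

# One near-perfect and one boundary-biased prediction on 30 s videos:
print(discounted_recall([(0, 10), (0, 18)], [(0, 10), (2, 20)], [30, 30]))
```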
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
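ATC is simple enough to state in a few lines. In the sketch below (using max-softmax confidence as the score, one of the scores the paper considers), the threshold is fitted on labeled source data so that the fraction of source examples above it matches source accuracy, and the same threshold is then applied to unlabeled target confidences.

```python
import numpy as np

def atc_predict(source_conf, source_correct, target_conf):
    """Average Thresholded Confidence: fit a threshold t on source data so
    that P(conf > t) equals source accuracy, then report the fraction of
    unlabeled target examples above t as predicted target accuracy."""
    source_acc = source_correct.mean()
    t = np.quantile(source_conf, 1.0 - source_acc)
    return (target_conf > t).mean()

# Toy demo with simulated confidences (not real model outputs):
rng = np.random.default_rng(0)
src_conf = rng.uniform(0.5, 1.0, 1000)
src_correct = (rng.uniform(0, 1, 1000) < src_conf).astype(float)
tgt_conf = rng.uniform(0.4, 1.0, 1000)  # shifted target distribution
print(atc_predict(src_conf, src_correct, tgt_conf))
```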
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management [61.88858330222619]
We present an approach for predicting trust links between peers in social media.
We propose a data-driven, multi-faceted trust model that incorporates many distinct features for a comprehensive analysis.
We evaluate the proposed framework on a trust-aware item recommendation task using a large Yelp dataset.
arXiv Detail & Related papers (2021-11-11T19:40:51Z)
- Learning to Give Checkable Answers with Prover-Verifier Games [23.93694563816463]
We introduce Prover-Verifier Games (PVGs), a game-theoretic framework to encourage learning agents to solve decision problems in a verifiable manner.
We analyze variants of the framework, including simultaneous and sequential games, and narrow the space down to a subset of games which provably have the desired equilibria.
We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.
arXiv Detail & Related papers (2021-08-27T02:56:06Z)
- SRLF: A Stance-aware Reinforcement Learning Framework for Content-based Rumor Detection on Social Media [15.985224010346593]
Early content-based methods focused on finding clues from text and user profiles for rumor detection.
Recent studies combine the stances of users' comments with news content to capture the difference between true and false rumors.
We propose a novel Stance-aware Reinforcement Learning Framework (SRLF) to select high-quality labeled stance data for model training and rumor detection.
arXiv Detail & Related papers (2021-05-10T03:58:34Z)
- BaFFLe: Backdoor detection via Feedback-based Federated Learning [3.6895394817068357]
We propose Backdoor detection via Feedback-based Federated Learning (BAFFLE).
We show that BAFFLE reliably detects state-of-the-art backdoor attacks with a detection accuracy of 100% and a false-positive rate below 5%.
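The sketch below is a condensed, assumed rendering of the feedback loop implied by the title: each client validates the new global model on its local data and votes "suspicious" when its error rate jumps relative to past rounds, and the server rejects the update when a quorum agrees. The paper's detection compares per-class error rates with richer statistics than this single aggregate.

```python
import numpy as np

def baffle_round(client_error_rates, history, quorum=0.5, tol=0.05):
    """Simplified feedback-based backdoor check (assumed rendering):
    returns True if the new global model should be accepted."""
    votes = []
    for cid, err in client_error_rates.items():
        baseline = np.mean(history[cid]) if history[cid] else err
        votes.append(err > baseline + tol)  # error jump -> suspicious vote
        history[cid].append(err)
    return np.mean(votes) < quorum

history = {0: [0.10, 0.11], 1: [0.12], 2: [0.09, 0.10]}
# One client sees a large error jump; two do not, so the update passes.
print(baffle_round({0: 0.10, 1: 0.30, 2: 0.11}, history))  # True
```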
arXiv Detail & Related papers (2020-11-04T07:44:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.