Global-Liar: Factuality of LLMs over Time and Geographic Regions
- URL: http://arxiv.org/abs/2401.17839v1
- Date: Wed, 31 Jan 2024 13:57:24 GMT
- Authors: Shujaat Mirza, Bruno Coelho, Yuyuan Cui, Christina Pöpper, Damon McCoy
- Abstract summary: This study evaluates the factual accuracy, stability, and biases in widely adopted GPT models, including GPT-3.5 and GPT-4.
We introduce 'Global-Liar,' a dataset uniquely balanced in terms of geographic and temporal representation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing reliance on AI-driven solutions, particularly Large Language
Models (LLMs) like the GPT series, for information retrieval highlights the
critical need for their factuality and fairness, especially amidst the rampant
spread of misinformation and disinformation online. Our study evaluates the
factual accuracy, stability, and biases in widely adopted GPT models, including
GPT-3.5 and GPT-4, contributing to the reliability and integrity of AI-mediated
information dissemination.
We introduce 'Global-Liar,' a dataset uniquely balanced in terms of
geographic and temporal representation, facilitating a more nuanced evaluation
of LLM biases. Our analysis reveals that newer iterations of GPT models do not
always equate to improved performance. Notably, the GPT-4 version from March
demonstrates higher factual accuracy than its subsequent June release.
Furthermore, a concerning bias is observed, privileging statements from the
Global North over the Global South, thus potentially exacerbating existing
informational inequities. Regions such as Africa and the Middle East are at a
disadvantage, with much lower factual accuracy. The performance fluctuations
over time suggest that model updates may not consistently benefit all regions
equally.
Our study also offers insights into the impact of various LLM configuration
settings, such as binary decision forcing, model re-runs, and temperature, on
model factuality. Models constrained to binary (true/false) choices exhibit
reduced factuality compared to those allowing an 'unclear' option. Single
inference at a low temperature setting matches the reliability of majority
voting across various configurations. The insights gained highlight the need
for culturally diverse and geographically inclusive model training and
evaluation. This approach is key to achieving global equity in technology,
distributing AI benefits fairly worldwide.
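The configuration comparison described above (a single low-temperature inference versus majority voting over repeated runs) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `classify_stub` is a hypothetical stand-in for the GPT call, returning one of the three labels (true/false/unclear) deterministically at temperature 0 and with injected noise at higher temperatures.

```python
import random
from collections import Counter

LABELS = ("true", "false", "unclear")


def classify_stub(statement: str, temperature: float = 0.0, seed: int = 0) -> str:
    """Hypothetical stand-in for an LLM factuality call.

    Deterministic at temperature 0; increasingly noisy as temperature rises.
    """
    fingerprint = sum(map(ord, statement))
    base = LABELS[fingerprint % len(LABELS)]  # the model's "stable" answer
    rng = random.Random(fingerprint + seed)
    if rng.random() < temperature:
        return rng.choice(LABELS)  # noise: pick any label
    return base


def single_inference(statement: str) -> str:
    """One call at temperature 0 (the cheap configuration)."""
    return classify_stub(statement, temperature=0.0)


def majority_vote(statement: str, runs: int = 5, temperature: float = 0.7) -> str:
    """Re-run the model several times at a higher temperature and
    take the most common label (the expensive configuration)."""
    votes = Counter(
        classify_stub(statement, temperature=temperature, seed=i)
        for i in range(runs)
    )
    return votes.most_common(1)[0][0]
```

The study's finding is that the cheap configuration (one call, low temperature) matches the reliability of the expensive one (majority voting across re-runs), which matters for evaluation cost at dataset scale.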
Related papers
- Belief Revision: The Adaptability of Large Language Models Reasoning [63.0281286287648]
We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence.
Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning framework.
We evaluate $\sim$30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information.
arXiv Detail & Related papers (2024-06-28T09:09:36Z) - Large Language Models are Zero-Shot Next Location Predictors [4.315451628809687]
Large Language Models (LLMs) can act as zero-shot next-location predictors.
LLMs achieve accuracies of up to 32.4%, an improvement of over 600% relative to sophisticated deep learning (DL) models.
We propose a framework inspired by other studies to test data contamination.
arXiv Detail & Related papers (2024-05-31T16:07:33Z) - FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated Learning [10.641875933652647]
Federated Learning (FL) is a novel approach that allows for collaborative machine learning.
FL faces challenges due to non-uniformly distributed (non-iid) data across clients.
This paper introduces FedDistill, a framework enhancing the knowledge transfer from the global model to local models.
arXiv Detail & Related papers (2024-04-14T10:23:30Z) - Large Language Models are Geographically Biased [51.37609528538606]
We study what Large Language Models (LLMs) know about the world we live in through the lens of geography.
We show various problematic geographic biases, which we define as systemic errors in geospatial predictions.
arXiv Detail & Related papers (2024-02-05T02:32:09Z) - FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning [27.28589196972422]
Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model.
FL often suffers from significant performance degradation when clients have heterogeneous data distributions.
We propose a novel method, Federated Stabilized Orthogonal Learning (FedSOL), which balances local and global learning.
arXiv Detail & Related papers (2023-08-24T03:43:02Z) - Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG)
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z) - Jalisco's multiclass land cover analysis and classification using a
novel lightweight convnet with real-world multispectral and relief data [51.715517570634994]
We present our novel lightweight (only 89k parameters) Convolutional Neural Network (ConvNet) to perform land cover (LC) classification and analysis.
In this work, we combine three real-world open data sources to obtain 13 channels.
Our embedded analysis anticipates the limited performance in some classes and gives us the opportunity to group the most similar.
arXiv Detail & Related papers (2022-01-26T14:58:51Z) - Preservation of the Global Knowledge by Not-True Self Knowledge
Distillation in Federated Learning [8.474470736998136]
In Federated Learning (FL), a strong global model is collaboratively learned by aggregating the clients' locally trained models.
We observe that fitting on biased local distribution shifts the feature on global distribution and results in forgetting of global knowledge.
We propose a simple yet effective framework Federated Local Self-Distillation (FedLSD), which utilizes the global knowledge on locally available data.
arXiv Detail & Related papers (2021-06-06T11:51:47Z) - Federated Learning With Quantized Global Model Updates [84.55126371346452]
We study federated learning, which enables mobile devices to utilize their local datasets to train a global model.
We introduce a lossy FL (LFL) algorithm, in which both the global model and the local model updates are quantized before being transmitted.
arXiv Detail & Related papers (2020-06-18T16:55:20Z)