Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024
- URL: http://arxiv.org/abs/2407.08495v2
- Date: Fri, 4 Oct 2024 09:25:39 GMT
- Title: Investigating LLMs as Voting Assistants via Contextual Augmentation: A Case Study on the European Parliament Elections 2024
- Authors: Ilias Chalkidis
- Abstract summary: In light of the recent 2024 European Parliament elections, we investigate whether LLMs can be used as Voting Advice Applications (VAAs).
We audit MISTRAL and MIXTRAL models and evaluate their accuracy in predicting the stance of political parties based on the latest "EU and I" voting assistance questionnaire.
We find that MIXTRAL is highly accurate, with 82% accuracy on average, but with a significant performance disparity across different political groups.
- Score: 22.471701390730185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In light of the recent 2024 European Parliament elections, we investigate whether LLMs can be used as Voting Advice Applications (VAAs). We audit MISTRAL and MIXTRAL models and evaluate their accuracy in predicting the stance of political parties based on the latest "EU and I" voting assistance questionnaire. Furthermore, we explore alternatives to improve model performance by augmenting the input context via Retrieval-Augmented Generation (RAG) relying on web search, and via Self-Reflection using staged conversations that aim to re-collect relevant content from the model's internal memory. We find that MIXTRAL is highly accurate, with 82% accuracy on average, but with a significant performance disparity across different political groups (50-95%). Augmenting the input context with expert-curated information can yield a significant boost of approximately 9%, which remains an open challenge for automated RAG approaches, even when considering curated content.
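As a rough illustration of the audit setup, the sketch below prompts a chat model with one questionnaire statement, optionally prepending retrieved evidence (the RAG variant). The prompt wording, the three-way label set, and the `call_llm` stub are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal sketch of stance prediction with optional context augmentation (RAG).
# `call_llm` is a placeholder for any chat-completion client; the prompt
# wording and label set are assumptions, not the paper's exact protocol.

STANCES = ["agree", "neutral", "disagree"]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client here")

def predict_stance(party: str, statement: str, context: str | None = None) -> str:
    prompt = f"You are asked about the stance of the party '{party}'.\n"
    if context:  # RAG: prepend retrieved (web or expert-curated) evidence
        prompt += f"Relevant background:\n{context}\n\n"
    prompt += (
        f'Statement: "{statement}"\n'
        f"Does the party agree, stay neutral, or disagree? "
        f"Answer with one word: {', '.join(STANCES)}."
    )
    answer = call_llm(prompt).strip().lower()
    return next((s for s in STANCES if s in answer), "neutral")

# Accuracy is then the fraction of questionnaire statements where the
# predicted stance matches the party's self-declared "EU and I" position.
```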
Related papers
- MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [64.62421656031128]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions.
Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods.
Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
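The 9.3% figure is a gap-closure metric; assuming the usual definition (the summary does not spell it out), it can be computed as follows, with illustrative numbers only:

```python
def gap_closed(agent: float, baseline: float, human: float) -> float:
    """Fraction of the baseline-to-human gap recovered by the agent."""
    return (agent - baseline) / (human - baseline)

# Illustrative numbers only: baseline 0.40, top human 0.80, agent 0.4372
print(f"{gap_closed(0.4372, 0.40, 0.80):.1%}")  # ~9.3%, mirroring the headline number
```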
arXiv Detail & Related papers (2025-04-13T19:35:43Z)
- Scaling Crowdsourced Election Monitoring: Construction and Evaluation of Classification Models for Multilingual and Cross-Domain Classification Settings [0.0]
We propose a two-step classification approach of first identifying informative reports and then categorising them into distinct information types.
We conduct classification experiments using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as SBERT.
Our results show promising potential for model transfer across electoral domains, with F1-Scores of 59% in zero-shot and 63% in few-shot settings.
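A minimal sketch of such a two-step pipeline, assuming an SBERT encoder feeding lightweight classifier heads; the model name, toy data, and logistic-regression heads are illustrative assumptions, not the paper's exact configuration:

```python
# Two-step sketch: (1) filter informative reports, (2) categorise them.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

reports = ["Ballot boxes arrived late in ward 4", "Nice weather today",
           "Observers blocked from entering station 12", "Lunch was good"]
informative = [1, 0, 1, 0]                       # step-1 labels (toy data)
info_type = ["logistics", "interference"]        # step-2 labels (informative only)

X = encoder.encode(reports)
step1 = LogisticRegression().fit(X, informative)
step2 = LogisticRegression().fit(
    [x for x, y in zip(X, informative) if y], info_type)

def classify(report: str):
    emb = encoder.encode([report])
    if step1.predict(emb)[0] == 0:
        return None                              # filtered out as uninformative
    return step2.predict(emb)[0]                 # fine-grained information type
```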
arXiv Detail & Related papers (2025-03-05T15:17:18Z)
- Judging the Judges: A Collection of LLM-Generated Relevance Judgements [37.103230004631996]
This paper benchmarks and reports on the results of a large-scale automatic relevance judgment evaluation, the LLMJudge challenge at SIGIR 2024.
We release and benchmark 42 LLM-generated labels of the TREC 2023 Deep Learning track relevance judgments produced by eight international teams.
arXiv Detail & Related papers (2025-02-19T17:40:32Z)
- A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look [52.114284476700874]
This paper reports on the results of a large-scale evaluation (the TREC 2024 RAG Track) where four different relevance assessment approaches were deployed.
We find that automatically generated UMBRELA judgments can replace fully manual judgments to accurately capture run-level effectiveness.
Surprisingly, we find that LLM assistance does not appear to increase correlation with fully manual assessments, suggesting that costs associated with human-in-the-loop processes do not bring obvious tangible benefits.
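A bare-bones version of an automatic relevance assessor of the kind evaluated here might look as follows; the prompt and the 0-3 grade scale follow common TREC practice but are assumptions, not UMBRELA's actual (more elaborate) template:

```python
# Sketch of an LLM relevance assessor with a 0-3 graded scale.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client")

def judge(query: str, passage: str) -> int:
    prompt = (
        f"Query: {query}\nPassage: {passage}\n"
        "On a scale of 0 (irrelevant) to 3 (perfectly relevant), "
        "how relevant is the passage to the query? Reply with one digit."
    )
    reply = call_llm(prompt).strip()
    return int(reply[0]) if reply[:1].isdigit() else 0

# Run-level effectiveness (e.g. nDCG) computed from such judgments can then
# be correlated with scores obtained from fully manual qrels.
```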
arXiv Detail & Related papers (2024-11-13T01:12:35Z)
- Leveraging AI and Sentiment Analysis for Forecasting Election Outcomes in Mauritius [0.0]
This study explores the use of AI-driven sentiment analysis as a novel tool for forecasting election outcomes, focusing on Mauritius' 2024 elections.
We analyze media sentiment toward the two main political parties, L'Alliance Lepep and L'Alliance Du Changement, by classifying news articles from prominent Mauritian media outlets as positive, negative, or neutral.
Findings indicate that positive media sentiment strongly correlates with projected electoral gains, underscoring the role of media in shaping public perception.
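In code, the core of such a study reduces to three-way sentiment tagging of party mentions; a hedged sketch, where the zero-shot model choice and hypothesis template are illustrative assumptions rather than the study's actual setup:

```python
# Three-way sentiment tagging of party coverage in news articles.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
LABELS = ["positive", "negative", "neutral"]

def article_sentiment(text: str, party: str) -> str:
    result = classifier(text, LABELS,
                        hypothesis_template=f"The coverage of {party} is {{}}.")
    return result["labels"][0]  # highest-scoring label

# Aggregating these labels per party over time yields the sentiment series
# that the study correlates with projected electoral gains.
```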
arXiv Detail & Related papers (2024-10-28T09:21:15Z)
- ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents [70.17229548653852]
We introduce ElectionSim, an innovative election simulation framework based on large language models.
We present a million-scale voter pool sampled from social media platforms to support accurate individual simulation.
We also introduce PPE, a poll-based presidential election benchmark to assess the performance of our framework under the U.S. presidential election scenario.
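The individual-simulation idea can be sketched as persona-conditioned prompting; the persona schema and prompt below are assumptions for illustration, not ElectionSim's actual design:

```python
# Persona-conditioned voter simulation sketch.
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client")

@dataclass
class Voter:
    age: int
    state: str
    interests: str   # e.g. distilled from the voter's social-media posts

def simulate_vote(voter: Voter, candidates: list[str]) -> str:
    prompt = (
        f"You are a {voter.age}-year-old voter from {voter.state} "
        f"who cares about {voter.interests}. In the presidential election, "
        f"who do you vote for: {' or '.join(candidates)}? Answer with one name."
    )
    return call_llm(prompt).strip()

# Aggregating simulate_vote over a large sampled voter pool produces the
# poll-style estimates that a benchmark like PPE can score.
```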
arXiv Detail & Related papers (2024-10-28T05:25:50Z)
- Will Trump Win in 2024? Predicting the US Presidential Election via Multi-step Reasoning with Large Language Models [12.939107088730513]
Election prediction poses unique challenges, such as limited voter-level data, rapidly changing political landscapes, and the need to model complex human behavior.
We introduce a multi-step reasoning framework designed for political analysis.
Our approach is validated on real-world data from the American National Election Studies (ANES) 2016 and 2020.
We apply our framework to predict the outcome of the 2024 U.S. presidential election in advance.
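A compressed sketch of a multi-step (chained) reasoning pipeline of this kind, with the three analysis steps as illustrative assumptions rather than the paper's exact stages:

```python
# Chained-prompt sketch: each step's output is folded into the next prompt.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client")

STEPS = [
    "Summarise this voter's demographics and past turnout: {ctx}",
    "Given that summary, infer their ideology and key issues: {ctx}",
    "Given all of the above, predict their presidential vote: {ctx}",
]

def predict_vote(voter_record: str) -> str:
    ctx = voter_record
    for template in STEPS:
        ctx = call_llm(template.format(ctx=ctx))  # chain the reasoning
    return ctx  # final step returns the predicted vote

# Validation: run on ANES 2016/2020 respondents and compare predictions
# against reported votes before applying the pipeline to 2024.
```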
arXiv Detail & Related papers (2024-10-21T06:18:53Z)
- Large language models can consistently generate high-quality content for election disinformation operations [2.98293101034582]
Large language models have raised concerns about their potential use in generating compelling election disinformation at scale.
This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation.
arXiv Detail & Related papers (2024-08-13T08:45:34Z)
- GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy [20.06753067241866]
We evaluate and compare the alignment of six LLMs by OpenAI, Anthropic, and Cohere with German party positions.
We conduct a prompt experiment using this benchmark together with sociodemographic data of leading German parliamentarians.
arXiv Detail & Related papers (2024-07-25T13:04:25Z)
- Cutting Through the Noise: Boosting LLM Performance on Math Word Problems [52.99006895757801]
Large Language Models excel at solving math word problems (MWPs) but struggle with real-world problems containing irrelevant information.
We propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables.
Fine-tuning on adversarial training instances improves performance on adversarial MWPs by 8%.
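The perturbation itself is simple to express; a sketch, with the prompt wording as an assumption rather than the paper's exact framework:

```python
# Sketch of generating an adversarial MWP variant by asking an LLM to
# inject numeric details that are irrelevant to the solution.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client")

def adversarial_variant(problem: str) -> str:
    return call_llm(
        "Rewrite this math word problem, adding two numeric details that are "
        "irrelevant to the solution, without changing the correct answer:\n"
        + problem
    )

# Example: "Sam has 3 apples and buys 4 more. How many does he have?"
# -> the variant might also mention 5 oranges and a $2 bus fare; the answer
#    (7) is unchanged, but models get distracted by the extra numbers.
```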
arXiv Detail & Related papers (2024-05-30T18:07:13Z)
- Large Language Models (LLMs) as Agents for Augmented Democracy [6.491009626125319]
We explore an augmented democracy system built on off-the-shelf LLMs fine-tuned to augment data on citizens' preferences.
We use a train-test cross-validation setup to estimate the accuracy with which the LLMs predict both a subject's individual political choices and the aggregate preferences of the full sample of participants.
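The evaluation logic reduces to held-out accuracy over folds; a sketch where `predict_choice` stands in for the fine-tuned LLM and is an assumed interface, not the paper's:

```python
# K-fold accuracy sketch for preference prediction.
from sklearn.model_selection import KFold

def predict_choice(profile: str, proposal: str) -> str:
    raise NotImplementedError("fine-tuned LLM wrapper goes here")

def cv_accuracy(records, k: int = 5) -> float:
    """records: list of (participant_profile, policy_proposal, actual_choice)."""
    hits, total = 0, 0
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(records):
        # fine-tuning on records[train_idx] would happen here (omitted)
        for i in test_idx:
            profile, proposal, actual = records[i]
            hits += predict_choice(profile, proposal) == actual
            total += 1
    return hits / total
```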
arXiv Detail & Related papers (2024-05-06T13:23:57Z)
- Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs [56.526095828316386]
We propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of large language models (LLMs).
We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods.
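Selective prediction of this flavour can be sketched as answer-then-self-evaluate, abstaining below a confidence threshold; the prompts and the 0.7 threshold are assumptions, not the paper's learned self-evaluation method:

```python
# Selective-prediction sketch: answer, self-evaluate, abstain when unsure.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client")

def selective_answer(question: str, threshold: float = 0.7):
    answer = call_llm(f"Answer concisely: {question}")
    score = call_llm(
        f"Question: {question}\nProposed answer: {answer}\n"
        "On a scale from 0 to 1, how likely is this answer correct? "
        "Reply with a number only."
    )
    try:
        confidence = float(score.strip())
    except ValueError:
        confidence = 0.0
    return answer if confidence >= threshold else None  # None = abstain
```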
arXiv Detail & Related papers (2023-10-18T03:34:59Z)
- Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes (7B-65B) improve, on average, by 8.2% over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
- Benchmarking Large Language Models in Retrieval-Augmented Generation [53.504471079548]
We systematically investigate the impact of Retrieval-Augmented Generation on large language models.
We analyze the performance of different large language models in 4 fundamental abilities required for RAG.
We establish Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese.
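Evaluating a model on such a corpus comes down to scoring answers produced with retrieved evidence in context; a sketch, where the prompt format and exact-match metric are simplifying assumptions rather than RGB's full protocol:

```python
# RAG-evaluation sketch: answer with retrieved documents in context,
# then score by exact match against the gold answer.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a chat-completion client")

def rag_answer(question: str, docs: list[str]) -> str:
    context = "\n".join(f"[{i+1}] {d}" for i, d in enumerate(docs))
    return call_llm(f"Documents:\n{context}\n\nQuestion: {question}\nAnswer:")

def evaluate(examples) -> float:
    """examples: list of (question, retrieved_docs, gold_answer)."""
    correct = sum(
        rag_answer(q, docs).strip().lower() == gold.lower()
        for q, docs, gold in examples
    )
    return correct / len(examples)
```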
arXiv Detail & Related papers (2023-09-04T08:28:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.