Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content
- URL: http://arxiv.org/abs/2504.10878v1
- Date: Tue, 15 Apr 2025 05:11:40 GMT
- Title: Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content
- Authors: Yilang Peng, Sijia Qian, Yingdan Lu, Cuihua Shen
- Abstract summary: We introduce a Large Language Model (LLM)-informed feature discovery framework to evaluate content credibility and explain its reasoning. We extract and quantify interpretable features using targeted prompts and integrate them into machine learning models to improve credibility predictions. Our method outperformed zero-shot GPT-based predictions by 13 percent in $R^2$, and revealed key features like information concreteness and image format.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In today's visually dominated social media landscape, predicting the perceived credibility of visual content and understanding what drives human judgment are crucial for countering misinformation. However, these tasks are challenging due to the diversity and richness of visual features. We introduce a Large Language Model (LLM)-informed feature discovery framework that leverages multimodal LLMs, such as GPT-4o, to evaluate content credibility and explain its reasoning. We extract and quantify interpretable features using targeted prompts and integrate them into machine learning models to improve credibility predictions. We tested this approach on 4,191 visual social media posts across eight topics in science, health, and politics, using credibility ratings from 5,355 crowdsourced workers. Our method outperformed zero-shot GPT-based predictions by 13 percent in $R^2$, and revealed key features like information concreteness and image format. We discuss the implications for misinformation mitigation, visual credibility, and the role of LLMs in social science.
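The abstract describes a two-stage pipeline: a multimodal LLM is prompted with targeted questions to quantify interpretable features of each post, and those feature scores are then fed into a conventional machine learning model trained against crowdsourced credibility ratings. The sketch below illustrates that logic under stated assumptions; the prompt wording, the feature names, and the ridge-regression choice are hypothetical stand-ins for illustration, not the authors' released code.

```python
# Minimal sketch of an LLM-informed feature discovery pipeline, assuming the
# openai and scikit-learn packages; prompts and feature names are illustrative.
import numpy as np
from openai import OpenAI
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical interpretable features; the paper highlights information
# concreteness and image format among the features it discovered.
FEATURES = ["information_concreteness", "image_format_is_infographic",
            "source_cue_present", "emotional_appeal"]

def score_feature(image_url: str, post_text: str, feature: str) -> float:
    """Ask a multimodal LLM to quantify one interpretable feature on a 0-10 scale."""
    prompt = (f"Rate the following social media post on '{feature}' "
              f"from 0 to 10. Reply with a single number.\n\nText: {post_text}")
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
    )
    return float(resp.choices[0].message.content.strip())

def featurize(posts: list[dict]) -> np.ndarray:
    """Build a posts-by-features matrix of LLM-quantified feature scores."""
    return np.array([[score_feature(p["image_url"], p["text"], f)
                      for f in FEATURES] for p in posts])

def fit_and_evaluate(posts: list[dict], y: np.ndarray) -> None:
    """Fit an interpretable model on LLM features against credibility ratings."""
    X = featurize(posts)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = Ridge().fit(X_tr, y_tr)
    print("R^2:", r2_score(y_te, model.predict(X_te)))
    # Coefficients indicate which features drive perceived credibility.
    for name, coef in zip(FEATURES, model.coef_):
        print(f"{name}: {coef:+.3f}")
```

A linear model is used here so that its coefficients stay directly inspectable, mirroring the paper's emphasis on interpretation alongside prediction; the reported comparison against zero-shot GPT predictions would amount to scoring both approaches with the same $R^2$ metric on held-out posts.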
Related papers
- VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [121.03333569013148]
We introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories.
These questions assess the visual reasoning capabilities of MLLMs from multiple perspectives.
Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans.
arXiv Detail & Related papers (2025-04-21T17:59:53Z) - VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues [34.95077625513563]
VLM$^2$-Bench is a benchmark designed to assess whether vision-language models can Visually Link Matching cues. We identify critical challenges in models' ability to link visual cues, highlighting a significant performance gap where even GPT-4o lags 34.80% behind humans.
arXiv Detail & Related papers (2025-02-17T17:57:50Z) - Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach [33.20992355312175]
Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated remarkable progress in visual understanding.
This paper systematically investigates how these models attend to visual inputs, with experiments across 4 model families and 4 model scales.
Our analysis reveals a strong correlation between the behavior of these attention heads, the distribution of attention weights, and their concentration on visual tokens within the input.
arXiv Detail & Related papers (2024-12-24T02:31:24Z) - VHELM: A Holistic Evaluation of Vision Language Models [75.88987277686914]
We present the Holistic Evaluation of Vision Language Models (VHELM).
VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety.
Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast.
arXiv Detail & Related papers (2024-10-09T17:46:34Z) - Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency [3.161954199291541]
This research study comprehensively evaluates the language, vision, speech, and multimodal capabilities of GPT-4o.
GPT-4o demonstrates high accuracy and efficiency across multiple domains in language and reasoning capabilities.
The model shows variability and faces limitations in handling complex and ambiguous inputs.
arXiv Detail & Related papers (2024-06-19T19:00:21Z) - Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines [2.0330684186105805]
This study explores the efficacy of Large Language Models (LLMs) in identifying misleading versus non-misleading news headlines.
Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy.
arXiv Detail & Related papers (2024-05-06T04:06:45Z) - Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models [57.95366341738857]
In-depth analyses show that instruction-tuned LVLMs exhibit a modality gap, showing discrepancy when given textual and visual inputs that correspond to the same concept. We propose a multiple attribute-centric evaluation benchmark, Finer, to evaluate LVLMs' fine-grained visual comprehension ability and provide significantly improved explainability.
arXiv Detail & Related papers (2024-02-26T05:43:51Z) - GPT-4V(ision) as A Social Media Analysis Engine [77.23394183063238]
This paper explores GPT-4V's capabilities for social multimedia analysis.
We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection.
GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge.
arXiv Detail & Related papers (2023-11-13T18:36:50Z) - A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering [53.70661720114377]
Multimodal large models (MLMs) have significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA).
Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate deep comprehension of the visual information in conjunction with a vast repository of learned knowledge.
To uncover such capabilities, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images; …
arXiv Detail & Related papers (2023-11-13T18:22:32Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts [170.01089233942594]
MathVista is a benchmark designed to combine challenges from diverse mathematical and visual tasks.
The best-performing GPT-4V model achieves an overall accuracy of 49.9%, substantially outperforming Bard, the second-best performer, by 15.1%.
GPT-4V still falls short of human performance by 10.4%, as it often struggles to understand complex figures and perform rigorous reasoning.
arXiv Detail & Related papers (2023-10-03T17:57:24Z)