Multimodal Detection of Fake Reviews using BERT and ResNet-50
- URL: http://arxiv.org/abs/2511.00020v1
- Date: Fri, 24 Oct 2025 01:24:53 GMT
- Title: Multimodal Detection of Fake Reviews using BERT and ResNet-50
- Authors: Suhasnadh Reddy Veluru, Sai Teja Erukude, Viswa Chaitanya Marella
- Abstract summary: A robust multimodal fake review detection framework is proposed, integrating textual features encoded with BERT and visual features extracted using ResNet-50. Experimental results indicate that the multimodal model outperforms unimodal baselines, achieving an F1-score of 0.934 on the test set. This study demonstrates the critical role of multimodal learning in safeguarding digital trust and offers a scalable solution for content moderation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the current digital commerce landscape, user-generated reviews play a critical role in shaping consumer behavior, product reputation, and platform credibility. However, the proliferation of fake or misleading reviews, often generated by bots, paid agents, or AI models, poses a significant threat to trust and transparency within review ecosystems. Existing detection models primarily rely on unimodal, typically textual, data and therefore fail to capture semantic inconsistencies across different modalities. To address this gap, a robust multimodal fake review detection framework is proposed, integrating textual features encoded with BERT and visual features extracted using ResNet-50. These representations are fused through a classification head to jointly predict review authenticity. To support this approach, a curated dataset comprising 21,142 user-uploaded images across food delivery, hospitality, and e-commerce domains was utilized. Experimental results indicate that the multimodal model outperforms unimodal baselines, achieving an F1-score of 0.934 on the test set. Additionally, the confusion matrix and qualitative analysis highlight the model's ability to detect subtle inconsistencies, such as exaggerated textual praise paired with unrelated or low-quality images, commonly found in deceptive content. This study demonstrates the critical role of multimodal learning in safeguarding digital trust and offers a scalable solution for content moderation across various online platforms.
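The abstract describes fusing BERT text features and ResNet-50 image features through a classification head and reporting an F1-score. The following is only a minimal, dependency-free sketch of that idea, not the authors' implementation. It assumes fusion by simple concatenation, a 768-dimensional text embedding (BERT-base) and a 2048-dimensional image embedding (ResNet-50 after global pooling), and it uses random vectors in place of real encoder outputs. The names `fuse`, `LinearHead`, and `f1_score` are hypothetical helpers introduced for illustration.

```python
import math
import random

TEXT_DIM = 768    # assumption: pooled embedding size of BERT-base
IMAGE_DIM = 2048  # assumption: ResNet-50 feature size after global pooling


def fuse(text_vec, image_vec):
    """Late fusion by concatenation (an assumed strategy, not confirmed by the source)."""
    return list(text_vec) + list(image_vec)


class LinearHead:
    """Hypothetical single-layer logistic head over the fused features."""

    def __init__(self, in_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # Untrained weights: this sketch shows data flow only, not learned behavior.
        self.weights = [rng.uniform(-scale, scale) for _ in range(in_dim)]
        self.bias = 0.0

    def predict_proba(self, features):
        """Return the sigmoid of the linear score, read as P(review is fake)."""
        logit = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Stand-ins for real encoder outputs (random placeholders, not real features).
rng = random.Random(42)
text_vec = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
image_vec = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]

fused = fuse(text_vec, image_vec)
head = LinearHead(len(fused))
prob_fake = head.predict_proba(fused)
print(f"fused dim = {len(fused)}, P(fake) = {prob_fake:.3f}")
```

In a real pipeline the two placeholder vectors would come from the pretrained encoders and the head would be trained on labeled review-image pairs; concatenation is only one of several possible fusion choices.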
Related papers
- Talk, Snap, Complain: Validation-Aware Multimodal Expert Framework for Fine-Grained Customer Grievances [14.30884038757821]
Existing approaches to complaint analysis largely rely on unimodal, short-form content such as tweets or product reviews.
We introduce VALOR, a Validation-Aware Learner with Expert Routing, tailored for this multimodal setting.
We evaluate VALOR on a curated multimodal complaint dataset annotated with fine-grained aspect and severity labels.
arXiv Detail & Related papers (2025-11-18T17:29:28Z)
- Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline [56.790045049514326]
Two major forms of deception dominate: human-crafted misinformation and AI-generated content.
We propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception.
UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines.
arXiv Detail & Related papers (2025-09-30T09:26:32Z)
- EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications [24.832537917472894]
EVADE is the first expert-curated, Chinese, multimodal benchmark designed to evaluate foundation models on evasive content detection in e-commerce.
The dataset contains 2,833 annotated text samples and 13,961 images spanning six demanding product categories.
arXiv Detail & Related papers (2025-05-23T09:18:01Z)
- Consistency-aware Fake Videos Detection on Short Video Platforms [4.291448222735821]
This paper focuses on detecting fake news on short video platforms.
Existing approaches typically combine raw video data with metadata inputs before applying a classification layer.
Motivated by this insight, we propose a novel detection paradigm that explicitly identifies and leverages cross-modal contradictions.
arXiv Detail & Related papers (2025-04-30T10:26:04Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse.
We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media [57.70351255180495]
AiGen-FoodReview is a dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated.
We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA.
The paper contributes by open-sourcing the dataset and releasing fake review detectors, recommending its use in unimodal and multimodal fake review detection tasks, and evaluating linguistic and visual features in synthetic versus authentic data.
arXiv Detail & Related papers (2024-01-16T20:57:36Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.