Logically at the Factify 2022: Multimodal Fact Verification
- URL: http://arxiv.org/abs/2112.09253v1
- Date: Thu, 16 Dec 2021 23:34:07 GMT
- Title: Logically at the Factify 2022: Multimodal Fact Verification
- Authors: Jie Gao, Hella-Franziska Hoffmann, Stylianos Oikonomou, David
Kiskovski, Anil Bandhakavi
- Abstract summary: This paper describes our participant system for the multi-modal fact verification (Factify) challenge at AAAI 2022.
Two baseline approaches are proposed and explored including an ensemble model and a multi-modal attention network.
Our best model is ranked first in leaderboard which obtains a weighted average F-measure of 0.77 on both validation and test set.
- Score: 2.8914815569249823
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper describes our participant system for the multi-modal fact
verification (Factify) challenge at AAAI 2022. Despite the recent advance in
text based verification techniques and large pre-trained multimodal models
cross vision and language, very limited work has been done in applying
multimodal techniques to automate fact checking process, particularly
considering the increasing prevalence of claims and fake news about images and
videos on social media. In our work, the challenge is treated as multimodal
entailment task and framed as multi-class classification. Two baseline
approaches are proposed and explored including an ensemble model (combining two
uni-modal models) and a multi-modal attention network (modeling the interaction
between image and text pair from claim and evidence document). We conduct
several experiments investigating and benchmarking different SoTA pre-trained
transformers and vision models in this work. Our best model is ranked first in
leaderboard which obtains a weighted average F-measure of 0.77 on both
validation and test set. Exploratory analysis of dataset is also carried out on
the Factify data set and uncovers salient patterns and issues (e.g., word
overlapping, visual entailment correlation, source bias) that motivates our
hypothesis. Finally, we highlight challenges of the task and multimodal dataset
for future research.
Related papers
- Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines [63.427721165404634]
This paper investigates an intriguing task of Multi-modal Retrieval Augmented Multi-modal Generation (M$2$RAG)
This task requires foundation models to browse multi-modal web pages, with mixed text and images, and generate multi-modal responses for solving user queries.
We construct a benchmark for M$2$RAG task, equipped with a suite of text-modal metrics and multi-modal metrics to analyze the capabilities of existing foundation models.
arXiv Detail & Related papers (2024-11-25T13:20:19Z) - Piecing It All Together: Verifying Multi-Hop Multimodal Claims [39.68850054331197]
We introduce a new task: multi-hop multimodal claim verification.
This task challenges models to reason over multiple pieces of evidence from diverse sources, including text, images, and tables.
We construct MMCV, a large-scale dataset comprising 16k multi-hop claims paired with multimodal evidence, with additional input from human feedback.
arXiv Detail & Related papers (2024-11-14T16:01:33Z) - MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking [0.283600654802951]
We present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal datasets.
We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths.
Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset.
arXiv Detail & Related papers (2024-07-18T01:33:20Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z) - Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We analyze the MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification.
arXiv Detail & Related papers (2023-07-06T08:02:45Z) - Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (textscMTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find textscMTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Multi-modal pre-trained big models have drawn more and more attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing the conventional deep, pre-training works in natural language process, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network, and knowledge enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z) - Team Triple-Check at Factify 2: Parameter-Efficient Large Foundation
Models with Feature Representations for Multi-Modal Fact Verification [5.552606716659022]
Multi-modal fact verification has become an important but challenging issue on social media.
In this paper, we propose the Pre-CoFactv2 framework for modeling fine-grained text and input embeddings with lightening parameters.
We show that Pre-CoFactv2 outperforms Pre-CoFact by a large margin and achieved new state-of-the-art results at the Factify challenge at AAAI 2023.
arXiv Detail & Related papers (2023-02-12T18:08:54Z) - Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content
Dilutions [27.983902791798965]
We develop a model that generates dilution text that maintains relevance and topical coherence with the image and existing text.
We find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model.
Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations.
arXiv Detail & Related papers (2022-11-04T17:58:02Z) - Multimodal Image Synthesis and Editing: The Generative AI Era [131.9569600472503]
multimodal image synthesis and editing has become a hot research topic in recent years.
We comprehensively contextualize the advance of the recent multimodal image synthesis and editing.
We describe benchmark datasets and evaluation metrics as well as corresponding experimental results.
arXiv Detail & Related papers (2021-12-27T10:00:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.