INO at Factify 2: Structure Coherence based Multi-Modal Fact
Verification
- URL: http://arxiv.org/abs/2303.01510v1
- Date: Thu, 2 Mar 2023 11:18:56 GMT
- Title: INO at Factify 2: Structure Coherence based Multi-Modal Fact
Verification
- Authors: Yinuo Zhang, Zhulin Tao, Xi Wang, Tongyue Wang
- Abstract summary: We propose a structure coherence-based multi-modal fact verification scheme to classify fake news.
Our weighted average F1 score reached 0.8079, achieving 2nd place in FACTIFY2.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our approach to the multi-modal fact verification
(FACTIFY) challenge at AAAI2023. In recent years, with the widespread use of
social media, fake news can spread rapidly and negatively impact social
security. Automatic claim verification is becoming increasingly crucial for
combating fake news. In fact verification involving multi-modal data, there
should be structural coherence between the claim and the document. We therefore
propose a structure coherence-based multi-modal fact verification scheme to classify fake
news. Our structure coherence includes the following four aspects: sentence
length, vocabulary similarity, semantic similarity, and image similarity.
Specifically, CLIP and Sentence BERT are combined to extract text features, and
ResNet50 is used to extract image features. In addition, we extract the text
length and the lexical similarity. The features are then concatenated and
passed through a random forest classifier. Finally, our weighted average F1
score reached 0.8079, achieving 2nd place in FACTIFY2.
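The four coherence aspects above (sentence length, vocabulary similarity, semantic similarity, image similarity) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the embedding vectors stand in for the Sentence-BERT/CLIP text features and ResNet50 image features that the paper actually uses; in practice the resulting feature vector would be fed to a random forest classifier (e.g. scikit-learn's `RandomForestClassifier`).

```python
# Hypothetical sketch of the structure-coherence features described above.
# Real embeddings would come from Sentence-BERT / CLIP (text) and ResNet50
# (image); here plain lists of floats stand in for them.
import math


def length_feature(claim: str, document: str) -> float:
    """Sentence-length coherence: ratio of claim length to document length."""
    return len(claim.split()) / max(len(document.split()), 1)


def lexical_similarity(claim: str, document: str) -> float:
    """Vocabulary similarity: Jaccard overlap of the two word sets."""
    a, b = set(claim.lower().split()), set(document.lower().split())
    return len(a & b) / max(len(a | b), 1)


def cosine(u, v) -> float:
    """Cosine similarity, used for both semantic and image embedding pairs."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coherence_features(claim, document,
                       claim_text_vec, doc_text_vec,
                       claim_img_vec, doc_img_vec):
    """Concatenate the four coherence aspects into one feature vector,
    which would then go to a random forest classifier."""
    return [
        length_feature(claim, document),          # sentence length
        lexical_similarity(claim, document),      # vocabulary similarity
        cosine(claim_text_vec, doc_text_vec),     # semantic similarity
        cosine(claim_img_vec, doc_img_vec),       # image similarity
    ]
```

For example, `coherence_features("the cat sat", "the cat sat on the mat", ...)` yields a length ratio of 0.5 and a lexical similarity of 0.6, alongside the two cosine scores.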
Related papers
- TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document [60.01330653769726]
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.
By adopting Shifted Window Attention with zero-initialization, we achieve cross-window connectivity at higher input resolutions.
By expanding its capabilities to encompass text spotting and grounding, and incorporating positional information into responses, we enhance interpretability.
arXiv Detail & Related papers (2024-03-07T13:16:24Z)
- Unified Coarse-to-Fine Alignment for Video-Text Retrieval [71.85966033484597]
We propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA.
Our model captures the cross-modal similarity information at different granularity levels.
We apply the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them.
arXiv Detail & Related papers (2023-09-18T19:04:37Z)
- Findings of Factify 2: Multimodal Fake News Detection [36.34201719103715]
We present the outcome of the Factify 2 shared task, which provides a multi-modal fact verification and satire news dataset.
The data calls for a comparison-based approach to the task, pairing social media claims with supporting documents, each with both text and image, divided into 5 classes based on multi-modal relations.
The highest F1 score averaged for all five classes was 81.82%.
arXiv Detail & Related papers (2023-07-19T22:14:49Z)
- Factify 2: A Multimodal Fake News and Satire News Dataset [36.34201719103715]
We provide a multi-modal fact-checking dataset called FACTIFY 2, improving Factify 1 by using new data sources and adding satire articles.
Similar to FACTIFY 1.0, we have three broad categories - support, no-evidence, and refute, with sub-categories based on the entailment of visual and textual data.
We also provide a BERT and Vision Transformer based baseline, which achieves a 65% F1 score on the test set.
arXiv Detail & Related papers (2023-04-08T03:14:19Z)
- IMCI: Integrate Multi-view Contextual Information for Fact Extraction and Verification [19.764122035213067]
We propose to integrate multi-view contextual information (IMCI) for fact extraction and verification.
Our experimental results on the FEVER 1.0 shared task show that our IMCI framework performs strongly on both fact extraction and verification.
arXiv Detail & Related papers (2022-08-30T05:57:34Z)
- Multimodal Fake News Detection with Adaptive Unimodal Representation Aggregation [28.564442206829625]
AURA is a multimodal fake news detection network with adaptive unimodal representation aggregation.
We perform coarse-level fake news detection and cross-modal consistency learning based on the unimodal and multimodal representations.
Experiments on Weibo and Gossipcop show that AURA outperforms several state-of-the-art fake news detection (FND) schemes.
arXiv Detail & Related papers (2022-06-12T14:06:55Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources [70.68526820807402]
A real image is re-purposed to support other narratives by misrepresenting its context and/or elements.
Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-context pairing.
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking.
arXiv Detail & Related papers (2021-11-30T19:36:20Z)
- FiLMing Multimodal Sarcasm Detection with Attention [0.7340017786387767]
Sarcasm detection identifies natural language expressions whose intended meaning differs from their surface meaning.
We propose a novel architecture that uses the RoBERTa model with a co-attention layer on top to incorporate context incongruity between input text and image attributes.
Our results demonstrate that our proposed model outperforms the existing state-of-the-art method by 6.14% F1 score on the public Twitter multimodal detection dataset.
arXiv Detail & Related papers (2021-08-09T06:33:29Z) - Enriching Transformers with Structured Tensor-Product Representations
for Abstractive Summarization [131.23966358405767]
We adapt TP-TRANSFORMER with the explicitly compositional Tensor-Product Representation (TPR) for the task of abstractive summarization.
A key feature of our model is a structural bias that we introduce by encoding two separate representations for each token.
We show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets.
arXiv Detail & Related papers (2021-06-02T17:32:33Z) - LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for
Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.