Team Yao at Factify 2022: Utilizing Pre-trained Models and Co-attention
Networks for Multi-Modal Fact Verification
- URL: http://arxiv.org/abs/2201.11664v1
- Date: Wed, 26 Jan 2022 16:04:37 GMT
- Authors: Wei-Yao Wang, Wen-Chih Peng
- Abstract summary: We propose a framework, Pre-CoFact, composed of two pre-trained models for extracting features from text and images.
We further adopt an ensemble of Pre-CoFact models built on different pre-trained backbones to achieve better performance.
Our model achieved competitive performance without using auxiliary tasks or extra information.
- Score: 7.3724108865167945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, social media has exposed users to a flood of
misinformation and disinformation; as a result, misinformation has attracted a
great deal of attention both as a research topic and as a social issue. To
address the problem, we propose Pre-CoFact, a framework composed of two
pre-trained models that extract features from text and images, and multiple
co-attention networks that fuse features from the same modality but different
sources as well as from different modalities. In addition, we adopt an
ensemble of Pre-CoFact variants built on different pre-trained models to
achieve better performance. We further demonstrate the framework's
effectiveness through an ablation study and compare different pre-trained
models. Our team, Yao, won the fifth prize (F1-score: 74.585%) in the Factify
challenge hosted by De-Factify @ AAAI 2022, which demonstrates that our model
achieves competitive performance without using auxiliary tasks or extra
information. The source code of our work is publicly available at
https://github.com/wywyWang/Multi-Modal-Fact-Verification-2021
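For a concrete picture of the pipeline the abstract describes, below is a minimal PyTorch-style sketch, assuming generic text and image encoders that return sequence features of a common hidden size. The module names, hidden dimension, pooling, and the 5-way classifier head are illustrative assumptions, not the authors' implementation (see the linked repository for the official code).

```python
import torch
import torch.nn as nn


class CoAttention(nn.Module):
    """Bidirectional cross-attention between two feature sequences."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        a2b, _ = self.attn_ab(query=a, key=b, value=b)   # a attends to b
        b2a, _ = self.attn_ba(query=b, key=a, value=a)   # b attends to a
        return a2b.mean(dim=1), b2a.mean(dim=1)          # pool each to one vector


class PreCoFactSketch(nn.Module):
    """Hypothetical re-implementation sketch; not the authors' code."""

    def __init__(self, text_encoder: nn.Module, image_encoder: nn.Module,
                 dim: int = 768, num_classes: int = 5):
        super().__init__()
        # Pre-trained feature extractors; each is assumed to map its input
        # to a (batch, seq_len, dim) tensor of features.
        self.text_encoder = text_encoder
        self.image_encoder = image_encoder
        # Co-attention over the same modality but different sources ...
        self.text_text = CoAttention(dim)     # claim text  <-> document text
        self.image_image = CoAttention(dim)   # claim image <-> document image
        # ... and over different modalities.
        self.text_image = CoAttention(dim)    # claim text  <-> claim image
        self.classifier = nn.Sequential(
            nn.Linear(dim * 6, dim), nn.ReLU(), nn.Linear(dim, num_classes))

    def forward(self, claim_text, doc_text, claim_image, doc_image):
        ct, dt = self.text_encoder(claim_text), self.text_encoder(doc_text)
        ci, di = self.image_encoder(claim_image), self.image_encoder(doc_image)
        t1, t2 = self.text_text(ct, dt)       # same modality, different sources
        v1, v2 = self.image_image(ci, di)
        m1, m2 = self.text_image(ct, ci)      # cross-modal fusion
        fused = torch.cat([t1, t2, v1, v2, m1, m2], dim=-1)
        return self.classifier(fused)         # logits over the 5 Factify classes
```

One simple way to realize the ensemble mentioned in the abstract is to train several such models with different pre-trained encoder pairs and average their predicted class probabilities.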
Related papers
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- Masked Diffusion Models Are Fast Distribution Learners [32.485235866596064]
Diffusion models are commonly trained to learn all fine-grained visual information from scratch.
We show that it suffices to train a strong diffusion model by first pre-training the model to learn some primer distribution.
Then the pre-trained model can be fine-tuned for various generation tasks efficiently.
arXiv Detail & Related papers (2023-06-20T08:02:59Z)
- An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA).
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We instead propose to direct effort toward efficient adaptation of existing models by augmenting language models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Team Triple-Check at Factify 2: Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification [5.552606716659022]
Multi-modal fact verification has become an important but challenging issue on social media.
In this paper, we propose the Pre-CoFactv2 framework for modeling fine-grained text and input embeddings with lightweight parameters.
We show that Pre-CoFactv2 outperforms Pre-CoFact by a large margin and achieves new state-of-the-art results at the Factify challenge at AAAI 2023.
arXiv Detail & Related papers (2023-02-12T18:08:54Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
However, fine-tuned models are often available without their training data, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results in in-context learning, outperforming the 175B-parameter GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- Logically at the Factify 2022: Multimodal Fact Verification [2.8914815569249823]
This paper describes our participating system for the multi-modal fact verification (Factify) challenge at AAAI 2022.
Two baseline approaches are proposed and explored, including an ensemble model and a multi-modal attention network.
Our best model ranked first on the leaderboard, obtaining a weighted average F-measure of 0.77 on both the validation and test sets.
arXiv Detail & Related papers (2021-12-16T23:34:07Z)
- Federated Generative Adversarial Learning [13.543039993168735]
Generative adversarial networks (GANs) have achieved advances in various real-world applications, but they suffer from data limitation problems in practice.
We propose a novel generative learning scheme utilizing a federated learning framework.
arXiv Detail & Related papers (2020-05-07T23:06:49Z)