What BERT Sees: Cross-Modal Transfer for Visual Question Generation
- URL: http://arxiv.org/abs/2002.10832v3
- Date: Wed, 16 Dec 2020 15:48:35 GMT
- Title: What BERT Sees: Cross-Modal Transfer for Visual Question Generation
- Authors: Thomas Scialom, Patrick Bordes, Paul-Alexis Dray, Jacopo Staiano,
Patrick Gallinari
- Abstract summary: We study the visual capabilities of BERT out-of-the-box, avoiding any pre-training on supplementary data.
We introduce BERT-gen, a BERT-based architecture for text generation able to leverage either mono- or multi-modal representations.
- Score: 21.640299110619384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models have recently contributed to significant advances
in NLP tasks. Recently, multi-modal versions of BERT have been developed, using
heavy pre-training relying on vast corpora of aligned textual and image data,
primarily applied to classification tasks such as VQA. In this paper, we are
interested in evaluating the visual capabilities of BERT out-of-the-box, by
avoiding any pre-training on supplementary data. We choose to study Visual
Question Generation, a task of great interest for grounded dialog, which enables
studying the impact of each modality (since the input can be visual and/or textual).
Moreover, the generative aspect of the task requires an adaptation, since BERT
is primarily designed as an encoder. We introduce BERT-gen, a BERT-based
architecture for text generation able to leverage either mono- or multi-modal
representations. The results reported under different configurations indicate
an innate capacity of BERT-gen to adapt to multi-modal data and text generation,
even with little data available, avoiding expensive pre-training. The
proposed model obtains substantial improvements over the state-of-the-art on
two established VQG datasets.
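As an illustration of the encoder-to-generator adaptation described above, the sketch below shows one way a vanilla BERT checkpoint could be conditioned on image features and decoded left-to-right: detector region features are projected into BERT's token-embedding space and treated as extra "tokens", and a prediction head tied to the input embeddings proposes the next word. This is a minimal sketch assuming Hugging Face transformers, 2048-dimensional region features from an object detector, and greedy decoding; it is not the authors' released BERT-gen implementation.

```python
# Illustrative sketch only: image regions as projected "tokens" feeding a
# stock BERT encoder, decoded greedily from left to right.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Assumption: 2048-d region features (e.g. from a Faster R-CNN detector).
visual_proj = nn.Linear(2048, bert.config.hidden_size)
# Prediction head tied to BERT's input word embeddings.
lm_head = nn.Linear(bert.config.hidden_size, bert.config.vocab_size, bias=False)
lm_head.weight = bert.embeddings.word_embeddings.weight

@torch.no_grad()
def generate_question(region_feats, max_len=20):
    """Greedy decoding conditioned on image regions.

    region_feats: (num_regions, 2048) tensor of detector features.
    """
    vis_embeds = visual_proj(region_feats).unsqueeze(0)          # (1, R, H)
    generated = [tokenizer.cls_token_id]
    for _ in range(max_len):
        tok_embeds = bert.embeddings.word_embeddings(
            torch.tensor([generated]))                           # (1, T, H)
        inputs = torch.cat([vis_embeds, tok_embeds], dim=1)      # visual + textual "tokens"
        hidden = bert(inputs_embeds=inputs).last_hidden_state    # (1, R+T, H)
        next_id = lm_head(hidden[:, -1]).argmax(-1).item()       # next-word prediction
        if next_id == tokenizer.sep_token_id:
            break
        generated.append(next_id)
    return tokenizer.decode(generated[1:])

# 36 random region features stand in for real detector output.
print(generate_question(torch.randn(36, 2048)))
```

In practice the projection layer and prediction head would need fine-tuning on a VQG dataset before the generated questions become meaningful; untrained, the snippet only illustrates the cross-modal data flow.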
Related papers
- Unified Pretraining for Recommendation via Task Hypergraphs [55.98773629788986]
We propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs.
To establish a unified learning pattern that handles the diverse requirements and nuances of various pretext tasks, we design task hypergraphs that generalize pretext tasks to hyperedge prediction.
A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation.
arXiv Detail & Related papers (2023-10-20T05:33:21Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone [170.85076677740292]
We present FIBER (Fusion-In-the-Backbone-based transformER), a new model architecture for vision-language (VL) pre-training.
Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model.
We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection.
arXiv Detail & Related papers (2022-06-15T16:41:29Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders on visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives [0.0]
BERT has revolutionized the NLP field by enabling transfer learning with large language models.
This article studies how to better exploit the different embeddings provided by the BERT output layer, and the use of language-specific rather than multilingual models.
arXiv Detail & Related papers (2022-01-10T15:05:05Z)
- Transferring BERT-like Transformers' Knowledge for Authorship Verification [8.443350618722562]
We study the effectiveness of several BERT-like transformers for the task of authorship verification.
We provide new splits for PAN-2020, where training and test data are sampled from disjoint topics or authors.
We show that those splits can enhance the models' capability to transfer knowledge over a new, significantly different dataset.
arXiv Detail & Related papers (2021-12-09T18:57:29Z)
- BERTGEN: Multi-task Generation through BERT [30.905286823599976]
We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing multimodal and multilingual pretrained models.
With a comprehensive set of evaluations, we show that BERTGEN outperforms many strong baselines across the tasks explored.
We also show BERTGEN's ability for zero-shot language generation, where it exhibits competitive performance to supervised counterparts.
arXiv Detail & Related papers (2021-06-07T10:17:45Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Table Search Using a Deep Contextualized Language Model [20.041167804194707]
In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval.
We propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT.
arXiv Detail & Related papers (2020-05-19T04:18:04Z)
- Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection [22.883725214057286]
In this paper, we propose a novel architecture for learning intent-based language models.
We propose an intent pooling attention mechanism, and we reinforce the slot filling task by fusing intent distributions, word features, and token representations.
The experimental results on standard datasets show that our model outperforms both the current non-BERT state of the art as well as some stronger BERT-based baselines.
arXiv Detail & Related papers (2020-04-30T15:00:21Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.