Enhancing BERT-Based Visual Question Answering through Keyword-Driven
Sentence Selection
- URL: http://arxiv.org/abs/2310.09432v1
- Date: Fri, 13 Oct 2023 22:43:55 GMT
- Title: Enhancing BERT-Based Visual Question Answering through Keyword-Driven
Sentence Selection
- Authors: Davide Napolitano and Lorenzo Vaiani and Luca Cagliero
- Abstract summary: Document-based Visual Question Answering competition addresses the automatic detection of parent-child relationships in documents.
This paper describes the PoliTo's approach to addressing this task, in particular, our best solution explores a text-only approach.
Thanks to the effectiveness of this approach, we are able to achieve high performance compared to baselines.
- Score: 8.586466827855016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Document-based Visual Question Answering competition addresses the
automatic detection of parent-child relationships between elements in
multi-page documents. The goal is to identify the document elements that answer
a specific question posed in natural language. This paper describes the
PoliTo's approach to addressing this task, in particular, our best solution
explores a text-only approach, leveraging an ad hoc sampling strategy.
Specifically, our approach leverages the Masked Language Modeling technique to
fine-tune a BERT model, focusing on sentences containing sensitive keywords
that also occur in the questions, such as references to tables or images.
Thanks to the effectiveness of this approach, we are able to achieve high
performance compared to baselines, demonstrating how our solution contributes
positively to this task.
Related papers
- STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM [59.08493154172207]
We propose a unified framework to streamline the semantic tokenization and generative recommendation process.
We formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task.
All these tasks are framed in a generative manner and trained using a single large language model (LLM) backbone.
arXiv Detail & Related papers (2024-09-11T13:49:48Z) - Generative Retrieval with Preference Optimization for E-commerce Search [16.78829577915103]
We develop an innovative framework for E-commerce search, called generative retrieval with preference optimization.
We employ multi-span identifiers to represent raw item titles and transform the task of generating titles from queries into the task of generating multi-span identifiers from queries.
Our experiments show that this framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate the superiority and effectiveness in improving conversion gains.
arXiv Detail & Related papers (2024-07-29T09:31:19Z) - Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach [33.231639257323536]
In this paper, we address the issue of dialogue-form context query within the interactive text-to-image retrieval task.
By reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data.
We construct the LLM questioner to generate non-redundant questions about the attributes of the target image.
arXiv Detail & Related papers (2024-06-05T16:09:01Z) - A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multi concepts for multilingual semantic matching to liberate the model from the reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z) - Answer Candidate Type Selection: Text-to-Text Language Model for Closed
Book Question Answering Meets Knowledge Graphs [62.20354845651949]
We present a novel approach which works on top of the pre-trained Text-to-Text QA system to address this issue.
Our simple yet effective method performs filtering and re-ranking of generated candidates based on their types derived from Wikidata "instance_of" property.
arXiv Detail & Related papers (2023-10-10T20:49:43Z) - Walking Down the Memory Maze: Beyond Context Limit through Interactive
Reading [63.93888816206071]
We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information.
We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text; pinpointing the relevant text segments related to the query.
arXiv Detail & Related papers (2023-10-08T06:18:14Z) - Enhanced Knowledge Selection for Grounded Dialogues via Document
Semantic Graphs [123.50636090341236]
We propose to automatically convert background knowledge documents into document semantic graphs.
Our document semantic graphs preserve sentence-level information through the use of sentence nodes and provide concept connections between sentences.
Our experiments show that our semantic graph-based knowledge selection improves over sentence selection baselines for both the knowledge selection task and the end-to-end response generation task on HollE.
arXiv Detail & Related papers (2022-06-15T04:51:32Z) - MDERank: A Masked Document Embedding Rank Approach for Unsupervised
Keyphrase Extraction [41.941098507759015]
Keyphrases are phrases in a document providing a concise summary of core content, helping readers to understand what the article is talking about in a minute.
We propose a novel unsupervised keyword extraction method by leveraging the BERT-based model to select and rank candidate keyphrases with a MASK strategy.
arXiv Detail & Related papers (2021-10-13T11:29:17Z) - Asking questions on handwritten document collections [35.85762649504866]
This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
arXiv Detail & Related papers (2021-10-02T02:40:40Z) - Keyword-Attentive Deep Semantic Matching [1.8416014644193064]
We propose a keyword-attentive approach to improve deep semantic matching.
We first leverage domain tags from a large corpus to generate a domain-enhanced keyword dictionary.
During model training, we propose a new negative sampling approach based on keyword coverage between the input pair.
arXiv Detail & Related papers (2020-03-11T10:18:32Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.