One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
- URL: http://arxiv.org/abs/2209.06584v1
- Date: Mon, 12 Sep 2022 19:26:32 GMT
- Title: One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
- Authors: Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial,
Mausoom Sarkar, Balaji Krishnamurthy
- Abstract summary: We propose MONOMER to solve one-shot snippet detection, finding query snippets in target documents.
We conduct experiments showing MONOMER outperforms several baselines from one-shot object detection, template matching, and document understanding.
We train MONOMER on programmatically generated data containing many visually similar query snippet and target document pairs.
- Score: 12.98328149016239
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Active consumption of digital documents has yielded scope for research in
various applications, including search. Traditionally, searching within a
document has been cast as a text matching problem ignoring the rich layout and
visual cues commonly present in structured documents, forms, etc. To that end,
we ask a mostly unexplored question: "Can we search for other similar snippets
present in a target document page given a single query instance of a document
snippet?". We propose MONOMER to solve this as a one-shot snippet detection
task. MONOMER fuses context from visual, textual, and spatial modalities of
snippets and documents to find query snippets in target documents. We conduct
extensive ablations and experiments showing MONOMER outperforms several
baselines from one-shot object detection (BHRL), template matching, and
document understanding (LayoutLMv3). Due to the scarcity of relevant data for
the task at hand, we train MONOMER on programmatically generated data having
many visually similar query snippets and target document pairs from two
datasets - Flamingo Forms and PubLayNet. We also do a human study to validate
the generated data.
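The paper does not include code, but the abstract's description of fusing visual, textual, and spatial modalities to match a single query snippet against regions of a target page can be illustrated with a minimal sketch. The module names, feature dimensions, and region-scoring scheme below are illustrative assumptions for a generic one-shot snippet detector, not MONOMER's actual architecture.

```python
# Hypothetical sketch of one-shot snippet detection with multimodal fusion.
# All names, dimensions, and the region-scoring scheme are illustrative
# assumptions; they do not reproduce MONOMER's actual architecture.
import torch
import torch.nn as nn


class SnippetEncoder(nn.Module):
    """Fuses visual, textual, and spatial (bounding-box) features of a snippet."""

    def __init__(self, vis_dim=2048, txt_dim=768, d_model=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, d_model)   # e.g. pooled CNN features
        self.txt_proj = nn.Linear(txt_dim, d_model)   # e.g. pooled text-encoder features
        self.box_proj = nn.Linear(4, d_model)         # normalized (x1, y1, x2, y2)
        self.fuse = nn.Sequential(nn.LayerNorm(3 * d_model),
                                  nn.Linear(3 * d_model, d_model), nn.ReLU())

    def forward(self, vis, txt, box):
        z = torch.cat([self.vis_proj(vis), self.txt_proj(txt), self.box_proj(box)], dim=-1)
        return self.fuse(z)


class OneShotSnippetDetector(nn.Module):
    """Scores candidate regions on a target page against a single query snippet."""

    def __init__(self, d_model=256):
        super().__init__()
        self.encoder = SnippetEncoder(d_model=d_model)
        self.scorer = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.ReLU(),
                                    nn.Linear(d_model, 1))

    def forward(self, query_feats, region_feats):
        # query_feats: (vis [1, 2048], txt [1, 768], box [1, 4]) for the query snippet
        # region_feats: (vis [R, 2048], txt [R, 768], box [R, 4]) for R candidate regions
        q = self.encoder(*query_feats)                       # [1, d]
        r = self.encoder(*region_feats)                      # [R, d]
        pair = torch.cat([q.expand(r.size(0), -1), r], -1)   # [R, 2d]
        return self.scorer(pair).squeeze(-1)                 # one similarity logit per region


if __name__ == "__main__":
    # Dummy example: one query snippet vs. 5 candidate regions on a target page.
    detector = OneShotSnippetDetector()
    query = (torch.randn(1, 2048), torch.randn(1, 768), torch.rand(1, 4))
    regions = (torch.randn(5, 2048), torch.randn(5, 768), torch.rand(5, 4))
    print(detector(query, regions))  # higher logit = more similar to the query snippet
```

In practice the visual, textual, and box features would come from a document-image backbone, a text encoder over OCR tokens, and the page layout respectively; regions with the highest scores would be returned as detected snippets.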
Related papers
- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding [63.33447665725129]
We introduce M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts.
M3DocRAG can efficiently handle single or many documents while preserving visual information.
We also present M3DocVQA, a new benchmark for evaluating open-domain DocVQA over 3,000+ PDF documents with 40,000+ pages.
arXiv Detail & Related papers (2024-11-07T18:29:38Z) - Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - BuDDIE: A Business Document Dataset for Multi-task Information Extraction [18.440587946049845]
BuDDIE is the first multi-task dataset of 1,665 real-world business documents.
Our dataset consists of publicly available business entity documents from US state government websites.
arXiv Detail & Related papers (2024-04-05T10:26:42Z) - PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z) - DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z) - Cross-Modal Entity Matching for Visually Rich Documents [4.8119678510491815]
Visually rich documents utilize visual cues to augment their semantics.
Existing works that enable structured querying on these documents do not take this into account.
We propose Juno -- a cross-modal entity matching framework to address this limitation.
arXiv Detail & Related papers (2023-03-01T18:26:14Z) - Multi-View Document Representation Learning for Open-Domain Dense
Retrieval [87.11836738011007]
This paper proposes a multi-view document representation learning framework.
It aims to produce multi-view embeddings to represent documents and enforce them to align with different queries.
Experiments show our method outperforms recent works and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-03-16T03:36:38Z) - CSFCube -- A Test Collection of Computer Science Research Articles for
Faceted Query by Example [43.01717754418893]
We introduce the task of faceted Query by Example.
Users can also specify a finer grained aspect in addition to the input query document.
We envision models which are able to retrieve scientific papers analogous to a query scientific paper.
arXiv Detail & Related papers (2021-03-24T01:02:12Z) - DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis.
Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)