One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
- URL: http://arxiv.org/abs/2209.06584v1
- Date: Mon, 12 Sep 2022 19:26:32 GMT
- Title: One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
- Authors: Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial,
Mausoom Sarkar, Balaji Krishnamurthy
- Abstract summary: We propose MONOMER, which casts searching for similar document snippets as a one-shot snippet detection task.
We conduct experiments showing MONOMER outperforms several baselines from one-shot object detection, template matching, and document understanding.
We train MONOMER on programmatically generated data containing many visually similar query snippet and target document pairs.
- Score: 12.98328149016239
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Active consumption of digital documents has yielded scope for research in
various applications, including search. Traditionally, searching within a
document has been cast as a text matching problem ignoring the rich layout and
visual cues commonly present in structured documents, forms, etc. To that end,
we ask a mostly unexplored question: "Can we search for other similar snippets
present in a target document page given a single query instance of a document
snippet?". We propose MONOMER to solve this as a one-shot snippet detection
task. MONOMER fuses context from visual, textual, and spatial modalities of
snippets and documents to find the query snippet in target documents. We conduct
extensive ablations and experiments showing MONOMER outperforms several
baselines from one-shot object detection (BHRL), template matching, and
document understanding (LayoutLMv3). Due to the scarcity of relevant data for
the task at hand, we train MONOMER on programmatically generated data having
many visually similar query snippets and target document pairs from two
datasets - Flamingo Forms and PubLayNet. We also do a human study to validate
the generated data.
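To make the one-shot formulation concrete, below is a minimal sketch of a query-conditioned detector that fuses visual, textual, and spatial (bounding-box) features and scores candidate page regions against a single query snippet. This is not the authors' released implementation; the module names, feature dimensions, and fusion-by-concatenation design are illustrative assumptions only.

```python
# Minimal sketch only (not the authors' released code): a query-conditioned
# detector that fuses visual, textual, and spatial (bounding-box) features of
# page regions and scores them against a single query snippet. All module
# names and dimensions below are illustrative assumptions.

import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    """Project per-region visual, text, and box features into a shared space
    and fuse them with a small MLP."""

    def __init__(self, vis_dim=256, txt_dim=256, box_dim=4, hidden=256):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.box_proj = nn.Linear(box_dim, hidden)
        self.fuse = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )

    def forward(self, vis, txt, box):
        z = torch.cat(
            [self.vis_proj(vis), self.txt_proj(txt), self.box_proj(box)], dim=-1
        )
        return self.fuse(z)


class OneShotSnippetScorer(nn.Module):
    """Score N candidate regions of a target page against one query snippet."""

    def __init__(self, hidden=256):
        super().__init__()
        self.fusion = MultimodalFusion(hidden=hidden)
        self.score = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, query_feats, region_feats):
        # query_feats / region_feats: tuples of (visual, textual, box) tensors
        # with shapes (1, dim) for the query and (N, dim) for page regions.
        q = self.fusion(*query_feats)                    # (1, hidden)
        r = self.fusion(*region_feats)                   # (N, hidden)
        pairs = torch.cat([q.expand_as(r), r], dim=-1)   # (N, 2 * hidden)
        return self.score(pairs).squeeze(-1)             # one logit per region
```

At inference, regions whose logits exceed a threshold would be returned as matches; the programmatically generated query/target pairs described above would supply the positive and negative regions needed to train such a scorer.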
Related papers
- MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents [26.39534684408116]
This work introduces a new benchmark, named MMDocIR, encompassing two distinct tasks: page-level and layout-level retrieval.
The MMDocIR benchmark comprises a rich dataset featuring expertly annotated labels for 1,685 questions and bootstrapped labels for 173,843 questions.
arXiv Detail & Related papers (2025-01-15T14:30:13Z)
- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding [63.33447665725129]
We introduce M3DocRAG, a novel multi-modal RAG framework that flexibly accommodates various document contexts.
M3DocRAG can efficiently handle single or many documents while preserving visual information.
We also present M3DocVQA, a new benchmark for evaluating open-domain DocVQA over 3,000+ PDF documents with 40,000+ pages.
arXiv Detail & Related papers (2024-11-07T18:29:38Z)
- Unified Multimodal Interleaved Document Representation for Retrieval [57.65409208879344]
We propose a method that holistically embeds documents interleaved with multiple modalities.
We merge the representations of segmented passages into one single document representation.
We show that our approach substantially outperforms relevant baselines.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Cross-Modal Entity Matching for Visually Rich Documents [4.8119678510491815]
Visually rich documents utilize visual cues to augment their semantics.
Existing works that enable structured querying on these documents do not take this into account.
We propose Juno -- a cross-modal entity matching framework to address this limitation.
arXiv Detail & Related papers (2023-03-01T18:26:14Z)
- Multi-View Document Representation Learning for Open-Domain Dense Retrieval [87.11836738011007]
This paper proposes a multi-view document representation learning framework.
It aims to produce multi-view embeddings to represent documents and enforce them to align with different queries.
Experiments show our method outperforms recent works and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-03-16T03:36:38Z)
- CSFCube -- A Test Collection of Computer Science Research Articles for Faceted Query by Example [43.01717754418893]
We introduce the task of faceted Query by Example.
Users can also specify a finer grained aspect in addition to the input query document.
We envision models which are able to retrieve scientific papers analogous to a query scientific paper.
arXiv Detail & Related papers (2021-03-24T01:02:12Z)
- DocBank: A Benchmark Dataset for Document Layout Analysis [114.81155155508083]
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis.
Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv Detail & Related papers (2020-06-01T16:04:30Z)