Archive TimeLine Summarization (ATLS): Conceptual Framework for Timeline
Generation over Historical Document Collections
- URL: http://arxiv.org/abs/2301.13479v1
- Date: Tue, 31 Jan 2023 08:58:47 GMT
- Title: Archive TimeLine Summarization (ATLS): Conceptual Framework for Timeline
Generation over Historical Document Collections
- Authors: Nicolas Gutehrlé (CRIT), Antoine Doucet (L3I), Adam Jatowt
- Abstract summary: We propose to extend TimeLine Summarization (TLS) methods to archive collections to assist in their study.
We describe a conceptual framework for an Archive TimeLine Summarization (ATLS) system, which aims to generate informative, readable and interpretable timelines.
- Score: 17.332692582748408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Archive collections are nowadays mostly available through search engine
interfaces, which allow a user to retrieve documents by issuing queries. The
study of these collections may be, however, impaired by some aspects of search
engines, such as the overwhelming number of documents returned or the lack of
contextual knowledge provided. New methods that could work independently or in
combination with search engines are then required to access these collections.
In this position paper, we propose to extend TimeLine Summarization (TLS)
methods to archive collections to assist in their study. We provide an
overview of existing TLS methods and we describe a conceptual framework for an
Archive TimeLine Summarization (ATLS) system, which aims to generate
informative, readable and interpretable timelines.
Related papers
- Reproducible Hybrid Time-Travel Retrieval in Evolving Corpora [1.9202615342033464]
We present a hybrid retrieval system combining Lucene for fast retrieval with a column-store-based retrieval system maintaining a versioned and time-stamped index.
arXiv Detail & Related papers (2024-11-06T16:57:55Z)
- Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG).
We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
- DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems [99.17123445211115]
We introduce DocBench, a benchmark to evaluate large language model (LLM)-based document reading systems.
Our benchmark involves the recruitment of human annotators and the generation of synthetic questions.
It includes 229 real documents and 1,102 questions, spanning across five different domains and four major types of questions.
arXiv Detail & Related papers (2024-07-15T13:17:42Z)
- ColPali: Efficient Document Retrieval with Vision Language Models [15.369861972085136]
We introduce the Visual Document Retrieval Benchmark (ViDoRe), composed of various page-level retrieval tasks spanning multiple domains, languages, and settings.
The inherent shortcomings of modern systems motivate the introduction of a new retrieval model architecture, ColPali.
ColPali largely outperforms modern document retrieval pipelines while being drastically faster and end-to-end trainable.
arXiv Detail & Related papers (2024-06-27T15:45:29Z)
- PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval [76.50690734636477]
We propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus.
The retrieval system harnesses both dense text embedding and sparse bag-of-words representations.
arXiv Detail & Related papers (2024-04-29T04:51:30Z)
- Understanding Archives: Towards New Research Interfaces Relying on the Semantic Annotation of Documents [0.2302001830524133]
We show how semantically annotating the textual content of archival study corpora facilitates their exploitation and valorisation.
First, we present a methodological framework for the construction of new interfaces based on textual semantics, then address the current technological obstacles and their potential solutions.
arXiv Detail & Related papers (2024-03-28T07:55:29Z)
- Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of documents within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- IncDSI: Incrementally Updatable Document Retrieval [35.5697863674097]
IncDSI is a method to add documents in real time without retraining the model on the entire dataset.
We formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters.
Our approach is competitive with re-training the model on the whole dataset.
arXiv Detail & Related papers (2023-07-19T07:20:30Z)
- Decomposing Complex Queries for Tip-of-the-tongue Retrieval [72.07449449115167]
Complex queries may describe content elements (e.g., book characters or events) or information beyond the document text.
This retrieval setting, called tip of the tongue (TOT), is especially challenging for models reliant on lexical and semantic overlap between query and document text.
We introduce a simple yet effective framework for handling such complex queries by decomposing the query into individual clues, routing those as sub-queries to specialized retrievers, and ensembling the results.
arXiv Detail & Related papers (2023-05-24T11:43:40Z)
- ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Archival News Collections [20.07130742712862]
We present ArchivalQA, a large question answering dataset consisting of 1,067,056 question-answer pairs.
We create four subparts of our dataset based on the question difficulty levels and the containment of temporal expressions.
arXiv Detail & Related papers (2021-09-08T05:21:51Z)