Automatic Document Sketching: Generating Drafts from Analogous Texts
- URL: http://arxiv.org/abs/2106.07192v1
- Date: Mon, 14 Jun 2021 06:46:06 GMT
- Title: Automatic Document Sketching: Generating Drafts from Analogous Texts
- Authors: Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Bill Dolan
- Abstract summary: We introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise.
These drafts are built from sets of documents that overlap in form - sharing large segments of potentially reusable text - while diverging in content.
We investigate the application of weakly supervised methods, including use of a transformer-based mixture of experts, together with reinforcement learning.
- Score: 44.626645471195495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of large pre-trained language models has made it possible to make high-quality predictions on how to add or change a sentence in a document. However, the high branching factor inherent to text generation impedes the ability of even the strongest language models to offer useful editing suggestions at a more global or document level. We introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise. These drafts are built from sets of documents that overlap in form - sharing large segments of potentially reusable text - while diverging in content. To support this task, we introduce a Wikipedia-based dataset of analogous documents and investigate the application of weakly supervised methods, including use of a transformer-based mixture of experts, together with reinforcement learning. We report experiments using automated and human evaluation methods and discuss the relative merits of these models.
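As a concrete illustration of the "transformer-based mixture of experts" the abstract mentions, below is a minimal sketch of a token-level mixture-of-experts output head. The gating network, expert count, and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal mixture-of-experts output head (illustrative sketch, not the paper's code).
# A gating network mixes the predictions of several expert projections per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_experts: int = 4):
        super().__init__()
        # Each expert is a separate projection from hidden states to the vocabulary.
        self.experts = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_experts)
        )
        # The gate assigns per-token mixing weights over experts.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        weights = F.softmax(self.gate(hidden), dim=-1)                   # (B, T, E)
        logits = torch.stack([e(hidden) for e in self.experts], dim=2)  # (B, T, E, V)
        # Mix the experts' output distributions with the per-token gate weights.
        probs = torch.einsum("bte,btev->btv", weights, F.softmax(logits, dim=-1))
        return torch.log(probs + 1e-9)  # log-probabilities over the vocabulary

# Usage: plug on top of any transformer encoder/decoder hidden states.
head = MoEHead(d_model=768, vocab_size=32000, n_experts=4)
h = torch.randn(2, 16, 768)
log_probs = head(h)  # (2, 16, 32000)
```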
Related papers
- Knowledge-Centric Templatic Views of Documents [2.654058995940072]
Authors often share their ideas in various document formats, such as slide decks, newsletters, reports, and posters.
We introduce a novel unified evaluation framework that can be adapted to measuring the quality of document generators.
We conduct a human evaluation, which shows that people prefer the documents generated with our method 82% of the time.
arXiv Detail & Related papers (2024-01-13T01:22:15Z)
- Improving Contextualized Topic Models with Negative Sampling [3.708656266586146]
We propose a negative sampling mechanism for a contextualized topic model to improve the quality of the generated topics.
In particular, during model training, we perturb the generated document-topic vector and use a triplet loss to encourage the document reconstructed from the correct document-topic vector to be similar to the input document (see the sketch after this entry).
arXiv Detail & Related papers (2023-03-27T07:28:46Z)
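The triplet objective above lends itself to a compact illustration. Below is a minimal PyTorch sketch in which the perturbed document-topic vector yields the negative reconstruction; the perturbation scheme, margin, and decoder are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative triplet objective for a contextualized topic model (not the paper's code).
# The document reconstructed from the true document-topic vector (positive) should be
# closer to the input than the one reconstructed from a perturbed vector (negative).
import torch
import torch.nn.functional as F

def topic_triplet_loss(decoder, doc_bow, theta, margin=1.0, noise_scale=0.5):
    # doc_bow: (B, V) bag-of-words input; theta: (B, K) document-topic vector.
    pos_recon = decoder(theta)                        # reconstruction from correct theta
    perturbed = F.softmax(theta + noise_scale * torch.randn_like(theta), dim=-1)
    neg_recon = decoder(perturbed)                    # reconstruction from perturbed theta
    d_pos = 1 - F.cosine_similarity(pos_recon, doc_bow, dim=-1)
    d_neg = 1 - F.cosine_similarity(neg_recon, doc_bow, dim=-1)
    # Standard triplet hinge: pull the positive reconstruction toward the input,
    # push the negative reconstruction at least `margin` further away.
    return F.relu(d_pos - d_neg + margin).mean()

# Usage with a toy linear decoder mapping K topics back to V vocabulary terms.
decoder = torch.nn.Linear(8, 100)
theta = F.softmax(torch.randn(4, 8), dim=-1)
doc_bow = torch.rand(4, 100)
loss = topic_triplet_loss(decoder, doc_bow, theta)
```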
- Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods [1.9938405188113029]
We present a study conducted using three state-of-the-art systems: Doc-UFCN, dhSegment, and ARU-Net.
We show that it is possible to build generic models trained on a wide variety of historical document datasets that can correctly segment diverse unseen pages.
arXiv Detail & Related papers (2022-03-23T11:56:25Z)
- Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity [11.157086694203201]
We present a new scientific document similarity model based on matching fine-grained aspects.
Our model is trained using co-citation contexts that describe related paper aspects as a novel form of textual supervision (an aspect-matching sketch follows this entry).
arXiv Detail & Related papers (2021-11-16T11:12:30Z)
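As a rough illustration of multi-vector, fine-grained matching of the kind this entry describes, the sketch below scores a document pair by aligning each aspect vector of one document with its best match in the other. The max-alignment aggregation and dimensions are generic assumptions, not the paper's trained model.

```python
# Illustrative multi-vector similarity: each document is a set of aspect embeddings,
# and the score aligns every aspect with its best-matching aspect in the other document.
import torch
import torch.nn.functional as F

def aspect_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a: (Na, d) aspect vectors of document A; b: (Nb, d) aspect vectors of document B.
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    sim = a @ b.T  # (Na, Nb) pairwise cosine similarities
    # For each aspect of A, take its best match in B, then average.
    return sim.max(dim=1).values.mean()

doc_a = torch.randn(5, 64)  # e.g., 5 aspect vectors of dimension 64
doc_b = torch.randn(7, 64)
score = aspect_similarity(doc_a, doc_b)
```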
- Focused Attention Improves Document-Grounded Generation [111.42360617630669]
Document-grounded generation is the task of using the information provided in a document to improve text generation.
This work focuses on two document-grounded generation tasks: Wikipedia update generation and dialogue response generation.
arXiv Detail & Related papers (2021-04-26T16:56:29Z)
- Text Editing by Command [82.50904226312451]
A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step.
One-shot generation gives the user little recourse when the output misses the mark; we address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text.
We present a dataset of such editing interactions and show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations.
arXiv Detail & Related papers (2020-10-24T08:00:30Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that augments established hierarchical attention encoders for representing documents with a cross-document attention component (sketched after this entry).
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
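A minimal sketch of a cross-document attention component of the kind the entry above proposes: one document's unit representations attend over another's. The multi-head attention layer, residual wiring, and dimensions are illustrative assumptions, not the paper's architecture.

```python
# Illustrative cross-document attention (a generic sketch, not the paper's model):
# representations of document A's units attend over document B's units, so alignment
# decisions can condition on the other document.
import torch
import torch.nn as nn

class CrossDocAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, doc_a: torch.Tensor, doc_b: torch.Tensor) -> torch.Tensor:
        # doc_a: (B, Ta, d) queries; doc_b: (B, Tb, d) keys/values from the other document.
        attended, _ = self.attn(doc_a, doc_b, doc_b)
        return self.norm(doc_a + attended)  # residual + norm, transformer-style

layer = CrossDocAttention(d_model=128)
a = torch.randn(2, 10, 128)  # e.g., 10 sentence vectors for document A
b = torch.randn(2, 12, 128)
a_contextualized = layer(a, b)  # (2, 10, 128)
```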
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embeddings of scientific documents by pretraining a Transformer language model on citation-informed signals of document relatedness (a triplet-loss sketch follows this entry).
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
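The citation-informed signal in SPECTER can be illustrated with a triplet margin loss over document embeddings: a paper should embed closer to a paper it cites than to a non-cited paper. The margin and the random stand-in embeddings below are illustrative; this is not the released SPECTER model.

```python
# Illustrative citation-informed triplet loss in the spirit of SPECTER:
# pull a paper's embedding toward a cited paper and away from a non-cited one.
# The embeddings would come from a transformer pooling step in practice.
import torch
import torch.nn.functional as F

def citation_triplet_loss(query_emb, cited_emb, uncited_emb, margin=1.0):
    # Each argument: (B, d) embeddings of query, cited (positive), uncited (negative).
    d_pos = torch.norm(query_emb - cited_emb, dim=-1)
    d_neg = torch.norm(query_emb - uncited_emb, dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()

q = torch.randn(8, 768, requires_grad=True)
pos = torch.randn(8, 768)
neg = torch.randn(8, 768)
loss = citation_triplet_loss(q, pos, neg)
loss.backward()
```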
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.