Decoding the End-to-end Writing Trajectory in Scholarly Manuscripts
- URL: http://arxiv.org/abs/2304.00121v1
- Date: Fri, 31 Mar 2023 20:33:03 GMT
- Title: Decoding the End-to-end Writing Trajectory in Scholarly Manuscripts
- Authors: Ryan Koo, Anna Martin, Linghe Wang, Dongyeop Kang
- Abstract summary: We introduce a novel taxonomy that categorizes scholarly writing behaviors according to intention, writer actions, and the information types of the written data.
Motivated by cognitive writing theory, our taxonomy for scientific papers includes three levels of categorization in order to trace the general writing flow.
ManuScript aims to provide a complete picture of the scholarly writing process by capturing the linearity and non-linearity of the writing trajectory.
- Score: 7.294418916091011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scholarly writing presents a complex space that generally follows a
methodical procedure to plan and produce both rationally sound and creative
compositions. Recent works involving large language models (LLMs) demonstrate
considerable success in text generation and revision tasks; however, LLMs still
struggle to provide structural and creative feedback on the document level that
is crucial to academic writing. In this paper, we introduce a novel taxonomy
that categorizes scholarly writing behaviors according to intention, writer
actions, and the information types of the written data. We also provide
ManuScript, an original dataset annotated with a simplified version of our
taxonomy to show writer actions and the intentions behind them. Motivated by
cognitive writing theory, our taxonomy for scientific papers includes three
levels of categorization in order to trace the general writing flow and
identify the distinct writer activities embedded within each higher-level
process. ManuScript aims to provide a complete picture of the scholarly
writing process by capturing the linearity and non-linearity of the writing
trajectory, so that writing assistants can provide stronger feedback and
suggestions at the end-to-end level. The collected writing trajectories can be
viewed at https://minnesotanlp.github.io/REWARD_demo/
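A minimal sketch of how a single annotated writing action under this three-level taxonomy (intention, writer action, information type) might be represented in code; the field and label names below are hypothetical illustrations, not the paper's actual category inventory or the ManuScript annotation schema.

from dataclasses import dataclass
from typing import List

@dataclass
class WritingAction:
    intention: str         # top-level goal, e.g. "planning" or "revision" (hypothetical labels)
    action: str            # concrete writer action, e.g. "insert", "delete", "reorder"
    information_type: str  # kind of written data touched, e.g. "claim", "citation"
    text_span: str         # manuscript text affected by the action
    timestamp: float       # position of the action within the writing session

# A writing trajectory is an ordered sequence of actions; non-linearity shows up
# as jumps back to earlier parts of the manuscript.
trajectory: List[WritingAction] = [
    WritingAction("planning", "insert", "outline item", "3. Related Work", 0.0),
    WritingAction("revision", "delete", "claim", "Our method is optimal.", 42.5),
]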
Related papers
- BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation.
We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses.
Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z) - Agents' Room: Narrative Generation through Multi-step Collaboration [54.98886593802834]
We propose a generation framework inspired by narrative theory that decomposes narrative writing into subtasks tackled by specialized agents.
We show that Agents' Room generates stories preferred by expert evaluators over those produced by baseline systems.
arXiv Detail & Related papers (2024-10-03T15:44:42Z) - Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as a source of weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z) - Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus, and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z) - Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models [8.920436030483872]
We propose Writing Path, a framework that uses explicit outlines to guide Large Language Models (LLMs) in generating user-aligned text.
Our approach draws inspiration from structured writing planning and reasoning paths, focusing on capturing and reflecting user intentions throughout the writing process.
arXiv Detail & Related papers (2024-04-22T06:57:43Z) - Can Authorship Representation Learning Capture Stylistic Features? [5.812943049068866]
We show that representations learned for a surrogate authorship prediction task are indeed sensitive to writing style.
As a consequence, authorship representations may be expected to be robust to certain kinds of data shift, such as topic drift over time.
Our findings may open the door to downstream applications that require stylistic representations, such as style transfer.
arXiv Detail & Related papers (2023-08-22T15:10:45Z) - Exploitation and exploration in text evolution. Quantifying planning and
translation flows during writing [0.13108652488669734]
We introduce measures to quantify subcycles of planning (exploration) and translation (exploitation) during the writing process.
This dataset comes from a series of writing workshops in which, through innovative versioning software, we were able to record all the steps in the construction of a text.
arXiv Detail & Related papers (2023-02-07T17:52:33Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.