Related papers: Decoding the End-to-end Writing Trajectory in Scholarly Manuscripts

Decoding the End-to-end Writing Trajectory in Scholarly Manuscripts

URL: http://arxiv.org/abs/2304.00121v1
Date: Fri, 31 Mar 2023 20:33:03 GMT
Title: Decoding the End-to-end Writing Trajectory in Scholarly Manuscripts
Authors: Ryan Koo, Anna Martin, Linghe Wang, Dongyeop Kang
Abstract summary: We introduce a novel taxonomy that categorizes scholarly writing behaviors according to intention, writer actions, and the information types of the written data. Motivated by cognitive writing theory, our taxonomy for scientific papers includes three levels of categorization in order to trace the general writing flow. ManuScript intends to provide a complete picture of the scholarly writing process by capturing the linearity and non-linearity of writing trajectory.
Score: 7.294418916091011
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scholarly writing presents a complex space that generally follows a methodical procedure to plan and produce both rationally sound and creative compositions. Recent works involving large language models (LLM) demonstrate considerable success in text generation and revision tasks; however, LLMs still struggle to provide structural and creative feedback on the document level that is crucial to academic writing. In this paper, we introduce a novel taxonomy that categorizes scholarly writing behaviors according to intention, writer actions, and the information types of the written data. We also provide ManuScript, an original dataset annotated with a simplified version of our taxonomy to show writer actions and the intentions behind them. Motivated by cognitive writing theory, our taxonomy for scientific papers includes three levels of categorization in order to trace the general writing flow and identify the distinct writer activities embedded within each higher-level process. ManuScript intends to provide a complete picture of the scholarly writing process by capturing the linearity and non-linearity of writing trajectory, such that writing assistants can provide stronger feedback and suggestions on an end-to-end level. The collected writing trajectories are viewed at https://minnesotanlp.github.io/REWARD_demo/

Related papers

Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback [57.200668979963694]
We present a novel test set of 1,300 stories that we corrupted to intentionally introduce writing issues.<n>We study the performance of commonly used LLMs in this task with both automatic and human evaluation metrics.
arXiv Detail & Related papers (2025-07-21T18:56:50Z)
WritingBench: A Comprehensive Benchmark for Generative Writing [87.48445972563631]
We present WritingBench, a benchmark designed to evaluate large language models (LLMs) across 6 core writing domains and 100, encompassing creative, persuasive, informative, and technical writing. We propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations in style, format and length.
arXiv Detail & Related papers (2025-03-07T08:56:20Z)
ScholaWrite: A Dataset of End-to-End Scholarly Writing Process [12.170448539143909]
ScholaWrite dataset is a first-of-its-kind keystroke corpus of an end-to-end scholarly writing process for complete manuscripts. Our dataset includes-based keystroke data from five preprints with nearly 62K total text changes and annotations across 4 months of paper writing.
arXiv Detail & Related papers (2025-02-05T05:57:37Z)
BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z)
Agents' Room: Narrative Generation through Multi-step Collaboration [54.98886593802834]
We propose a generation framework inspired by narrative theory that decomposes narrative writing into subtasks tackled by specialized agents. We show that Agents' Room generates stories preferred by expert evaluators over those produced by baseline systems.
arXiv Detail & Related papers (2024-10-03T15:44:42Z)
Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels. The main limitation of scribbles as source for weak supervision is the lack of challenging datasets for scribble segmentation. Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z)
Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint. We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z)
Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models [8.920436030483872]
We propose Writing Path, a framework that uses explicit outlines to guide Large Language Models (LLMs) in generating user-aligned text. Our approach draws inspiration from structured writing planning and reasoning paths, focusing on capturing and reflecting user intentions throughout the writing process.
arXiv Detail & Related papers (2024-04-22T06:57:43Z)
Can Authorship Representation Learning Capture Stylistic Features? [5.812943049068866]
We show that representations learned for a surrogate authorship prediction task are indeed sensitive to writing style. As a consequence, authorship representations may be expected to be robust to certain kinds of data shift, such as topic drift over time. Our findings may open the door to downstream applications that require stylistic representations, such as style transfer.
arXiv Detail & Related papers (2023-08-22T15:10:45Z)
Exploitation and exploration in text evolution. Quantifying planning and translation flows during writing [0.13108652488669734]
We introduce measures to quantify subcycles of planning (exploration) and translation (exploitation) during the writing process. This dataset comes from a series of writing workshops in which, through innovative versioning software, we were able to record all the steps in the construction of a text.
arXiv Detail & Related papers (2023-02-07T17:52:33Z)
PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage. Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors. We propose a contrastively trained model fit to learn textbfauthorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues. A main challenge is that a person often writes a letter in different styles from time to time. We refer to this problem as the variance of online writing styles (Var-O-Styles)
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer. In detail, the input is a set of structured records and a reference text for describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.