What do writing features tell us about AI papers?
- URL: http://arxiv.org/abs/2107.06310v1
- Date: Tue, 13 Jul 2021 18:12:12 GMT
- Title: What do writing features tell us about AI papers?
- Authors: Zining Zhu, Bai Li, Yang Xu, Frank Rudzicz
- Abstract summary: We argue that studying interpretable dimensions of academic papers could lead to scalable solutions.
We extract a collection of writing features, and construct a suite of prediction tasks to assess the usefulness of these features in predicting citation counts and the publication of AI-related papers.
- Score: 23.224038524126467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the numbers of submissions to conferences grow quickly, the task of
assessing the quality of academic papers automatically, convincingly, and with
high accuracy attracts increasing attention. We argue that studying
interpretable dimensions of these submissions could lead to scalable solutions.
We extract a collection of writing features, and construct a suite of
prediction tasks to assess the usefulness of these features in predicting
citation counts and the publication of AI-related papers. Depending on the
venue, the writing features can predict conference vs. workshop appearance
with F1 scores up to 60-90, sometimes even outperforming the content-based
tf-idf features and RoBERTa. We show that the features describe writing style
more than content. To further understand the results, we estimate the causal
impact of the most indicative features. Our analysis of writing features
provides a perspective on assessing and refining the writing of academic
articles at scale.
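The abstract describes a feature-extraction and prediction pipeline without reproducing it. A minimal sketch of that kind of setup, assuming scikit-learn and a hand-picked illustrative feature set (not the paper's actual features), might look like this:

```python
# Minimal sketch of the style-vs-content comparison described above.
# The feature list here is illustrative, NOT the paper's actual feature set.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def writing_features(text: str) -> list[float]:
    """A few interpretable, style-oriented features (illustrative only)."""
    sentences = [s for s in re.split(r"[.!?]+\s+", text) if s]
    words = text.split()
    return [
        np.mean([len(s.split()) for s in sentences]),          # mean sentence length
        np.mean([len(w) for w in words]),                      # mean word length
        text.count(",") / max(len(sentences), 1),              # commas per sentence
        sum(w.islower() for w in words) / max(len(words), 1),  # lowercase ratio
    ]

# texts: paper abstracts; labels: 1 = conference, 0 = workshop (toy data)
texts = ["We propose a novel method ...", "In this workshop paper ..."] * 10
labels = [1, 0] * 10

X_style = np.array([writing_features(t) for t in texts])
X_tfidf = TfidfVectorizer(max_features=5000).fit_transform(texts)

clf = LogisticRegression(max_iter=1000)
print("style  F1:", cross_val_score(clf, X_style, labels, cv=5, scoring="f1").mean())
print("tf-idf F1:", cross_val_score(clf, X_tfidf, labels, cv=5, scoring="f1").mean())
```

Comparing the two F1 scores mirrors the paper's comparison of writing features against content-based tf-idf baselines.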
Related papers
- BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation.
We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses.
Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z)
- Leveraging Contextual Information for Effective Entity Salience Detection [21.30389576465761]
We show that fine-tuning medium-sized language models with a cross-encoder style architecture yields substantial performance gains over feature engineering approaches.
We also show that zero-shot prompting of instruction-tuned language models yields inferior results, indicating the task's uniqueness and complexity.
arXiv Detail & Related papers (2023-09-14T19:04:40Z)
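As a rough illustration of a cross-encoder style setup for entity salience, the document and candidate entity can be encoded jointly in one forward pass. The model choice and input pairing below are assumptions, not the paper's exact configuration:

```python
# Sketch of cross-encoder scoring for entity salience: encode the
# (document, entity) pair jointly and classify salient vs. non-salient.
# Model choice and input format are assumptions, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

document = "The merger between the two firms was approved by regulators ..."
entity = "regulators"

# Cross-encoder input: both segments in a single forward pass,
# so attention can flow between document and entity tokens.
inputs = tokenizer(document, entity, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
salience_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(salient) = {salience_prob:.3f}")  # untrained head: output is arbitrary
```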
- Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain [7.652206854575039]
Future work sentences (FWS) are the sentences in academic papers that contain the author's description of their proposed follow-up research direction.
This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content.
arXiv Detail & Related papers (2022-12-28T15:26:04Z)
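A toy sketch of the two-stage idea (extract candidate future-work sentences, then classify their direction); the cue phrases and categories below are hypothetical, not the paper's method or taxonomy:

```python
# Toy two-stage sketch: (1) extract candidate future-work sentences via
# cue phrases, (2) classify them into coarse direction categories.
# Cue phrases and categories are hypothetical, not the paper's taxonomy.
import re

CUES = ("future work", "in the future", "we plan to", "we will explore")
CATEGORIES = {
    "method":      ("model", "architecture", "algorithm"),
    "evaluation":  ("benchmark", "dataset", "metric"),
    "application": ("deploy", "real-world", "domain"),
}

def extract_fws(paper_text: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", paper_text)
    return [s for s in sentences if any(c in s.lower() for c in CUES)]

def classify_fws(sentence: str) -> str:
    lowered = sentence.lower()
    for label, keywords in CATEGORIES.items():
        if any(k in lowered for k in keywords):
            return label
    return "other"

text = ("We presented a parser. In the future, we plan to explore larger "
        "architectures. Future work includes building a new benchmark.")
for s in extract_fws(text):
    print(classify_fws(s), "|", s)
```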
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study of fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities across multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
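Disparity analyses of this kind often reduce to comparing outcome rates across groups. A minimal sketch with synthetic counts and a two-proportion z-test (not the paper's LM-enhanced method, and not real ICLR data):

```python
# Sketch: compare acceptance rates between two author groups with a
# two-proportion z-test. Counts below are synthetic, not real ICLR data.
from statsmodels.stats.proportion import proportions_ztest

accepted = [180, 120]   # accepted papers per group (synthetic)
submitted = [600, 500]  # submitted papers per group (synthetic)

stat, p_value = proportions_ztest(accepted, submitted)
rates = [a / n for a, n in zip(accepted, submitted)]
print(f"acceptance rates: {rates[0]:.2%} vs {rates[1]:.2%}, p = {p_value:.4f}")
```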
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
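A minimal sketch of contrastive authorship-embedding training with in-batch negatives; the toy encoder and InfoNCE-style loss below are assumptions standing in for the paper's pre-trained transformer:

```python
# Sketch of contrastive authorship-embedding training: pull together
# embeddings of two texts by the same author, push apart different
# authors (InfoNCE over in-batch negatives). The encoder is a toy
# bag-of-bytes MLP, not the paper's pre-trained transformer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, texts):
        # Bag-of-bytes featurization (stand-in for a real tokenizer + transformer).
        feats = torch.zeros(len(texts), 256)
        for i, t in enumerate(texts):
            for b in t.encode("utf-8"):
                feats[i, b] += 1.0
        return F.normalize(self.net(feats), dim=-1)

encoder = ToyEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Each pair: two texts written by the same (synthetic) author.
pairs = [("i luv trains!!", "trains r the best!!"),
         ("Punctuation matters; clarity wins.", "Style, like ethics, is habit.")]

a = encoder([p[0] for p in pairs])      # anchors
b = encoder([p[1] for p in pairs])      # positives
logits = a @ b.T / 0.07                 # cosine similarities, temperature-scaled
labels = torch.arange(len(pairs))       # matching index = positive pair
loss = F.cross_entropy(logits, labels)  # InfoNCE / NT-Xent style loss
loss.backward(); opt.step()
print(f"contrastive loss: {loss.item():.3f}")
```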
- Automatic Analysis of Linguistic Features in Journal Articles of Different Academic Impacts with Feature Engineering Techniques [0.975434908987426]
This study attempts to extract micro-level linguistic features in high- and moderate-impact journal research articles (RAs), using feature engineering methods.
We extracted 25 highly relevant features from the Corpus of English Journal Articles through feature selection methods.
Results showed that 24 linguistic features, such as the overlap of content words between adjacent sentences and the use of third-person pronouns, auxiliary verbs, tense, and emotional words, provide consistent and accurate predictions for journal articles with different academic impacts.
arXiv Detail & Related papers (2021-11-15T03:56:50Z)
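Two of the features named above are easy to make concrete. A sketch with a simplified tokenizer and stopword list (assumptions, not the study's exact operationalization):

```python
# Sketch of two micro-level features mentioned above: content-word
# overlap between adjacent sentences and third-person pronoun rate.
# Stopword list and tokenization are simplified assumptions.
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that"}
THIRD_PERSON = {"he", "she", "it", "they", "him", "her", "them", "its", "their"}

def tokens(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def adjacent_overlap(sentences):
    """Mean content-word overlap between each pair of adjacent sentences."""
    overlaps = []
    for s1, s2 in zip(sentences, sentences[1:]):
        c1 = set(tokens(s1)) - STOPWORDS
        c2 = set(tokens(s2)) - STOPWORDS
        overlaps.append(len(c1 & c2) / max(len(c1 | c2), 1))
    return sum(overlaps) / max(len(overlaps), 1)

def third_person_rate(sentences):
    words = [w for s in sentences for w in tokens(s)]
    return sum(w in THIRD_PERSON for w in words) / max(len(words), 1)

sents = ["The model predicts citations.", "It predicts citations from style.", "They disagree."]
print("adjacent content-word overlap:", round(adjacent_overlap(sents), 3))
print("third-person pronoun rate:   ", round(third_person_rate(sents), 3))
```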
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
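A toy sketch of the joint-topic-embedding idea: represent each <sentiment, aspect> pair by the mean of a few seed-word vectors and assign new words by cosine similarity. The seed words and the random embedding table are illustrative assumptions:

```python
# Sketch of the weak-supervision idea: a <sentiment, aspect> topic is
# embedded as the mean of its seed-word vectors, and unseen words are
# assigned to the nearest topic by cosine similarity.
# Seed words and the toy embedding table are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["delicious", "bland", "cozy", "noisy", "tasty", "cramped"]
emb = {w: rng.normal(size=50) for w in vocab}  # stand-in for real word vectors

seeds = {
    ("positive", "food"):     ["delicious", "tasty"],
    ("negative", "ambience"): ["noisy", "cramped"],
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Joint topic embedding = mean of its seed-word embeddings.
topic_emb = {t: np.mean([emb[w] for w in ws], axis=0) for t, ws in seeds.items()}

# Assign an unseen word to the closest <sentiment, aspect> topic
# (with random stand-in vectors the assignment itself is arbitrary).
for word in ["bland", "cozy"]:
    best = max(topic_emb, key=lambda t: cosine(emb[word], topic_emb[t]))
    print(word, "->", best)
```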
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
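A sketch of abstractive summarization on a toy instructional transcript; the off-the-shelf model below is a stand-in, since the paper's BERTSum variant is not assumed to be available under that name:

```python
# Sketch of abstractive summarization of a (toy) instructional transcript.
# Uses a generic off-the-shelf summarizer as a stand-in for the paper's
# BERTSum variant.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = (
    "okay so first you want to unplug the router, wait about ten seconds, "
    "then plug it back in. while it boots, open the settings page in your "
    "browser, log in, and check that the firmware version matches the one "
    "listed on the support site. if it doesn't, download the update first."
)

summary = summarizer(transcript, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```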
- Machine Identification of High Impact Research through Text and Image Analysis [0.4737991126491218]
We present a system to automatically separate papers with a high likelihood of gaining citations from those with a low likelihood.
Our system uses both a visual classifier, useful for surmising a document's overall appearance, and a text classifier, for making content-informed decisions.
arXiv Detail & Related papers (2020-05-20T19:12:24Z)
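A minimal sketch of the two-branch idea with late fusion of a visual and a text classifier; the architectures and input sizes are toy assumptions, not the paper's system:

```python
# Sketch of the two-branch idea: one branch scores page appearance
# (here a flattened thumbnail), another scores text, and a late-fusion
# layer combines them. Architectures and sizes are toy assumptions.
import torch
import torch.nn as nn

class TwoBranchImpactModel(nn.Module):
    def __init__(self, text_dim=300, img_pixels=32 * 32):
        super().__init__()
        self.text_branch = nn.Sequential(nn.Linear(text_dim, 64), nn.ReLU())
        self.visual_branch = nn.Sequential(nn.Linear(img_pixels, 64), nn.ReLU())
        self.fusion = nn.Linear(128, 1)  # late fusion of both branches

    def forward(self, text_vec, page_thumbnail):
        t = self.text_branch(text_vec)
        v = self.visual_branch(page_thumbnail.flatten(start_dim=1))
        return torch.sigmoid(self.fusion(torch.cat([t, v], dim=-1)))

model = TwoBranchImpactModel()
text_vec = torch.randn(4, 300)       # e.g., averaged word embeddings
thumbs = torch.randn(4, 1, 32, 32)   # grayscale page thumbnails
print(model(text_vec, thumbs).squeeze(-1))  # P(high-citation) per paper
```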
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.