Finding Pragmatic Differences Between Disciplines
- URL: http://arxiv.org/abs/2310.00204v1
- Date: Sat, 30 Sep 2023 00:46:14 GMT
- Title: Finding Pragmatic Differences Between Disciplines
- Authors: Lee Kezar, Jay Pujara
- Abstract summary: We learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors.
We analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure.
Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.
- Score: 14.587150614245123
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scholarly documents have a great degree of variation, both in terms of
content (semantics) and structure (pragmatics). Prior work in scholarly
document understanding emphasizes semantics through document summarization and
corpus topic modeling but tends to omit pragmatics such as document
organization and flow. Using a corpus of scholarly documents across 19
disciplines and state-of-the-art language modeling techniques, we learn a fixed
set of domain-agnostic descriptors for document sections and "retrofit" the
corpus to these descriptors (also referred to as "normalization"). Then, we
analyze the position and ordering of these descriptors across documents to
understand the relationship between discipline and structure. We report
within-discipline structural archetypes, variability, and between-discipline
comparisons, supporting the hypothesis that scholarly communities, despite
their size, diversity, and breadth, share similar avenues for expressing their
work. Our findings lay the foundation for future work in assessing research
quality, domain style transfer, and further pragmatic analysis.
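To make the retrofitting step concrete, the following is a minimal, hypothetical sketch: raw section headers are assigned to a small fixed descriptor set by nearest-neighbor similarity, and the most frequent descriptor ordering per discipline is reported as its structural archetype. The descriptor list, the TF-IDF similarity (standing in for the paper's language-model-based normalization), and the corpus format are illustrative assumptions, not the authors' setup.

```python
# Hypothetical sketch of "retrofitting" section headers to a fixed,
# domain-agnostic descriptor set, then comparing descriptor orderings
# per discipline. Descriptors, similarity measure, and corpus format
# are illustrative assumptions.
from collections import Counter, defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed fixed descriptor set (illustrative only).
DESCRIPTORS = ["introduction", "background", "methods", "results", "discussion", "conclusion"]

def retrofit_sections(section_headers):
    """Normalize raw section headers to the fixed descriptor set."""
    vectorizer = TfidfVectorizer().fit(DESCRIPTORS + section_headers)
    sims = cosine_similarity(vectorizer.transform(section_headers),
                             vectorizer.transform(DESCRIPTORS))
    # Ties (all-zero similarity) default to the first descriptor.
    return [DESCRIPTORS[row.argmax()] for row in sims]

def archetypes_by_discipline(corpus):
    """corpus: iterable of (discipline, ordered list of section headers).
    Returns the most common descriptor ordering observed in each discipline."""
    orderings = defaultdict(Counter)
    for discipline, headers in corpus:
        orderings[discipline][tuple(retrofit_sections(headers))] += 1
    return {d: counts.most_common(1)[0][0] for d, counts in orderings.items()}

if __name__ == "__main__":
    toy_corpus = [
        ("biology", ["Introduction", "Materials and Methods", "Results", "Discussion"]),
        ("computer science", ["Introduction", "Background", "Methods", "Results", "Conclusion"]),
    ]
    print(archetypes_by_discipline(toy_corpus))
```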
Related papers
- Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a creative-commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z) - Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning [7.086262532457526]
We present a novel method which generates context-dependent definitions of concept mentions by retrieving full-text literature.
We further generate relational definitions, which describe how two concept mentions are related or different, and design an efficient re-ranking approach to address the explosion involved in inferring links across papers.
arXiv Detail & Related papers (2024-09-23T15:20:27Z) - Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of documents within a collection.
We abstract over arbitrary header paraphrases and ground each topic to its respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities; a rough sketch of this idea appears after this list.
arXiv Detail & Related papers (2024-02-21T16:22:21Z) - ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science [0.0]
Large language models achieve impressive performance on many natural language processing tasks.
Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources.
We propose a novel structure-aware retrieval augmented language model that accommodates document structure during retrieval augmentation.
arXiv Detail & Related papers (2023-11-21T02:02:46Z) - Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis [3.231170156689185]
Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques.
One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting the content and spatial relationships of layout, image, and text.
arXiv Detail & Related papers (2023-08-29T16:58:03Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Domain-Specific Word Embeddings with Structure Prediction [3.057136788672694]
We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy.
Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines in terms of the general analogy tests.
As a use case in the field of Digital Humanities we demonstrate how to raise novel research questions for high literature from the German Text Archive.
arXiv Detail & Related papers (2022-10-06T12:45:48Z) - Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts.
Editorial assistance, however, often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z) - Bilingual Topic Models for Comparable Corpora [9.509416095106491]
We propose a binding mechanism between the distributions of the paired documents.
To estimate the similarity of documents that are written in different languages we use cross-lingual word embeddings that are learned with shallow neural networks.
We evaluate the proposed binding mechanism by extending two topic models: a bilingual adaptation of LDA that assumes bag-of-words inputs and a model that incorporates part of the text structure in the form of boundaries of semantically coherent segments.
arXiv Detail & Related papers (2021-11-30T10:53:41Z) - Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
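The collection-wide structure-extraction idea referenced above (combining inter- and intra-document similarities) can be sketched with a simple header graph and community detection. The similarity measure, edge weights, and clustering algorithm below are illustrative assumptions, not the cited paper's method.

```python
# Rough, hypothetical sketch of collection-wide structure extraction:
# nodes are (doc_id, position, header) triples; intra-document edges link
# consecutive sections, inter-document edges link lexically similar headers,
# and graph communities approximate the collection's recurring section topics.
# Similarity measure, weights, and clustering choice are illustrative assumptions.
import itertools
from difflib import SequenceMatcher

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_header_graph(docs, sim_threshold=0.5):
    """docs: list of documents, each a list of section headers in order."""
    G = nx.Graph()
    nodes = [(i, j, h) for i, doc in enumerate(docs) for j, h in enumerate(doc)]
    G.add_nodes_from(nodes)
    for i, doc in enumerate(docs):                      # intra-document adjacency
        for j in range(len(doc) - 1):
            G.add_edge((i, j, doc[j]), (i, j + 1, doc[j + 1]), weight=0.5)
    for a, b in itertools.combinations(nodes, 2):       # inter-document similarity
        if a[0] != b[0]:
            sim = SequenceMatcher(None, a[2].lower(), b[2].lower()).ratio()
            if sim >= sim_threshold:
                G.add_edge(a, b, weight=sim)
    return G

def typical_topics(docs):
    """Cluster headers across the collection; each community is one recurring topic."""
    G = build_header_graph(docs)
    communities = greedy_modularity_communities(G, weight="weight")
    return [sorted({h for _, _, h in c}) for c in communities]

if __name__ == "__main__":
    docs = [["Introduction", "Methods", "Results"],
            ["Intro", "Methodology", "Findings"],
            ["Introduction", "Experimental Methods", "Results and Findings"]]
    print(typical_topics(docs))
```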