Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings
in Scholarly Publications
- URL: http://arxiv.org/abs/2212.09676v2
- Date: Tue, 23 May 2023 02:24:54 GMT
- Authors: Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith
- Abstract summary: We develop and validate an interpretable approach for measuring scholarly jargon from text.
We use word sense induction to identify words that are widespread but overloaded with different meanings across fields.
We show that word senses provide a complementary, yet unique view of jargon alongside word types.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scholarly text is often laden with jargon, or specialized language that can
facilitate efficient in-group communication within fields but hinder
understanding for out-groups. In this work, we develop and validate an
interpretable approach for measuring scholarly jargon from text. Expanding the
scope of prior work which focuses on word types, we use word sense induction to
also identify words that are widespread but overloaded with different meanings
across fields. We then estimate the prevalence of these discipline-specific
words and senses across hundreds of subfields, and show that word senses
provide a complementary, yet unique view of jargon alongside word types. We
demonstrate the utility of our metrics for science of science and computational
sociolinguistics by highlighting two key social implications. First, though
most fields reduce their use of jargon when writing for general-purpose venues,
some fields (e.g., biological sciences) do so less than others. Second, the
direction of correlation between jargon and citation rates varies among fields,
but jargon is nearly always negatively correlated with interdisciplinary
impact. Broadly, our findings suggest that though multidisciplinary venues
intend to cater to more general audiences, some fields' writing norms may act
as barriers rather than bridges, and thus impede the dispersion of scholarly
ideas.
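As one illustration of the type-level side of this idea, a simple way to score how discipline-specific a word is compares its relative frequency in one field's papers to its frequency in a background corpus. This is a minimal sketch only, not the paper's actual metric; the function and variable names (`jargon_scores`, `field_docs`, `background_docs`) are hypothetical.

```python
from collections import Counter

def jargon_scores(field_docs, background_docs):
    """Score how field-specific each word is: the ratio of a word's
    relative frequency in a field's documents to its (smoothed)
    relative frequency in a background corpus. Higher = more jargon-like.
    Each argument is a list of tokenized documents (lists of words)."""
    field_counts = Counter(w for doc in field_docs for w in doc)
    bg_counts = Counter(w for doc in background_docs for w in doc)
    field_total = sum(field_counts.values())
    bg_total = sum(bg_counts.values())
    scores = {}
    for word, count in field_counts.items():
        p_field = count / field_total
        # Add-one smoothing so words absent from the background still score.
        p_bg = (bg_counts.get(word, 0) + 1) / (bg_total + len(bg_counts))
        scores[word] = p_field / p_bg
    return scores
```

A field-internal term like "posterior" in a statistics corpus would score well above a word like "model" that is common everywhere; the actual paper goes further by scoring senses as well as word types.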
Related papers
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how marking a word's neighboring words affects the explainee's perception of that word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings [4.479834103607384]
Adding interpretability to word embeddings represents an area of active research in text representation.
We present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings.
arXiv Detail & Related papers (2023-01-11T20:25:53Z)
- Dictionary-Assisted Supervised Contrastive Learning [0.0]
We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries.
The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest.
Combining DASCL with cross-entropy improves classification performance metrics in few-shot learning settings and social science applications.
arXiv Detail & Related papers (2022-10-27T04:57:43Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify hybrid granularities semantic meaning in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- What's in a Scientific Name? [0.0]
The study reported here considers the words "prediction", "model", "optimization", "complex", "entropy", "random", "deterministic", "pattern", and "database".
Several of the words were observed to have markedly distinct associations in different areas. Biology was found to be related to computer science, sharing associations with databases.
arXiv Detail & Related papers (2021-05-31T22:06:20Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
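The operationalization described above, ambiguity as the entropy of a word's meaning distribution, can be sketched directly. This is a minimal illustration assuming sense-labeled occurrences are available; the function name `meaning_entropy` is hypothetical, not from the paper.

```python
import math
from collections import Counter

def meaning_entropy(sense_labels):
    """Operationalize a word's lexical ambiguity as the Shannon entropy
    (in bits) of the distribution over senses it takes across its
    occurrences. sense_labels: one sense label per occurrence."""
    counts = Counter(sense_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A word always used in one sense scores 0 bits; a word split evenly between two senses scores 1 bit, and the score grows with the number and evenness of its senses.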
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts [0.15833270109954134]
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content.
We introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts.
We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences.
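For the entropy-based case mentioned above, the Jensen-Shannon divergence between two corpora's word distributions decomposes into per-word contributions, which is what a word shift graph visualizes. The sketch below is a plain restatement of that decomposition, not the paper's implementation; `js_divergence_contributions` is a hypothetical name.

```python
import math
from collections import Counter

def js_divergence_contributions(corpus_a, corpus_b):
    """Jensen-Shannon divergence between the word-frequency distributions
    of two token lists, decomposed into per-word contributions.
    Base-2 logs, so the total lies in [0, 1]. Returns (jsd, {word: contribution})."""
    p, q = Counter(corpus_a), Counter(corpus_b)
    n_p, n_q = sum(p.values()), sum(q.values())
    contrib = {}
    for w in set(p) | set(q):
        pw, qw = p[w] / n_p, q[w] / n_q
        mw = (pw + qw) / 2  # mixture distribution
        term = 0.0
        if pw:
            term += 0.5 * pw * math.log2(pw / mw)
        if qw:
            term += 0.5 * qw * math.log2(qw / mw)
        contrib[w] = term
    return sum(contrib.values()), contrib
```

Identical corpora yield a divergence of 0; corpora with disjoint vocabularies yield 1, and sorting `contrib` surfaces the words driving the difference.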
arXiv Detail & Related papers (2020-08-05T17:27:11Z)
- Cultural Cartography with Word Embeddings [0.0]
We show how word embeddings are commensurate with prevailing theories of meaning in sociology.
First, one can hold terms constant and measure how the embedding space moves around them.
Second, one can also hold the embedding space constant and see how documents or authors move relative to it.
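The first strategy, holding a term constant and measuring how the space moves around it, can be approximated by comparing the term's nearest neighbors in two embedding models (e.g., trained on different corpora or periods). This is only one possible operationalization, with hypothetical names (`semantic_shift`, `emb_a`, `emb_b`), not the paper's method.

```python
import numpy as np

def semantic_shift(word, emb_a, emb_b, k=5):
    """Jaccard overlap between a word's k nearest neighbors (by cosine
    similarity) in two embedding models, each a dict mapping words to
    vectors. 1.0 = identical neighborhoods; lower values = more movement
    of the space around the held-constant term."""
    def neighbors(emb):
        target = emb[word]
        sims = {}
        for w, v in emb.items():
            if w == word:
                continue
            sims[w] = float(np.dot(target, v) /
                            (np.linalg.norm(target) * np.linalg.norm(v)))
        # Keep the k most similar words.
        return set(sorted(sims, key=sims.get, reverse=True)[:k])

    na, nb = neighbors(emb_a), neighbors(emb_b)
    return len(na & nb) / len(na | nb)
```

Neighborhood overlap avoids directly comparing vectors from separately trained models, whose coordinate axes are not aligned with each other.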
arXiv Detail & Related papers (2020-07-09T01:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.