Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings
in Scholarly Publications
- URL: http://arxiv.org/abs/2212.09676v2
- Date: Tue, 23 May 2023 02:24:54 GMT
- Authors: Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith
- Abstract summary: We develop and validate an interpretable approach for measuring scholarly jargon from text.
We use word sense induction to identify words that are widespread but overloaded with different meanings across fields.
We show that word senses provide a complementary, yet unique view of jargon alongside word types.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scholarly text is often laden with jargon, or specialized language that can
facilitate efficient in-group communication within fields but hinder
understanding for out-groups. In this work, we develop and validate an
interpretable approach for measuring scholarly jargon from text. Expanding the
scope of prior work which focuses on word types, we use word sense induction to
also identify words that are widespread but overloaded with different meanings
across fields. We then estimate the prevalence of these discipline-specific
words and senses across hundreds of subfields, and show that word senses
provide a complementary, yet unique view of jargon alongside word types. We
demonstrate the utility of our metrics for science of science and computational
sociolinguistics by highlighting two key social implications. First, though
most fields reduce their use of jargon when writing for general-purpose venues,
some fields (e.g., biological sciences) do so less than others. Second, the
direction of correlation between jargon and citation rates varies among fields,
but jargon is nearly always negatively correlated with interdisciplinary
impact. Broadly, our findings suggest that though multidisciplinary venues
intend to cater to more general audiences, some fields' writing norms may act
as barriers rather than bridges, and thus impede the dispersion of scholarly
ideas.
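As one illustration of the type-level side of this idea, a simple way to score how discipline-specific a word is compares its relative frequency in one field's papers to its frequency in a background corpus. This is a minimal sketch only, not the paper's actual metric; the function and variable names (`jargon_scores`, `field_docs`, `background_docs`) are hypothetical.

```python
from collections import Counter

def jargon_scores(field_docs, background_docs):
    """Score how field-specific each word is: the ratio of a word's
    relative frequency in a field's documents to its (smoothed)
    relative frequency in a background corpus. Higher = more jargon-like.
    Each argument is a list of tokenized documents (lists of words)."""
    field_counts = Counter(w for doc in field_docs for w in doc)
    bg_counts = Counter(w for doc in background_docs for w in doc)
    field_total = sum(field_counts.values())
    bg_total = sum(bg_counts.values())
    scores = {}
    for word, count in field_counts.items():
        p_field = count / field_total
        # Add-one smoothing so words absent from the background still score.
        p_bg = (bg_counts.get(word, 0) + 1) / (bg_total + len(bg_counts))
        scores[word] = p_field / p_bg
    return scores
```

A field-internal term like "posterior" in a statistics corpus would score well above a word like "model" that is common everywhere; the actual paper goes further by scoring senses as well as word types.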
Related papers
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how marking a word's neighboring words affects the explainee's perception of that word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings [4.479834103607384]
Adding interpretability to word embeddings represents an area of active research in text representation.
We present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings.
arXiv Detail & Related papers (2023-01-11T20:25:53Z)
- Dictionary-Assisted Supervised Contrastive Learning [0.0]
We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries.
The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest.
Combining DASCL with cross-entropy improves classification performance metrics in few-shot learning settings and social science applications.
arXiv Detail & Related papers (2022-10-27T04:57:43Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify hybrid granularities semantic meaning in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- What's in a Scientific Name? [0.0]
The study reported here considers the words "prediction", "model", "optimization", "complex", "entropy", "random", "deterministic", "pattern", and "database".
Several of the words were observed to have markedly distinct associations in different areas. Biology was found to be related to computer science, sharing associations with databases.
arXiv Detail & Related papers (2021-05-31T22:06:20Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
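The operationalization described above, ambiguity as the entropy of a word's meaning distribution, can be sketched directly. This is a minimal illustration assuming sense-labeled occurrences are available; the function name `meaning_entropy` is hypothetical, not from the paper.

```python
import math
from collections import Counter

def meaning_entropy(sense_labels):
    """Operationalize a word's lexical ambiguity as the Shannon entropy
    (in bits) of the distribution over senses it takes across its
    occurrences. sense_labels: one sense label per occurrence."""
    counts = Counter(sense_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A word always used in one sense scores 0 bits; a word split evenly between two senses scores 1 bit, and the score grows with the number and evenness of its senses.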
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts [0.15833270109954134]
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content.
We introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts.
We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences.
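For the entropy-based case mentioned above, the Jensen-Shannon divergence between two corpora's word distributions decomposes into per-word contributions, which is what a word shift graph visualizes. The sketch below is a plain restatement of that decomposition, not the paper's implementation; `js_divergence_contributions` is a hypothetical name.

```python
import math
from collections import Counter

def js_divergence_contributions(corpus_a, corpus_b):
    """Jensen-Shannon divergence between the word-frequency distributions
    of two token lists, decomposed into per-word contributions.
    Base-2 logs, so the total lies in [0, 1]. Returns (jsd, {word: contribution})."""
    p, q = Counter(corpus_a), Counter(corpus_b)
    n_p, n_q = sum(p.values()), sum(q.values())
    contrib = {}
    for w in set(p) | set(q):
        pw, qw = p[w] / n_p, q[w] / n_q
        mw = (pw + qw) / 2  # mixture distribution
        term = 0.0
        if pw:
            term += 0.5 * pw * math.log2(pw / mw)
        if qw:
            term += 0.5 * qw * math.log2(qw / mw)
        contrib[w] = term
    return sum(contrib.values()), contrib
```

Identical corpora yield a divergence of 0; corpora with disjoint vocabularies yield 1, and sorting `contrib` surfaces the words driving the difference.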
arXiv Detail & Related papers (2020-08-05T17:27:11Z)
- Cultural Cartography with Word Embeddings [0.0]
We show how word embeddings are commensurate with prevailing theories of meaning in sociology.
First, one can hold terms constant and measure how the embedding space moves around them.
Second, one can also hold the embedding space constant and see how documents or authors move relative to it.
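The first strategy, holding a term constant and measuring how the space moves around it, can be approximated by comparing the term's nearest neighbors in two embedding models (e.g., trained on different corpora or periods). This is only one possible operationalization, with hypothetical names (`semantic_shift`, `emb_a`, `emb_b`), not the paper's method.

```python
import numpy as np

def semantic_shift(word, emb_a, emb_b, k=5):
    """Jaccard overlap between a word's k nearest neighbors (by cosine
    similarity) in two embedding models, each a dict mapping words to
    vectors. 1.0 = identical neighborhoods; lower values = more movement
    of the space around the held-constant term."""
    def neighbors(emb):
        target = emb[word]
        sims = {}
        for w, v in emb.items():
            if w == word:
                continue
            sims[w] = float(np.dot(target, v) /
                            (np.linalg.norm(target) * np.linalg.norm(v)))
        # Keep the k most similar words.
        return set(sorted(sims, key=sims.get, reverse=True)[:k])

    na, nb = neighbors(emb_a), neighbors(emb_b)
    return len(na & nb) / len(na | nb)
```

Neighborhood overlap avoids directly comparing vectors from separately trained models, whose coordinate axes are not aligned with each other.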
arXiv Detail & Related papers (2020-07-09T01:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.